Advances
in COMPUTERS
VOLUME 52
Advances in COMPUTERS

Fortieth Anniversary Volume: Advancing into the 21st Century

EDITED BY

MARVIN V. ZELKOWITZ
Department of Computer Science and Institute for Advanced Computer Studies
University of Maryland
College Park, Maryland

VOLUME 52
ACADEMIC PRESS
A Harcourt Science and Technology Company

San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo
This book is printed on acid-free paper.

Copyright © 2000 by ACADEMIC PRESS

All Rights Reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Academic Press
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, California 92101-4495, USA
http://www.academicpress.com

Academic Press
A Harcourt Science and Technology Company
32 Jamestown Road, London NW1 7BY, UK
http://www.academicpress.com

ISBN 0-12-012152-2

A catalogue record for this book is available from the British Library
Typeset by Mathematical Composition Setters Ltd, Salisbury, UK Printed in Great Britain by Redwood Books, Trowbridge, Wiltshire 00 01 02 03 04 05 RB 9 8 7 6 5 4 3 2 1
Contents

CONTRIBUTORS
PREFACE

Eras of Business Computing
Alan R. Hevner and Donald J. Berndt
1. A Half Century of Business Computing
2. Business Computing Eras
3. The Computation Platform: Hardware and Operating Systems
4. Communication: Computer Networking
5. Software
6. Business System Architectures
7. Conclusions and Future Directions
References

Numerical Weather Prediction
Ferdinand Baer
1. Introduction
2. Computational Methods
3. Data Analysis, Assimilation, and Initialization
4. Regional Prediction Modeling
5. Ensemble Prediction Techniques
6. Conclusions
References

Machine Translation
Sergei Nirenburg and Yorick Wilks
1. Introduction
2. Is Machine Translation Impossible?
3. What Sort of Computation is MT?
4. Main Paradigms for MT--Diverse Strategies for Solving or Neutralizing the Complexity of Language Use
5. The Evolution of MT Over its 50-year History
6. Choices and Arguments For and Against MT Paradigms
7. MT in the Real World
8. The Current Situation
9. Conclusion
References

The Games Computers (and People) Play
Jonathan Schaeffer
1. Introduction
2. Advances
3. Advances in Computer Games
4. Conclusions
Acknowledgments
References

From Single Word to Natural Dialogue
Niels Ole Bernsen and Laila Dybkjær
1. Introduction
2. Task-oriented Spoken Language Dialogue Systems
3. Managing the Dialogue
4. Conclusion
References

Embedded Microprocessors: Evolution, Trends and Challenges
Manfred Schlett
1. Introduction
2. The 32-bit Embedded Marketplace
3. General Microprocessor and Technology Evolution
4. Basic Processor Classification
5. Processor Architectures
6. Embedded Processors and Systems
7. The Integration Challenge
8. Conclusion
References

AUTHOR INDEX
SUBJECT INDEX
CONTENTS OF VOLUMES IN THIS SERIES
Contributors

Ferdinand Baer received his professional training from the Department of Geophysical Sciences at the University of Chicago, from which he graduated in 1961. He began his academic career as an Assistant Professor at Colorado State University, where he was one of the founding members of their Department of Atmospheric Science. In 1971 he took a position as Professor at the University of Michigan, and in 1977 he moved to the University of Maryland, where he took on the chairmanship of the newly created Department of Meteorology. In 1987 he retired from his administrative post to devote himself to teaching and research as a professor in the same department. During his tenure at the various universities, Professor Baer was a WMO (World Meteorological Organization) expert to India, a research fellow at the GFDL (Geophysical Fluid Dynamics Laboratory) laboratory of Princeton University, a Visiting Professor at the University of Stockholm and the Freie University of Berlin, and occasionally a summer visitor at NCAR (National Center for Atmospheric Research). His research interests span a variety of topics including atmospheric dynamics, numerical weather prediction, numerical analysis, initialization, spectral methods, atmospheric energetics, gravity waves, and high performance computing applications. He is a member of a number of professional societies and a fellow of the American Meteorological Society, the Royal Meteorological Society, and the American Association for the Advancement of Science. He has directed to completion the PhD research of 15 students and has several more in line. He has, or has had, research support from NSF, NASA, DOE, NOAA (National Oceanographic and Atmospheric Administration), and DOD. In support of his community he has served on a variety of boards and committees, including NAS/BASC (National Academy of Sciences/Board on Atmospheric Sciences and Climate), two terms as a UCAR trustee, member representative to UCAR (University Corporation for Atmospheric Research) from UMCP (University of Maryland at College Park), and most recently, chair of the AAAS Section on Atmospheric and Hydrospheric Sciences.

Donald J. Berndt is an Assistant Professor in the Information Systems and Decision Sciences Department in the College of Business Administration at the University of South Florida. He received his MPhil and PhD in Information Systems from the Stern School of Business at New York University. He also holds an MS in Computer Science from the State
University of New York at Stony Brook and a BS in Zoology from the University of Rhode Island. Dr. Berndt's research and teaching interests include the intersection of artificial intelligence and database systems, knowledge discovery and data mining, data warehousing, software engineering methods, and parallel programming. He was a research scientist at Yale University and Scientific Computing Associates, where he participated in the development of commercial versions of the Linda parallel-programming environment. He also developed artificial intelligence applications in academic settings and at Cognitive Systems, Inc. From 1993 to 1995 Dr. Berndt was a lecturer in the Computer Science Department at the State University of New York at Stony Brook and taught several courses for the Stern School of Business at New York University. He is a member of Beta Gamma Sigma, AAAI, ACM, and AIS (Association for Information Systems).

Niels Ole Bernsen is Director of the Natural Interactive Systems Laboratory and Professor of Engineering at the University of Southern Denmark, Odense. His research interests include interactive speech systems and natural interactive systems more generally, systems for communities, design support tools, usability engineering, modality theory and multimodality, and best practice in systems development and evaluation. He is Coordinator of the European Network for Intelligent Information Interfaces (i3net) and a member of the Executive Board of the European Language and Speech Network (Elsnet), and takes part in several European collaborative research projects in the above-mentioned research areas. He has authored and edited 10 books and is the author of more than 300 papers and reports.

Laila Dybkjær is a senior researcher at the Natural Interactive Systems Laboratory at the University of Southern Denmark. Her research interests include intelligent user interfaces, interactive speech systems, usability design, best practice, evaluation, dialogue model development, dialogue theory, corpus analysis, and multimodal systems. She received an MS and a PhD in computer science from the University of Copenhagen.

Alan R. Hevner is an Eminent Scholar and Professor in the Information Systems and Decision Sciences Department in the College of Business Administration at the University of South Florida. He holds the Salomon Brothers/Hidden River Corporate Park Chair of Distributed Technology. Dr. Hevner's areas of research interest include information systems development, software engineering, distributed database systems, healthcare information systems, and telecommunications. He has published over 75 research papers on these topics and has consulted for several Fortune 500
companies. Dr. Hevner received a PhD in Computer Science from Purdue University. He has held faculty positions at the University of Maryland and the University of Minnesota. Dr. Hevner is a member of ACM, IEEE, AIS, and INFORMS (Institute for Operations Research and the Management Sciences).

Sergei Nirenburg is Director of the Computing Research Laboratory and Professor of Computer Science at New Mexico State University. Dr. Nirenburg has written or edited 6 books and has published over 130 articles in various areas of computational linguistics and artificial intelligence. He has founded and is Steering Committee chair of a series of scientific conferences on Theoretical and Methodological Issues in Machine Translation, the eighth of which took place in August 1999 in Chester, UK. Between 1987 and 1996 he was Editor-in-Chief of the journal Machine Translation. He is a member of the International Committee on Computational Linguistics (ICCL).

Jonathan Schaeffer is a professor of computing science at the University of Alberta (Edmonton, Canada). His BSc is from the University of Toronto, and his MMath and PhD degrees are from the University of Waterloo. His major research area is in artificial intelligence, using games as his experimental testbed. He is the principal author of the checkers program Chinook, which in 1994 became the first program to win a human world championship in any game. He received an NSERC E.W.R. Memorial Fellowship in 1998.

Manfred Schlett is currently working as Product Manager at Hitachi Europe GmbH. He received a diploma in technical mathematics in 1991 and a PhD in mathematics in 1994 for his work in numerical semiconductor device simulation, both from the University of Karlsruhe. In 1995 he joined hyperstone electronics as a VLSI design engineer working on the DSP integration into the 32-bit RISC hyperstone E1 architecture. Later he became a project and marketing manager at hyperstone. In 1998, he joined Hitachi Europe's marketing team focusing on Hitachi's 32-bit SuperH series. His research interests include microprocessor architectures, advanced VLSI design, signal processing, and multimedia. He has published several articles on numerical semiconductor device simulation, design of embedded microprocessors, and the unification of RISC and DSP architectures.
Yorick Wilks is Professor of Computer Science at the University of Sheffield and Director of the Institute of Language, Speech and Hearing (ILASH). He has published numerous articles and five books in the area of artificial intelligence, of which the most recent are Artificial Believers (with Afzal Ballim), Lawrence Erlbaum Associates (1991), and Electric Words: Dictionaries, Computers and Meanings (with Brian Slator and Louise Guthrie), MIT Press (1995). He is also a Fellow of the American Association for Artificial Intelligence, and a member of the International Committee on Computational Linguistics (ICCL).
Preface to Volume 52: 40th Anniversary Issue
Advancing into a new century
Humanity is often distinguished from other animals by its ability, even its need, to see patterns in everyday life. As the 20th century draws to a close and we enter a new millennium according to the calendar, all aspects of society seem to want to take stock of what has happened in the past and what is likely to happen in the future. The computer industry is no different from others. The Advances in Computers series has been published continuously since 1960, and this year's volume is the 50th technical volume in the series (if you ignore the two index volumes that have been produced). It is the 40th year of publication, it is being published in the year 2000, and, if you believe in numerology, 40 times 50 is 2000, so all the signs point to something special for this edition.

As we enter the 21st century, we decided to look back on the changes that have occurred since Volume 1 of Advances in Computers appeared in 1960. We looked at the six chapters of that initial volume and decided that an appropriate anniversary volume for this series would be a series of papers on the same topics that appeared in 1960. What has happened to those technologies? Are we making the progress we thought we would, or are events moving more slowly? To refresh your memory, Volume 1 of the Advances contained the following chapters:

1. General-purpose programming for business applications, by Calvin C. Gotlieb
2. Numerical weather prediction, by Norman A. Phillips
3. The present status of automatic translation of languages, by Yehoshua Bar-Hillel
4. Programming computers to play games, by Arthur L. Samuel
5. Machine recognition of spoken words, by Richard Fatehchand
6. Binary arithmetic, by George W. Reitwiesner.

We hope that the chapters included in this present volume will give you an appropriate current perspective on these technologies.
In Volume 1, C. C. Gotlieb discussed business data processing. It is strange to think that this chapter predates the era of COBOL, while the first chapter of the present volume is describing a post-COBOL world. Alan Hevner and Donald Berndt, in their chapter entitled "Eras of business computing," give a history of business data processing that goes through the evolution of technology from the large mainframe processor to today's World Wide Web-based electronic commerce (e-commerce). It seems clear that, at least in the short term, web-based applications using an object-oriented design process will dominate business applications.

In the second chapter, Ferdinand Baer updates our knowledge of numerical weather prediction. In 1960, weather prediction was rather primitive; lack of computing power was a known primary problem. Today's machines are orders of magnitude faster, and weather prediction up to 14 days in advance, long regarded as the maximum length of time to make a prediction, is becoming a reality. Today's models are becoming more explicit, and the reduction of forecasting errors is on the horizon.

In the third chapter, Sergei Nirenburg and Yorick Wilks update our knowledge of machine translation of natural language. In 1960 the prevailing opinion was that machine translation was important, but probably impossible to achieve in all cases. Although some people still hold that opinion, great advances have been made. Machine translation is now an economic necessity in our international economic community, and many routine documents can now be translated automatically.

In 1960, Arthur Samuel wrote about his famous checkers program, the first successful game-playing computer program. In the fourth chapter of this volume, Jonathan Schaeffer updates our knowledge of computer game playing. Chess programs are now rated at the grandmaster level and have even succeeded in competing successfully against grandmasters. Schaeffer describes current search strategies in game-playing programs, and updates our knowledge on computer programs that play games such as backgammon, bridge, checkers, chess, Othello, poker, and Scrabble.

In the fifth chapter, "From single word to natural dialogue" by N. O. Bernsen and L. Dybkjær, the authors discuss spoken dialogue systems. In 1960 a speech recognition system could recognize about 10 spoken words, whereas today you can purchase systems that can recognize about 5000 words, and with some training, the systems can be taught to recognize about 60 000 words. Here is one research area that may be on the verge of going out of business, as industrial competitiveness is leading to further and further advances in incredibly short periods of time.

In the final chapter in 1960, George Reitwiesner discussed the ability to build better algorithms to process binary arithmetic. How to do addition, multiplication, division, extraction of square roots, etc., faster and more
efficiently was the major hardware design issue in these early machines. Today, we are not so concerned about such issues; we believe we have optimal or almost optimal instructions for such procedures. Of more concern is to embed computers so that they can work efficiently as part of our industrial society. In the final chapter of this present volume, "Embedded microprocessors: Evolution, trends, and challenges," Manfred Schlett discusses the role of the microprocessor as part of an embedded architecture, and how technologies such as reduced instruction set computers (RISC) are allowing hardware designers to build ever faster processors. We hope that you enjoy these chapters and that they will provide a graphic demonstration of how our industry has evolved from 1960 to today. If you have any suggestions for chapters in future volumes, please contact me at
[email protected].

MARVIN ZELKOWITZ
University of Maryland, College Park, Maryland
Fraunhofer Center for Experimental Software Engineering, College Park, Maryland
Eras of Business Computing

ALAN R. HEVNER AND DONALD J. BERNDT
Information Systems and Decision Sciences
College of Business Administration, University of South Florida
Tampa, FL 33620, USA
{ahevner, dberndt}@coba.usf.edu
Abstract

The past half-century has seen amazing progress in the use of information technology and computer systems in business. Computerization and communication technologies have truly revolutionized the business organization of today. This chapter presents a structured overview of the evolution of business computing systems through six distinct eras:

• Era of Calculation
• Era of Automation
• Era of Integration and Innovation
• Era of Decentralization
• Era of Reengineering and Alignment
• Era of the Internet and Ubiquitous Computing

Advances in each of the major computing technologies--Computational Platform, Communications, Software, and System Architecture--are surveyed and placed in the context of the computing eras. The focus is on how technologies have enabled innovative strategic business directions based on new business system architectures. A key observation is that around 1975 the principal role of the computer in business systems changed from a computation engine to a digital communications platform. We close the chapter by presenting a set of major conclusions drawn from this survey. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact into the 21st century.
Contents

1. A Half Century of Business Computing
2. Business Computing Eras
   2.1 Era of Calculation (Before 1950)
   2.2 Era of Automation (1950-64)
   2.3 Era of Integration and Innovation (1965-74)
   2.4 Era of Decentralization (1975-84)
   2.5 Era of Reengineering and Alignment (1985-94)
   2.6 Era of the Internet and Ubiquitous Computing (1995 onwards)
3. The Computation Platform: Hardware and Operating Systems
   3.1 Three Classic Computer Hardware Generations
   3.2 The Role of Universities and Military Research
   3.3 Twin Roads: Scientific Calculation and Business Automation
   3.4 The Rise of the General Purpose Computer
   3.5 Computing Means Business
   3.6 Computers Leave the Computer Room
   3.7 Operating Systems
4. Communication: Computer Networking
   4.1 ARPA: Advanced Research Projects Agency
   4.2 Packet Switched Networking
   4.3 ARPANET
   4.4 Xerox PARC: The Office of the Future
   4.5 LANs
   4.6 Internetworking
   4.7 LANs, WANs, and the Desktop
5. Software
   5.1 Algorithmic Programming
   5.2 Data: File Systems and Database Systems
   5.3 Human-Computer Interaction (HCI)
   5.4 Software Development Processes and Methods
   5.5 Software Summary
6. Business System Architectures
   6.1 Manual Business Processes
   6.2 Mainframe Architectures
   6.3 On-Line, Real-Time Architectures
   6.4 Distributed, Client-Server Architectures
   6.5 Component-Based Architectures
   6.6 Web-Based Architectures
7. Conclusions and Future Directions
   7.1 Computers as Tools of Business
   7.2 Miniaturization of the Computational Platform
   7.3 Communications Inflexion Point
   7.4 Growth of Business Information and Knowledge
   7.5 Component-Based Software Development
   7.6 MIS and Business System Architecture Synergies
References
1. A Half Century of Business Computing

Computers changed our world in many ways. Over the past 50 years, business has been revolutionized by the introduction of digital computing machinery. Traditional business functions and the related arts and sciences of manufacturing, accounting, marketing and sales, and management have been redefined and reengineered many times to make the most effective and innovative use of computing and information
technologies. The great business visionaries of the first half of the 20th century (e.g., Ford, Rockefeller, Morgan, Vanderbilt, Mellon) would be overwhelmed by the pervasive use of computing technology in today's businesses. Computers and information technology are central to our understanding of how business is conducted. The invention and development of the original electronic computers occurred during the final years of World War II and the late 1940s. At first, computers were employed as ultra-fast calculators for scientific equations and engineering applications. Only with the advent of stored programs and efficient data storage and file-handling capabilities did their use for business applications become apparent. Albeit slowly at first, computers became more and more accessible to businesses for the automation of routine activities, such as sorting, searching, and organizing business data. Then, with increasing rapidity, computers were applied to all functions of the business, from assembly line production (e.g., robotics) to the highest levels of executive management (e.g., decision support systems). Without question, the dominant use of computing technology today is for business applications. This chapter presents a necessarily brief survey of how business computing has evolved over the past 50 years. We organize this survey along two dimensions:
• Eras of Business Computing: We divide the timeline of business computing into six eras distinguished by one or more dominant themes of computer use during that era. Section 2 describes each of the eras and its dominant themes.

• Business Computer System Technologies: An integrated business computer system is composed of three essential technology components:
  - the computational platform (i.e., the hardware and systems software)
  - telecommunications (i.e., data transmission and networking)
  - the software (i.e., programming and application software).

A business system architecture integrates these three components into a
functional, effective business system. Figure 1 illustrates the inter-relationships of the technologies in a business system. Sections 3-6 cover the evolution of these respective technologies through the six business computing eras. We conclude the chapter by presenting a summary of the major conclusions we draw from this survey of business computing. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact into the 21st century.
FIG. 1. Technology components of a business computer system: the computational platform, telecommunications, and software.
2. Business Computing Eras
A business is a pattern of complex operations in the lives of people concerning all the functions that govern the production, distribution, and sale of goods and services for the benefit of the buyer and the profit of the seller. Businesses and essential business processes have existed since the beginning of civilization. The ability to manipulate and manage information has always been a critical feature of all business processes. Even before number systems were invented, shepherds would count their flocks by using a pebble to represent each animal. Each morning as animals left the fold to graze, the shepherd would place a pebble in his pocket for each animal. Then in the evening he would remove a pebble for each animal entering the fold. If any pebbles remained in his pocket, the shepherd knew that he would need to search for those who were lost. (This example is taken from an exhibit in Boston's Computer Museum.) Throughout the history of business, personnel and techniques for managing information have been central to all business processes.

In the early part of this century, the term computer was defined in the Oxford English Dictionary as "one who computes; a calculator, reckoner; specifically a person employed to make calculations in an observatory, in surveying, etc." Thus, computers, as people, have always been with us. Several important mechanical tools and techniques to support the task of computing have evolved over the millennia, such as the abacus, the slide rule, adding machines, punched cards, and filing systems. Charles Babbage's Difference Engine, circa 1833, was a particularly interesting and significant example of a mechanical device that aided in the calculation and printing of
large tables of data [1]. A limited ability to program the Difference Engine to perform different tasks foreshadowed the stored program concept of modern computers. The coming of the electronic computer during the 1940s changed the popular conception of a computer from a person to a machine. From that point on, the history of business computing has been one of rapid technical advances and continual evolution, marked by several revolutionary events that divide the past 50 years into six distinct eras. Table I proposes a classification of Business Computing Eras. In the remainder of this section we discuss the major business computing themes found in each of the eras.
2.1 Era of Calculation (Before 1950)
It is possible to trace the links between the Moore School and virtually all the government, university, and industrial laboratories that established computer projects in America and Britain in the late 1940s. [2]

Before 1945, humans as computers performed manually all of the essential business processes in organizations. They managed business data, performed calculations on the data, and synthesized the data into business information for decision-making. Human computers were aided by sophisticated mechanical tools to perform both routine and complex business tasks. However, the limitations of human capabilities to deal efficiently with large amounts of information severely constrained business effectiveness. There are several excellent references on this era of business computing [3, 4, 5]. In addition to Babbage's Difference Engine, several other business computing milestones stand out:

• The 1890 Census and the Hollerith Card [6]: Herman Hollerith invented a punched-card record keeping system that was employed in tabulating the 1890 US census. The system was an overwhelming success. The quality of the data was improved, the census was completed in 2.5 years compared with 7 years for the previous census, and the cost was significantly reduced. Hollerith became a successful entrepreneur with his punched-card systems, founding a company that eventually led to the beginning of International Business Machines (IBM).

• The Mechanical Office: Essential business processes were performed on innovative devices such as adding machines, typewriters, and Dictaphones. The mechanical office was considered a marvel of efficiency in this era before electronic computers became business tools.
TABLE I
BUSINESS COMPUTING ERAS

Era of Calculation -- Digital computers for scientific calculation
Years: < 1950
Themes: Human computers; the mechanical office; IBM and punched-card machines; NCR and point-of-sale equipment; World War II and military research (Harvard Mark I, Whirlwind, ENIAC)

Era of Automation -- General purpose computers for business
Years: 1950-64
Themes: UNIVAC and the Census Bureau; UNIVAC predicts 1952 election; IBM domination and the 700 series; automation of basic business processes with high cost-benefit; IBM 1401 and the business solution

Era of Integration and Innovation -- Hardware and software for business solutions
Years: 1965-74
Themes: IBM System/360 redefines the market; software outlives hardware; minicomputers enter the fray; human-computer synergies exploited in business systems; winds of change (integrated circuits, microprocessors, Xerox PARC and the office of the future)

Era of Decentralization -- Communication dominates computation
Years: 1975-84
Themes: Powerful microprocessors; personal computers; LANs; WANs and the Internet; focus on the desktop; PC networking supported

Era of Reengineering and Alignment -- Effective utilization of the technology spectrum to solve business problems
Years: 1985-94
Themes: All the pieces are in place; using all the pieces; system and software architectures; WWW and ".com"; Total Quality Management initiatives; IT enables the reengineering of critical business processes; alignment of business strategy with information technology strategy

Era of the Internet and Ubiquitous Computing -- Reorganization for the wired world
Years: > 1995
Themes: Critical mass of desktops; the Internet means business; traditional organizational boundaries fall; the virtual organization; new business models; electronic commerce and the digital economy
• Cash Registers and the National Cash Register Company (NCR): Under John Patterson and Charles Kettering, NCR developed advanced cash register technology and invented the key sales practices of the industry. NCR, through its sales force, introduced point-of-sale computing technology across the US.

• The Founding of IBM [7]: Thomas Watson, Sr. was an early graduate of the NCR sales training school. His business genius brought together a number of fledgling office machine companies into International Business Machines (IBM). By 1940, IBM was the leading seller of office machines in the world.

The rapid evolution of mechanical business machines was eclipsed by the intense scientific pressures of World War II, which resulted in the beginnings of the digital computer. Throughout history, the demands of war have often given rise to scientific advances, and the resulting military inventions often proved crucial on the battlefield. World War II, coupled with already advancing technologies in many fields, was a truly scientific war, and military research would play a central role in the field of digital computing. The goal of these early computing projects was scientific computing and the construction of flexible and fast calculators. The critical role of military research funding is easily recognized by the now famous projects that were initiated by the armed services. The Navy sponsored Howard Aiken (along with IBM) and the Harvard Mark I, the Air Force had Whirlwind at MIT under the direction of Jay Forrester, and the Army sponsored the construction of the ENIAC by Eckert and Mauchly at the University of Pennsylvania. All of these projects made significant contributions in both computer technology and the training of early computer pioneers. Probably the most influential work was that of Eckert and Mauchly on the ENIAC and EDVAC. Through their work, and the consultation of others such as John von Neumann, the fundamental principles of the stored-program computer would emerge, providing the foundation for the modern computers we use today. From a business computing perspective, Eckert and Mauchly were among the first to recognize the general-purpose nature of computers and would pursue their vision in a commercial form by developing the UNIVAC--marking the transition to the Era of Automation.
2.2 Era of Automation (1950-64)
When Eckert and Mauchly established their Electronic Control Company in March 1946, they were almost unique in seeing the potential for computers in business data processing, as opposed to science and engineering calculations. [2]
The invention of the electronic computer and its realization in the ENIAC at the University of Pennsylvania Moore School by J. Presper Eckert and John Mauchly circa 1945 led to the first computer start-up firm and the UNIVAC. As the first UNIVAC became operational and passed the Census Bureau acceptance tests in 1951, the Era of Automation began. Over the next two decades, electronics companies (e.g., RCA, GE, Honeywell) and business machine companies (e.g., IBM, Burroughs, Control Data, NCR) scrambled to become leaders in the computer field. For a time, the UNIVAC became synonymous with modern business computing, enjoying public successes such as the 1952 Eisenhower election prediction televised on CBS. However, IBM would come to dominate the industry by the middle 1950s, when the IBM installed computer base first exceeded that of the UNIVAC. Soon the computer industry was characterized as IBM and the seven dwarves, later to become IBM and the BUNCH as acquisitions and retrenchments reduced the competitors to five: Burroughs, UNIVAC, NCR, Control Data, and Honeywell. IBM would remain at the pinnacle for decades, defining most business computing trends, and continues to this day to be a powerful force in the industry.

Two important themes emerged during this era of business computing:

• First, the essential technologies of hardware and software evolved rapidly as companies invested in computer-related research and development.

• Secondly, the application of technology was conservative, targeting high-payoff, low-risk business processes, due to the difficulties and expense of constructing hardware-intensive information systems.

The computing platforms of the day included the IBM 700 series, culminating in the 705 for business computing and the 704 (with FORTRAN) for scientific applications. The high-end 700 series would form the basis for the introduction of a transistorized machine, the IBM 7090, the quintessential room-sized mainframe. At the low end, the IBM 650 used reliable and inexpensive magnetic drum technology to offer an affordable business computer. IBM would apply the lessons learned in providing solutions to the emerging cost-conscious business computing market, developing the IBM 1401 and associated peripherals. Thus, the business solution became the computer. Legacy code developed for the successful model 1401 is still in use as we enter the new millennium.

A fascinating snapshot of software-based computer technology as it existed during the Era of Automation (in 1961) is provided by Calvin Gotlieb in his chapter, "General-Purpose Programming for Business Applications," which appeared in the first volume of Advances in Computers
[8]. Gotlieb considered the major data processing problems of the day. These problems still sound familiar:

• understanding and representing system requirements
• achieving sufficient speed and storage for business applications
• performing program checking and maintenance.

Gotlieb examined the 1961 state-of-the-art in programming techniques and file management methods. Several example systems were presented (e.g., the IBM 705, Flow-Matic) and the chapter concluded with two future directions of business programming--parallel operations (which we still have not mastered) and improved computer instruction sets.

As the era's second theme, organizations began to identify the critical business processes that would adapt most readily to automation by computer. The term automation entered the business lexicon in 1947 via the Ford Motor Company and was given widespread coverage in the business press [9]. For instance, John Diebold defined automation in a book by that title as the application of computer-based feedback mechanisms to business and industrial practice [10]. Resistance to being on the leading (or bleeding) edge of computerization was great, with the cost of new computers remaining high. Only the most obvious business processes with a high cost-benefit payback were considered for automation. Early adopters of computer technology to automate their business processes were the US Census Bureau, the Prudential Insurance Company, A.C. Nielsen, and Northrop. The most dramatic initial application of computing was during the CBS broadcast of the 1952 election returns. A UNIVAC computer correctly predicted the Eisenhower landslide over Stevenson on the basis of preliminary voter results. However, election officials and network executives refused to announce the prediction because they did not trust the computer.

The initial successes of business automation led rapidly to the introduction of computers into a majority of organizations throughout the 1950s and early 1960s. However, scarcity of qualified computer operators, the cost of computer hardware, and limited application software dampened rapid dissemination of computers into all aspects of the business. Automation was applied to the business processes considered "low hanging fruit" on the basis of thorough cost analyses.
2.3 Era of Integration and Innovation (1965-74)
The architecture of IBM's System/360-370 series of compatible processors is one of the most durable artifacts of the computer age. Through two major revisions of the product line and 23 years of technological change, it has remained a viable and versatile interface between machine and user. [11]
As the title indicates, this era is characterized by two quite independent themes. The first is the introduction of the IBM System/360 family of compatible computers, which fostered the notion of true systems integration and platform-independent software. The second theme is the incredible changes in computing technologies, such as component miniaturization and new developments in data communications, that occurred even while IBM enjoyed its large-scale success in systems integration. The seeds of change were being sown.

The new era in business computing began with IBM's announcement of the System/360 in April 1964. The daring move by IBM to replace its entire product line with a new and uniform computer system architecture caught the business community off guard. But when organizations realized the advantages of an integrated product line that ranged from smaller departmental computers to large organizational mainframes, they rushed to purchase the System/360. Within the first 2 years of production, IBM could fill less than half the over 9000 orders [2]. As it turned out, the 360-370 computers remained the backbone of the IBM product line for nearly 30 years.

During the decade from 1965 to 1974, businesses matured in their use of computer technology to support their business functions. The simplistic automation of basic business processes gave way to the full computerization of critical business functions at the center of major organizations. To a large extent, the success of System/360 recognized that the software investment was the important cost and that a family of compatible systems minimized the risks. This maturity resulted from two dominant themes of the era--integrated systems approaches for solving business problems and technology innovations in hardware, software, and telecommunications.

The innovations that characterize the Era of Integration and Innovation include some important developments that laid the foundation for the radical changes of the next era. Though there were many innovations during this era, four stand out for the current discussion.

The first is the development of integrated circuits, invented through pioneering projects at Texas Instruments and Fairchild Semiconductor begun in the late 1950s with patents granted in 1964. It is the astounding progress in integrated circuits as the technology moved through medium-scale integration (MSI), large-scale integration (LSI), and very large-scale integration (VLSI) that eventually led to the microprocessor--the veritable computer on a chip. Medium-scale integration provided the technological foundation for inexpensive, yet powerful minicomputers. As chip densities increased and components were miniaturized, the price/performance ratio of these machines followed the historic and favorable march that led to modern desktop computing. In fact, it was in 1964 that Gordon Moore, a
pioneer at Fairchild Semiconductor and a cofounder of Intel, noted that integrated circuit density was doubling every year, an observation that came to be known as Moore's law. However, it would take some time before computer researchers were able to internalize Moore's law and grow accustomed to developing computing technologies ahead of the performance levels that would be delivered by the staggering leaps in integrated circuit technology. Essentially, Moore's law predicted that by the end of this era in 1975, all of the circuits of a classic mainframe such as the IBM 7090 could be implemented on a single chip.

A second innovation was the developing model of interactive computing. That is, how should we use the new departmental computer and how do we get computing power to the desktop? While a wide range of researchers contributed to the vision, many of the pieces came together at Xerox PARC through their efforts to build the office of the future. The desktop metaphor and the WIMP (windows, icons, mouse, and pull-down menus) interface were realized on the Xerox Alto connected via a local area network. All these elements would find their way into commercial products in the next era.

The third and fourth innovations are in the realm of data communications. In the local area environment, Ethernet was developed at Xerox PARC, breathing life into their conception of the office of the future. In the wide area environment, ARPANET was successfully implemented and flourishing in the research community. The infrastructure for internetworking was being developed as the descriptions of the influential Transmission Control Protocol and Internet Protocol (TCP/IP) suite were being published. These developments would provide the range of connectivity options that have wired the world.

In summary, the Era of Integration and Innovation saw IBM and its flagship System/360 family of integrated computers and peripherals further dominate the industry. Even though minicomputers became a force in the commercial sector, they were perceived as co-existing rather than competing with the mainframe. Mainframe systems represented a business-sensitive balance of processor performance, critical peripherals, and data management through fast input/output channels. Herbert Grosch is associated with a less scientific law, Grosch's law [12, 13]. This stated that you would get more computer power for your money by buying a single large machine rather than two smaller machines. In fact, the rule was probably quite accurate at the time owing to the balance of strengths that mainframe systems embodied and the difficulties in connecting a group of smaller machines for work on coordinated tasks. Improvements in networking, software, and storage systems were necessary to make alternative information system architectures more competitive. Developments in many of these
key technologies would characterize the Era of Decentralization and beyond.
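As a rough, back-of-the-envelope illustration of the annual doubling observation attributed to Moore earlier in this section, the following sketch projects component counts per chip from 1964 to 1975. It is only an illustrative sketch: the 1964 baseline figure and the helper function are assumptions introduced for this example and do not come from the chapter; only the "doubling every year" rule is taken from the text.

```python
# Illustrative sketch of the annual-doubling observation discussed above.
# The 1964 baseline of 64 components per chip is an assumed, made-up figure
# chosen purely for illustration; only the doubling rule comes from the text.

def projected_components(base: int, start_year: int, year: int) -> int:
    """Project the component count per chip, assuming a doubling every year."""
    return base * 2 ** (year - start_year)

if __name__ == "__main__":
    base_1964 = 64
    for year in (1964, 1970, 1975):
        print(year, projected_components(base_1964, 1964, year))
    # 1964 to 1975 is 11 doublings, i.e. roughly a 2000-fold increase in
    # density -- the scale of growth behind the remark that a classic
    # mainframe's circuitry could fit on a single chip by 1975.
```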
2.4 Era of Decentralization (1975-84)
Ethernet was up against "sneakernet" from the very start. All that changed overnight in 1975 with the advent of SLOT, Starkweather's laser printer. The virtues of the combined system called EARS--the Ethernet, the Alto, the research character generator, and SLOT--were too powerful to ignore. One could now write a memo, letter, article, or dissertation and with the push of a button see it printed in professional-quality type. [14]

The four innovations described in the previous era: microprocessors, interactive computing, local area networking, and internetworking protocols, as well as many other innovations, would be commercially realized in a torrent of groundbreaking products in the Era of Decentralization. This leads us to highlight 1975 as the computation-communication inflexion point, when the role of the computer became more of a communication device than a computing machine (see Fig. 2). All of the technological pieces for effective computer communications came together in this era. Hardware that spans the range from large mainframes to minicomputers to desktop computers became available. Connectivity options in both the wide area and local area environment matured, moving into the business computing market. Both computation and communication technologies finally provided the flexibility to create a rich set of information systems architectures. The stage was set for a radical change in our model of computing, with traditional computation being supplanted by communication as the goal of the digital computer.

The 1970s were characterized by rapid developments on a host of fronts. The microprocessor was commercially realized in the early 1970s with Intel's introduction of the 4004 and 8008. However, the appearance of the
FIG. 2. The computation-communication inflexion: the focus of business computing shifts from computation to communication over time (1950-2000), with the inflexion around 1975.
Altair 8800 on the cover of Popular Electronics in January 1975 has become synonymous with the dawn of personal computing. The Altair was based on the Intel 8080 microprocessor and was essentially a minicomputer available for under $400. This entry in the hobbyist market soon gave way to personal computers from Apple and IBM. It was the 1981 introduction of the IBM Personal Computer (PC), with an open architecture, primitive operating system, and low cost that moved the personal computer from the hobbyist benches to corporate desktops running programs such as Lotus 1-2-3. It was the legendary Xerox Palo Alto Research Center (PARC) that implemented the interactive desktop, developed the laser printer, and networked the components to form the office of the future around the experimental Alto computer in the 1970s. Though Xerox would delay commercial introduction of much of this technology, eventually introducing the Xerox Star, it would be the lower cost offerings by Apple and IBM that would bring the desktop computing metaphor to the business computing market. The IBM PC was a major gamble by IBM to enter a market previously dominated by hobbyists, game players, and educators. Another IBM gamble was to outsource the PC processor, the 8088 chip, to Intel and the operating system, MS-DOS, to a little-known firm named Microsoft. Contrary to IBM's traditional way of doing business, the IBM PC architecture was open to competitors to make their own PC clones. All of these features plus the low individual unit cost made the PC the right vehicle to allow businesses to distribute processing power throughout their organizations. In 1983 Time magazine named the Personal Computer as its Man of the Year. In 1984, Apple Computer introduced the Macintosh with a commercial during the American Football Super Bowl that was recently named the greatest commercial in TV history by TV Guide. By 1985, businesses had accepted the distribution of computing power to the desktop and were ready to redesign their business processes based on the new distributed computing architectures. In the area of data communications, developments in both local area networking (LAN) and wide area networking (WAN) technologies would add to this revolutionary era. Again, Xerox PARC played a central role in fostering new technologies by developing and utilizing Ethernet as their LAN technology. Xerox would then license the technology for a nominal fee, spurring the development of a cost-effective LAN standard that remains the technology of choice to this day. Influential WAN technologies that had developed under the auspices of the ARPANET and stood the test of time became part of the computer research community infrastructure. In addition, the TCP/IP internetworking protocols were being refined. The ARPANET, which originally developed among a small group of pioneering institutions, moved to the TCP/IP protocol in 1983. The National Science
Foundation (NSF) created a national network, NSFNET, based on TCP/IP to broaden connectivity throughout the computer research community as the original ARPANET was transformed into a collection of new networks--a true Internet. So, by 1985 personal computers, the desktop metaphor, and a range of computer networking technologies were in place. The question became, how to use these technologies creatively in the business environment?
2.5 Era of Reengineering and Alignment (1985-94)

Reengineering is the fundamental rethinking and radical redesign of business processes to achieve dramatic improvements in critical, contemporary measures of performance, such as cost, quality, service, and speed. [15]

An impressive technological toolkit was invented and deployed during the previous two eras. The new Era of Reengineering and Alignment is characterized by efforts to apply the wide range of technologies to better support existing business processes, as well as to enable the design of entirely new processes. New business system architectures were developed to take advantage of inexpensive computing power and networking options for distributed information processing. In particular, client-server architectures provided the combined advantages of centralized storage of large business-critical data while moving the application processing to the client location. Technical advances in LANs and WANs supported efficient data transmission among clients and servers.

The mid-1980s saw the rise of total quality management (TQM) initiatives. TQM gurus, such as W. Edwards Deming and Phillip Crosby, brought the message of statistical quality control and continuous process improvement to business organizations all over the world [16, 17]. Business process reengineering (BPR) became an important activity to achieve higher levels of quality and productivity. Organizations took a clean, fresh look at their critical business processes in order to discover revolutionary new approaches. BPR was made possible by the new, flexible computer system architectures. Michael Hammer and Thomas Davenport championed the strategy of employing innovative information technologies to reengineer the business organization [18, 19]. By the early 1990s, businesses had accepted the importance of aligning their corporate strategies with their information technology strategies and vice versa [20, 21]. Computer systems no longer played an administrative support role in the organization. They became a primary factor in determining whether the organization succeeded or failed in a highly competitive business environment.
2.6 Era of the Internet and Ubiquitous Computing (1995 onwards)
The Internet is quickly becoming the global data communications infrastructure. Whatever underlying changes are made to its technologies, the result will be called the Internet. [22] Most of the past eras included the invention of new technologies as major themes. In a sense, the Era of the Internet and Ubiquitous Computing is more about reaching a critical mass than any single technological innovation. The range of technologies described earlier, such as the PC and LAN, were deployed throughout the corporate environment, as well as the home. Of course, the ceaseless march of price/performance improvements provided new and better platforms, but the main theme is that a critical mass of desktops had been reached. In large part this latest era of business computing is centered on the revolutionary influence of the Internet and the World Wide Web (WWW). Again, the technological innovations had been developed in past eras, with the evolving precursor to the Internet used by the research community. In the early 1980s the Internet adopted the T C P / I P protocol suite and supported thousands of nodes. By the mid-1980s a consensus developed around the domain name system (DNS) giving us the now familiar naming conventions including ".com" for commercial entities. In 1989, Tim Berners-Lee at C E R N in Geneva had the insight to develop a simple protocol to utilize the Internet as a global hypertext system to share information. These initial protocols have expanded to become the World Wide Web, again a development driven by the critical mass of computing stations reached in this new era. Future visions of the WWW would have it becoming a World Wide Computer or even a World Wide Brain for our global community [23]. The potential of the Internet for business was not fully realized until around 1995. Product sales on the Internet in 1996 were approximately $500 million. Estimated Internet sales for 2000 are in the range of $10 billion. The reengineering of critical business processes is no longer enough for business success and survival. The business environment has changed dramatically. The critical mass of connected desktops and "tabletops," has been highlighted in the business press with a November 1994 issue of Business Week stating "How the Internet will change the way you do business" and the 70th anniversary issue of Business Week (October 4, 1999) announcing "The Internet Age." Traditional organizational boundaries and business models do not hold any more. Especially affected are the marketing and sales functions. The Internet provides connectivity to a large percentage of the world's population for dissemination of marketing information. However, important issues of information overload and privacy/security concerns must be understood and addressed.
As business boundaries dissolve, organizations must establish a recognizable and trusted presence on the Internet. In essence, the organization becomes a virtual organization. Most customer and client relationships are conducted via the push and pull of information on the Internet. Business computer systems during this Era of the Internet and Ubiquitous Computing will look very different from those in past eras. New business computing strategies will necessarily involve a major telecommunications component to support access to the Internet, internal intranets, and private networks with strategic partners. The opportunities and challenges facing organizations and their business computing systems are many. The future holds an exciting New World of business computing with future eras yet to be defined.
3. The Computation Platform: Hardware and Operating Systems
The subject of this section is the business history of computing machinery--the hardware and associated software, such as operating systems, that provide the computing platforms on which we construct business information systems. Historical accounts tell us that the first business computer was the LEO (Lyons Electronic Office), a British computer system that was used in the J. Lyons & Company catering firm [24, 25]. In 1951 the first routine office task, weekly bakery valuations, was automated via the LEO computer. LEO computers were used for other business applications in Britain during the 1950s, including payrolls and inventories.

However, the story of business computing is largely a tale of the explosive growth of the American digital electronics industry, spurred on by World War II and fueled by a rapidly growing economy [26]. Although wartime demands and military funding played a critical role, an incredible array of research universities and corporate research laboratories developed technologies on many fronts, involving a truly international community of scholars who were drawn to a near-endless stream of leading-edge projects. There is a rich history and body of literature that chronicles the rise of the computing engines, as well as their inventors, which have changed the direction of an industry and the world. Two excellent books on the history of computing are [2] and [9]. An interesting article-length treatment is found in [27]. We will draw upon classic historical milestones in the current discussion, but will focus our attention on how computing platforms evolved in support of business processes. The hardware and software that make up modern computing platforms were developed during the second through fourth eras: Automation,
Integration and Innovation, and Decentralization. The last two eras, Reengineering and Alignment, as well as the Internet and Ubiquitous Computing Era, focus on the application of technologies that matured during the earlier eras. This section begins by considering the scientific computing that characterized the Era of Calculation, then moves on to consider the Era of Automation that began with the UNIVAC. The Era of Integration and Innovation is marked by the introduction of IBM's long-lived System/360, but also includes innovations such as integrated circuits and other technologies that would lay the foundation for the following era. The Era of Decentralization focuses on the microprocessor, desktop computing platforms, and computer networking advances that made this one of the most revolutionary eras.
3.1 Three Classic Computer Hardware Generations
The history of computer hardware is often divided into three generations, corresponding to the transformational technologies that enabled new types of computers: vacuum tubes, transistors, and integrated circuits. A fourth generation is sometimes added with the advent of large-scale integrated circuits and microprocessors that provided the foundation for personal computers [28]. Each of these technologies allowed the next generation of computers to be more reliable, less expensive, smaller in size, and vastly more powerful. These technological milestones therefore define fundamental hardware generations. However, from a business computing perspective, the information systems that have developed result from a combination of computing platforms, communication technologies, and the ability to provide solutions through software applications. Even within the computer hardware industry, success in business computing did not always go to the most technologically advanced. The great business computing system products arose from strong computing platforms, well-designed peripheral equipment, knowledgeable sales and support teams, and a focus on business solutions and software applications. In viewing the history of computing from a business systems perspective, the combinations of technologies and the information systems architectures provide a more applicable set of business computing eras as outlined in Section 2.
3.2 The Role of Universities and Military Research
The development of computing technologies, clearly one of the most astounding industrial revolutions in history, was influenced at every early step by university and military research. World War II and the race for wartime advances drove the early development of computing technologies
and investments by the military and other government-sponsored programs would continue to pay off at key points in the rise of the computer. However, it is even more interesting that a loose collection of universities and individual research scientists would play a pivotal role in developments in almost every area of computing. From the University of Pennsylvania and the work of Eckert and Mauchly to the rise of Silicon Valley and Boston-based computer firms, universities and research laboratories have played a central role in the generation of whole new industries. Although universities throughout the world have made substantial contributions to the field, American research universities have been at the center of information technology advances, drawing graduate students and researchers from around the globe. The tale of computing history provides one of the most powerful arguments for continued support of research universities, the incubator of the Information Age.
3.3 Twin Roads: Scientific Calculation and Business Automation
The initial research projects and commercial computing endeavors were scientific in nature. The early computers were the equivalent of weapons, intensively pursued in the laboratories of research universities, military organizations, and the defense industry. Therefore, the natural focus of early computer projects was scientific computing. In fact, even after computer pioneers such as Eckert and Mauchly, as well as the business-oriented IBM, saw the commercial potential of computers, there remained a distinction between scientific and business computing. The twin markets did not move in tandem, often requiring different capabilities to create the next generation. The origins and early applications of the computer were driven by the demands of scientific calculations--the Era of Calculation.
3.3.1 Punched-card Methods

As the digital computer was emerging, most large businesses relied on punched-card machines for business data processing, usually IBM products. The legendary IBM sales force built strong relationships with accounting departments in most industries, and business processes reflected the information-handling capabilities of punched-card equipment. The typical installation employed a collection of special-purpose machines that could read, tabulate, or print reports from a deck of cards. The extraordinary benefits of the punched card are outlined in an IBM sales brochure from the early 1960s (Fig. 3) [9].
What the punched hole will do?
• It will add itself to something else.
• It will subtract itself from something else.
• It will multiply itself by something else.
• It will divide itself by something else.
• It will list itself.
• It will reproduce itself.
• It will classify itself.
• It will select itself.
• It will print itself on an IBM card.
• It will produce an automatic balance forward.
• It will file itself.
• It will post itself.
• It will reproduce and print itself on the end of a card.
• It will be punched from a pencil mark on the card.
• It will cause a total to be printed.
• It will compare itself to something else.
• It will cause a form to feed to a predetermined position, or to be ejected automatically, or to space one position to another.

FIG. 3. The benefits of punched cards.
Punched cards were a flexible and efficient medium for data processing. This technology was among the most successful attempts at automating business processes and was in widespread use in the 1930s. In many ways, the familiarity and reliability of punched-card technology stood in contrast with the fragile nature of early computers. One final observation from the IBM sales brochure points out another obvious benefit: "An IBM card--once punched and verified--is a permanent record" [9]. One of the most important aspects of the punched-card model of computation was that simple operations were performed on decks of cards, while human operators shuffled these decks from machine to machine. This meant that the order of data processing steps had to be carefully considered, with a single step being applied to an entire deck of cards. However, most business processes of the time had evolved in tandem with the punched-card model, and therefore were well suited to the technology. Applying a complex sequence of operations to each record would have required moving a single card from machine to machine--a style of processing that would instead be a liberating characteristic of the newly emerging digital computers. The demands for complex sequences of operations--essentially the equivalent of our modern-day computer programs--came at first from the
scientific and engineering communities. Scientists began using punched-card equipment for scientific calculations, with one of the most influential centers being the IBM-sponsored Watson Computing Bureau established at Columbia University in 1934 [9]. Wallace Eckert of the Computing Bureau outlined the scientific use of punched-card machines in his book Punched Card Methods in Scientific Computation [29]. In time, the general-purpose computer would provide a uniquely flexible environment, becoming the dominant tool for both scientific calculations and business automation.
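To make the deck-at-a-time model concrete, the short sketch below contrasts punched-card style processing, in which each machine applies a single operation to the entire deck before operators carry it to the next machine, with the stored-program style, in which a whole sequence of steps is applied to each record in one pass. The card fields, pay rates, and helper names are hypothetical illustrations, not drawn from any historical installation.

    # A minimal sketch (hypothetical data and field names) contrasting the
    # punched-card model of computation with the stored-program model.

    # Each "card" is a small record, e.g. weekly hours worked by an employee.
    deck = [
        {"employee": "A001", "hours": 40, "rate": 1.25},
        {"employee": "A002", "hours": 44, "rate": 1.10},
        {"employee": "A003", "hours": 38, "rate": 1.40},
    ]

    # Punched-card style: one machine applies ONE operation to the WHOLE deck,
    # then operators carry the deck to the next machine for the next operation.
    def sorter(deck):                       # sorting machine
        return sorted(deck, key=lambda c: c["employee"])

    def calculator(deck):                   # multiplying punch
        return [dict(c, pay=c["hours"] * c["rate"]) for c in deck]

    def tabulator(deck):                    # accounting machine / printer
        for c in deck:
            print(f'{c["employee"]}  {c["pay"]:7.2f}')
        print("TOTAL", sum(c["pay"] for c in deck))

    tabulator(calculator(sorter(deck)))     # three full passes over the deck

    # Stored-program style: the complete sequence of steps is applied to each
    # record in a single pass, something a card installation could not do
    # without shuttling individual cards from machine to machine.
    total = 0.0
    for card in sorted(deck, key=lambda c: c["employee"]):
        pay = card["hours"] * card["rate"]
        total += pay
        print(f'{card["employee"]}  {pay:7.2f}')
    print("TOTAL", total)

The two versions compute the same payroll figures; the difference is that the first requires one full pass over the deck, and one machine, per operation, which is exactly why long sequences of operations were impractical on punched-card equipment.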
3.3.2 IBM 601 Series and the Card Programmed Calculator
The demand for complex sequences of operations led to a series of innovations that allowed punched-card machines to "store" several steps and apply them as each card was read. The work of Wallace Eckert noted above included the development of control switches that would allow short sequences of operations to be performed by interconnected punched-card machines. IBM would develop similar capabilities in the Pluggable Sequence Relay Calculator (PSRC), with the first machines being specially built for the Ballistic Research Laboratory (BRL) at the Aberdeen Proving Ground in Maryland. The Ballistic Research Laboratory, wartime challenges, and later military research would play a continuing role in the development of modern computing technology. In fact, the ENIAC would soon take up residence at the BRL, alongside several more relay calculators. Adaptable punched-card machines became a commercial reality with the IBM 601, a multiplying punch introduced in 1935 that found widespread use for scientific and statistical applications. In 1946, the IBM 603 was introduced, based on vacuum tube technology. Punched-card processing would reach its zenith with the IBM 604 and 605, among IBM's most popular products of the time, with an installed base of over 5000 machines [9]. These machines combined vacuum tube technology and support for short operation sequences to produce a reliable and flexible computing platform. Similar developments by scientific customers, such as Northrop Aircraft, led IBM to continue development of interconnected punched-card machines, eventually marketing the Card Programmed Calculator (CPC). Commercial digital computers would render these complex punched-card machines obsolete. However, the Era of Calculation includes several other historic efforts that mark the transition from calculator to computer.
3.3.3 Harvard Mark I: The IBM Automatic Sequence Controlled Calculator
As early as 1929, IBM had established a research relationship with Columbia University, providing calculating equipment and funds for statistical computing. IBM would fund another early computer researcher's effort at Harvard University. Howard Aiken became interested in computing through his research in theoretical physics and began searching for sponsors for a Harvard-based research project. Aiken presented the project to IBM in 1937 [2]. The fact that IBM became not just a sponsor, but a research collaborator, is indicative of IBM's early interest in computing and the pivotal role research universities would play in this emerging industry. Howard Aiken rediscovered the work of Babbage while developing the specifications for the calculating machine. In fact, fragments of Babbage's calculating engine were found in a Harvard attic, having been donated by Babbage's son [2]. IBM provided funds, equipment, and the long-term engineering expertise to actually construct the behemoth electromechanical calculator using the high-level specifications provided by Aiken. The IBM Automatic Sequence Controlled Calculator, better known as the Harvard Mark I, could perform 3 addition/subtraction operations per second and store 72 numbers, with more complex operations taking substantially longer. The actual inauguration of the machine was somewhat controversial as Aiken failed to acknowledge the financial and engineering contributions of IBM, alienating Watson and other IBM executives [30]. The rapid development of digital computers at the end of World War II made the Harvard Mark I obsolete as it was completed. Though operational for 15 years, it served simply as a Navy calculator. However, it is a milestone since it was one of the first automatic calculating machines to be successfully built and was a fertile training ground for early computing pioneers such as Grace Murray Hopper and a host of IBM engineers. IBM would renew its commitment to Columbia University and gain further expertise by developing an even more advanced calculator, the Selective Sequence Electronic Calculator (SSEC), which was eventually installed in the lobby of IBM's New York City headquarters [2].
3.3.4 University of Pennsylvania, the ENIAC, and the EDVAC
J. Presper Eckert and John Mauchly pursued an intensive research program to deliver computing technology to the military during World War II at the University of Pennsylvania's Moore School of Electrical
Engineering [31, 32]. The computing projects at the Moore School led to the development and clarification of a computing model that would spark the development of commercial computing and remain relevant to this day. The first of the Moore School electronic computers, the Electronic Numerical Integrator and Computor (ENIAC),² was a massively complex device for its time, using roughly 18 000 vacuum tubes, 90 000 other devices, such as resistors, capacitors, and switches, and several miles of wire [33]. It is a tribute to Eckert's engineering talents and the commitment of all involved that the machine came into existence.

However, before the machine was even completed, several design drawbacks became apparent. The most important from the current perspective is the difficulty in "setting up" or programming the machine for new computations. It was necessary to rewire the machine using patch cords to specify a computation, a painstaking task that took hours or even days. The wire-based approach was adopted in order to deliver the instructions at electronic speed. Typically, paper tape, punched cards, or some other input mechanism was used, which was adequate for speeds such as the 3 operations per second executed by the Harvard Mark I. However, the speed of the ENIAC processor, at 5000 operations per second, required a correspondingly fast mechanism for instruction delivery in order to keep the processor working efficiently, hence the use of wiring to specify the program [33]. The mismatch between processor speed and primary storage persists to this day and has resulted in a complex storage hierarchy in modern computers, with cache memory and other mechanisms used to deliver needed information at high speeds.

The careful hand-wired specification of a particular computation suited the numerical applications involved in the nascent scientific computing efforts, but was not an economical model for the everyday demands of business computing. Although the notion of storing "programs" or sequences was already recognized and being incorporated in a very limited fashion in punched-card machines, it was becoming clear that a more flexible and robust scheme was necessary for general computing. In addition to difficult programming, and the mismatch between processor speed and instruction delivery, the machine also suffered from a limited memory size. Before the ENIAC was completed, the Moore School team was already at work on a new machine, the Electronic Discrete Variable Computer (EDVAC). One important innovation was Eckert's proposal for a mercury delay line, circulating microsecond-long pulses, to provide 1000 bits of permanent memory [34]. This would address the most obvious shortcoming of the ENIAC, a very small main memory.

² The spelling is taken from the nameplate on a piece of equipment in the Smithsonian collection.
As the designers debated improvements to the ENIAC and focused on increasing the storage, one of the great insights of the computer age would emerge--the stored-program concept. The designers of the ENIAC recognized that programs, as well as data, could be stored in computer memory. Thus programs could be loaded into computer memory using a peripheral device, such as a paper tape, with both data and instructions delivered to the processor at electronic speeds during execution. In fact, the program itself could then be manipulated as data, enabling the future development of programming tools and new research areas such as artificial intelligence [35]. This fundamental stored-program concept, simple in hindsight, is one of the essential characteristics of all digital computers to this day.

John von Neumann learned of the ENIAC project through a chance meeting with Herman Goldstine, the liaison officer from the Ballistics Research Laboratory, and eventually joined the project as a consultant during the design of the EDVAC [36]. Von Neumann had an established reputation and his participation brought a new level of visibility to the project, which had originally engendered little confidence among the military sponsors [37]. He was very interested in the logical specification of these new computing machines and helped the original designers address the problems inherent in the ENIAC's design. In 1945, the EDVAC design was mature enough that von Neumann wrote his famous treatise, A First Draft of a Report on the EDVAC, clearly specifying the logical foundations of the machine, as well as the stored-program concept [38]. This famous report is recognized as one of the founding documents of modern computing. The initial draft report was meant for internal distribution, but it became widely available. John von Neumann's sole authorship is an unfortunate historical accident and contributed to tensions between the original project leaders, Eckert and Mauchly, and von Neumann's group that contributed to the abstract design [2, 9]. However, von Neumann's clear exposition laid out the major components of the EDVAC design at an abstract level, including the memory (for both data and program), control unit, arithmetic unit, and input/output units. This architecture, often called the von Neumann architecture, along with the stored-program concept, has served as the foundation for modern digital computers. In the summer of 1946, the Moore School held a series of lectures in which the stored-program concept and computer designs were laid out for a group of researchers representing most of the leading laboratories, thereby influencing almost all of the developing projects.
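The stored-program idea is easy to demonstrate in miniature. The sketch below is a hypothetical toy machine, not a model of the EDVAC's actual instruction set: it keeps instructions and data in the same memory array and runs a simple fetch-decode-execute loop, and because the program lives in memory it can be loaded, inspected, or even modified like any other data.

    # A toy stored-program machine (illustrative only). Instructions and data
    # share one memory array, just as the stored-program concept requires.

    memory = [
        ("LOAD", 10),    # 0: load the word at address 10 into the accumulator
        ("ADD", 11),     # 1: add the word at address 11 to the accumulator
        ("STORE", 12),   # 2: store the accumulator into address 12
        ("PRINT", 12),   # 3: print the word at address 12
        ("HALT", None),  # 4: stop
        None, None, None, None, None,
        6,               # 10: data
        7,               # 11: data
        0,               # 12: result goes here
    ]

    def run(memory):
        acc = 0          # accumulator register
        pc = 0           # program counter
        while True:
            op, addr = memory[pc]       # fetch and decode the next instruction
            pc += 1
            if op == "LOAD":
                acc = memory[addr]
            elif op == "ADD":
                acc += memory[addr]
            elif op == "STORE":
                memory[addr] = acc
            elif op == "PRINT":
                print(memory[addr])
            elif op == "HALT":
                break

    run(memory)          # prints 13

    # Because the program is ordinary data, it can itself be manipulated:
    # re-running after this patch would print 19 (6 + the stored result 13).
    memory[1] = ("ADD", 12)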
3.3.5 The ABC and Z3
Although Eckert and Mauchly's work at the Moore School matured to become the most influential model for early general-purpose computers, other
efforts provided guidance in the evolution of the computer. Code-breaking and other military needs led to pioneering efforts in computing in the US, England, and Germany during World War II. A university-based project led by Professor John Vincent Atanasoff began in 1937 at Iowa State University, and with the help of Clifford Berry, a graduate student, resulted in a rudimentary electronic computing device by 1939. John Mauchly visited Atanasoff in 1940 at Iowa State University and stayed for several days at Atanasoff's home, learning about what we now call the Atanasoff-Berry Computer (ABC) [2]. The direct and indirect influence that Atanasoff's work had on the later Moore School project has been debated from both a historical and a legal perspective (for patent applications). Other historians attribute the first working, fully programmable general-purpose computer to Konrad Zuse in Germany [39]. His Z1 programmable calculator was completed in 1938 and the nearly equivalent Z3 computer was completed in 1941. These machines performed calculations for the German war effort. The Z4 saw postwar operation at the Federal Technical Institute in Zurich [9]. A consensus has evolved that there is no one inventor of the computer, with more than enough credit in the birth of computing to recognize all involved.
3.4 The Rise of the General Purpose Computer

Eckert and Mauchly were among the early pioneers who recognized the commercial applications of the Moore School projects. They would leave the University of Pennsylvania to form one of the original computer start-up firms, the Eckert-Mauchly Computer Corporation (EMCC), ushering in the Era of Automation. Though EMCC was ultimately absorbed by Remington Rand, Eckert and Mauchly's perception of the computer market would prove to be correct. IBM would also capitalize on the early computer research efforts and a deep understanding of business processes honed by years of punched-card data processing sales. The fundamental insight of these early business computer companies was that manipulating data, essentially symbolic computing, could be accomplished using the same technologies that were first focused on scientific computing.
3.4.1 The Universal Automatic Computer: UNIVAC
In 1946, Eckert and Mauchly began work on the Universal Automatic Computer (UNIVAC), which would later become synonymous with the word "computer" until IBM came to dominate the industry. The Census Bureau was the first to order a UNIVAC from the fledgling company. As early as
1946 during the Moore School lectures, Mauchly was discussing sorting and other clearly business-oriented topics in his lectures [2]. So, as Eckert and Mauchly began their commercial endeavor, they had a distinctly business-oriented focus, whereas most earlier projects had pursued the scientific computer. Even IBM, which maintained an early interest in the emerging computer, pursued the Defense Calculator (later marketed as the IBM 701) at the expense of data processing projects, giving the UNIVAC several extra years without serious competition in the business world. Of course, IBM faced the difficult dilemma of how to enter the emerging computer business while protecting the immensely successful punched-card product lines. Technically, the UNIVAC was to be a commercial incarnation of the ENIAC and EDVAC, embracing the stored-program concept and using the mercury delay line for high-speed memory [34]. Perhaps the most innovative aspect of the new machine was the use of magnetic tape to replace punched cards and paper tape. The construction of a magnetic tape storage device involved the development of low-stretch metal tape and many mechanical engineering challenges. In addition to magnetic tape storage, other peripheral devices to input and print information were required. In 1951, the UNIVAC passed the Census Bureau acceptance tests and ushered in the world of business computing. Within a year two more machines would be delivered to government agencies and orders for several more machines were booked.
3.4.2 IBM and the Defense Calculator
The very public success of the UNIVAC and its clear focus on business data processing spurred IBM into action [40]. The firm had decided to focus on supplying several high-performance computers to the technology-oriented defense sector, drawing resources away from two important business projects: a less expensive drum-based machine and a magnetic tape machine that could compete directly with the UNIVAC. The IBM 702 data processing machine, based on the Tape Processing Machine (TPM), was announced in 1953 [9]. However, the actual machines would be delivered in 1955, giving the UNIVAC a further advantage in the early business computing market. Once the IBM 700 series reached the market, it became the market leader with a larger installed base than the UNIVAC by 1956 [2]. The success of the IBM 702 was due in large part to the much more reliable magnetic tape technology, developed by the experienced IBM mechanical engineers. In addition, faster memory technology, modular construction, and the IBM sales force all contributed to the success. When core memory became available, models 701 and 702 were upgraded to models 704 and 705, maintaining a scientific computing and data processing dichotomy. One of the most successful machines, the IBM 704, had floating-point arithmetic,
core memory, and the FORTRAN programming language, as well as the support of IBM's sales and field engineering teams.
3.4.3 Drum Machines and the IBM 650
As early as the Atanasoff-Berry project, as well as during Eckert and Mauchly's work on the ENIAC, the idea of a rotating magnetic storage device was explored. The main drawback of the technology was the relatively slow speed of the electromechanical devices as compared with the electronic processors being developed--a relationship that still holds today between magnetic disks and memory/processor speeds. However, magnetic drums were much lower in cost and more reliable than other available memory technologies, such as mercury delay lines. The commercialization of magnetic drum machines is typified by Engineering Research Associates (ERA), a firm that had roots in Navy research and would go on to market scientific and business computers [9]. As with the Eckert-Mauchly Computer Corporation, ERA would need more capital to pursue the business computing market and would also become part of Remington Rand. The magnetic drum devices developed for internal use and direct component sales by ERA were very reliable and could store up to 2 million bits, with access times measured in milliseconds. Though the speeds did not match the digital processor speeds, inexpensive and reliable business computers with adequate storage could be built. The IBM 650 would turn out to be a tremendous financial success. Whereas the 700 series provided technical leadership, the 650 became the business workhorse with roughly a thousand machines installed [9]. In developing the 650, IBM was forced to consider business solutions rather than computing power to attract their traditional punched-card users to business computing. This would be an important lesson for IBM and the rapidly growing computer industry. The IBM 650 was a less expensive computer based on the Magnetic Drum Computer (MDC) project within IBM. Though slower, the lower-cost machine was very reliable and had software that supported the business solution approach. In addition, IBM established an innovative program with universities, providing deep discounts on the IBM 650 for use in computing courses [2]. The result was the first generation of computing professionals being trained on IBM equipment, as well as increasing the role of universities in further research.
3.4.4 Magnetic Disk Secondary Storage

In 1957, IBM marketed the first magnetic disk, allowing large-scale random access storage to be attached to a wide range of computers. IBM
engineers would develop several innovations that allowed high-capacity disk drives to be built at a reasonable price. The geometry of magnetic disks provided more surface area than drums, especially when the disks were stacked like platters. The IBM disk technology used read/write heads that could be positioned over the surface and basically flew on a thin film of air (i.e., the boundary layer) created by the motion of the disk. IBM produced the model 305 disk storage device, a 5 million character storage device using a stack of 50 disks, which became known as the Random Access Memory Accounting machine (RAMAC). The random access nature of the device was demonstrated at the Brussels World's Fair in 1958, where visitors could question "Professor RAMAC" via a keyboard and receive answers in 10 different languages [9]. Both magnetic disk storage and the interactive style of computing it ushered in were lasting innovations that would flourish on future desktops.

3.5 Computing Means Business
This section discusses the classic business machines that developed at the end of the Era of Automation, and the introduction of the System/360 that marks the beginning of the Era of Integration and Innovation. The computers that evolved during the Era of Automation focused on providing business solutions. Two of the most successful machines were the classic IBM 7090/7094 mainframes and the less expensive IBM 1401 business computer. These machines were often installed together to create the batch processing systems that sprawled across computer rooms, with the 7094 providing the computing power and the IBM 1401 handling the cards and magnetic tape. The introduction of the System/360 redefined the market and ushered in the Era of Integration and Innovation. The robust System/360 architecture would last for decades and span an incredible range of products [41]. IBM's focus on a uniform family of machines for software compatibility highlighted the growing importance of software. Finally, the impressive range of System/360 peripherals demonstrated the importance of a solution-based approach, as opposed to a narrow focus on pure processor performance. Clearly, the System/360 is an important milestone in business computing.
3.5.1 IBM 1401: A New Business Machine

The IBM 1401, announced in 1959, represented a total business computing solution, employing modular construction and advanced peripherals, such as the innovative IBM 1403 "chain" printer. The high-speed printer, capable of printing 600 lines per minute, was probably one of the most important
features for the business market. The main goals of improved performance and reliability were achieved by simply replacing vacuum tubes with transistors and employing magnetic core memory, a technology pioneered by Jay W. Forrester during Project Whirlwind [2]. Magnetic core technology was the basis for reliable, high capacity, nonvolatile storage that supported the construction of random access memory (RAM), a perfect complement to the new transistorized processors. However, it was the business lessons learned from the IBM 650 that guided the development and marketing of the IBM 1401, as well as firmly establishing IBM as the dominant force in the rapidly growing mainframe computer market. Approximately 10 000 model 1401 machines were installed, dwarfing the successful model 650 installed base and all other computers of the day [9]. Whereas many computer makers were fascinated by processor designs, IBM focused on supplying business solutions, reliable products, excellent support, and a strong relationship with the business customer. In addition to a strong hardware product, the IBM 1401 was the computing platform for the Report Program Generator (RPG) programming language. This language was specifically designed for programmers with previous experience with patch cord setup and the IBM accounting machines. Unlike FORTRAN, the RPG language was designed for business computing and a smooth transition from previous platforms. Like FORTRAN, RPG remains in use to this day, with its punched-card machine heritage obscured by a long line of increasingly sophisticated computing machines. While supporting customer software development, IBM also used its familiarity with business processes to develop applications that made its machines even more attractive. By virtue of IBM's dominant position in the market, the software applications were bundled with the hardware, spreading the software development costs over many customers. It was hard for many manufacturers to take advantage of similar economies of scale and the computing industry was characterized as "IBM and the seven dwarves." As applications became more complex and software costs escalated, the practice of bundling software could no longer be supported. Many customers were dissatisfied with the embedded software costs and this, coupled with government pressure, led to IBM's decision to "unbundle" software in 1968 [9]. This event would fuel the growth of the tremendous software industry that continues to thrive today.
3.5.2 IBM 7090: The Classic Mainframe
In addition to the low-cost IBM 1401, the scientific computers from the 700 series (701, 704, and the subsequent model 709) were strong products for
IBM. The 700 series evolved into the IBM 7090, the quintessential mainframe. The classic room-sized computer, complete with arrays of blinking lights and rows of spinning tape drives, defined the image of computing during the heyday of mainframe machines. The IBM 7090 was based on transistor technology and included all the significant peripheral devices that IBM had been so successful in engineering. The upgraded model 7094 was installed at hundreds of locations. The mainframe model of computing remains relevant to this day, not so much for the "batch" mode of computing, but for the storage and manipulation of vast amounts of data. The mainframe computers were characterized, in part, by carefully designed input/output channels (often dedicated processors) that could handle large-scale data processing. Mainframe computers were full of data-processing tradeoffs, balancing raw processor performance, memory capacity, physical size, programming flexibility, and cost. Faster scientific supercomputers would be built for less data intensive computations, but the mainframe typified large general purpose computing during the 1960s. The highly successful 704 and 7094 models, incorporating floating-point arithmetic, also showed that computers developed for scientific applications often doubled as business data processing machines. IBM would formally combine its product line, ending the scientific computing and data processing dichotomy, when it re-invented its entire product line with the development of System/360.
3.5.3 New Product Line: System/360

Despite IBM's tremendous success, the company suffered from an increasingly complex product line, a problem that grew more urgent as software became an essential component of business solutions that spanned the hardware spectrum. IBM was left to continually maintain a plethora of software versions in order to support customer migration paths. After lengthy internal debate, IBM decided on a revolutionary strategy to build a family of compatible computing platforms [42]. The so-called New Product Line, later marketed as System/360--evoking "all points of the compass" or "a full circle"--remains one of the largest commercial research and development efforts and was really a gamble that affected the entire company. Although some of the System/360 technical developments were quite conservative with regard to actual computer architecture, it employed the successful business solution strategy that characterized the IBM 650, 1401, and 7094 [11]. However, it added two new elements that revolutionized the industry--the entire product line was developed as a compatible family, allowing software to run across the entire spectrum of platforms, and a wide
range of peripherals allowed a tremendous variety of configurations. The scale of the product announcement that accompanied the introduction of System/360 in 1964--initially 6 compatible computers and some 44 peripheral devices--redefined the computer industry. The System/360 would prove to be a tremendously successful product line, fueling IBM growth for decades [43]. The enhanced System/370, which employed integrated circuits, extended the dominant position of the 360-370 architecture for many more years [44]. The life of the System/360-370 series would continue even longer than planned as IBM's attempt at a second comprehensive architectural reinvention, the Future Series, foundered after several years of effort [2].
3.6 Computers Leave the Computer Room

The Era of Decentralization focuses on the incredible progress in integrated circuits and the advent of the microprocessor. The microprocessor would break the boundaries of computing and allow the creation of truly personal computers at both the low end (microcomputers) and the high end (workstations). In addition, advances in LANs and WANs provided the connectivity options necessary to begin linking the rapidly growing personal computer market. The technological foundations for these developments were laid in the Era of Integration and Innovation, an era that is characterized by two somewhat contradictory themes: the redefinition and success of large-scale computing marked by the IBM System/360, and innovations in integrated circuits and networking that would lead to the dramatic changes in the Era of Decentralization. These innovations led to a shift from the centralized computer room to a decentralized model of computing, vastly expanding our ability to create new information systems architectures.
3.6.1 Integrated Circuits and the Minicomputer

From 1965 through the mid-1970s, integrated circuits were developed and refined, providing the third classic transformational technology that would reshape the computer industry [45]. Integrated circuits would reduce the cost of computers by orders of magnitude, improve reliability, and eventually result in the microprocessor--"a computer on a chip." Like vacuum tubes and transistors, the integrated circuit arose in the electronics industry, outside the mainstream computer manufacturers, and was adopted by a new set of computer makers, some already existing and some newly formed. During this critical time, the entrepreneurial electronics industry was centered around Route 128 and MIT as well as Silicon Valley and Stanford University. In 1957 Harlan Anderson and Kenneth Olsen, an MIT-trained
engineer who had worked on several aspects of Project Whirlwind, formed Digital Equipment Corporation (DEC) [9]. The original aim of DEC was to enter the computer business and compete with the established mainframe manufacturers. Although DEC could certainly develop competitive processor designs, the costly barriers to entry included the business software applications, sales and support teams, and all the associated peripherals (e.g., printers, disk storage, tape drives). So, with venture funding from American Research and Development (ARD), one of the original postwar venture capital firms formed by Harvard Business School faculty member George Doriot, DEC began by making component circuit boards. The successful component business funded DEC's move into the computer business with the announcement of the first computer in the venerable Programmed Data Processor (PDP) series.

Though the earlier PDP models had some success, it was the 1965 introduction of the PDP-8 that would usher in the minicomputer era, with over 50 000 installed systems [9]. The PDP-8 was the first integrated circuit-based computer and was marketed to the scientific and engineering industry, often finding use in dedicated applications that would have been impossible for an expensive mainframe. The PDP-8 was small in size and cost, priced under $20 000. The PDP-8 machines brought a personal style of computing to university research environments, nurturing a new breed of computer professionals just as the IBM 650 had created the first wave of collegiate computing. By 1970, DEC was the third largest computer maker and minicomputers formed a new market allowing DEC to become a serious IBM competitor. From a business computing perspective, the minicomputer enabled department-level computing and more distributed information systems architectures. The use of a centralized computing model became a choice, rather than a dictate of the mainframe platform.

It is worth noting that the technology for desktop or personal computing was simply a technological progression of the increasingly dense integrated circuits. Large-scale integrated circuits were remaking industries, fostering products such as digital calculators and watches. The first commercial microprocessors were developed in the early 1970s, typified by the Intel 4004 and 8008. Therefore, the technology for desktop computing was in place, but the market was not exploited until years later. It would take an eclectic mixture of hobbyists, engineers, venture capitalists, and visionaries to create a new market. In fact, many university researchers viewed the PDP-8 as a personal computer, but for most of the 1970s the minicomputer would be marketed to both the scientific and business market. In terms of information systems architecture, the evolutionary role of the minicomputer was to make the smaller department-level server a reality. The incredible growth in computing performance made these machines a powerful force.
DEC also introduced larger machines that began to challenge traditional IBM strongholds. One of the most successful high-end PDP systems was the PDP-10, a time-sharing system that became an important platform at universities and commercial research laboratories. Introduced in 1966, this machine was in use at many of the research centers during the development of the ARPANET. After the PDP-11, DEC would continue expansion into the large systems market with the introduction of the VAX (Virtual Address eXtension) architecture and the VAX 11/780. The VAX products were among the first 32-bit minicomputers [28]. The VAX architecture was meant to span a wide range of products, much as the earlier System/360 architecture had done for IBM, and was also introduced with the VMS operating system. Though the VAX line was successful, the computer industry was being redefined by the microprocessor and DEC would face stiff competition on several fronts [46].
3.6.2 Desktop Computing

Although the move from minicomputers to microcomputers was a natural technical progression as microprocessors became available in the early 1970s, the market had to be recognized and essentially created [47]. The personal computer industry grew out of the electronic hobbyist culture. Micro Instrumentation Telemetry Systems (MITS) developed electronics kits for the hobbyist market and launched the Altair 8800 into an ill-defined market with its appearance on the cover of the January 1975 issue of Popular Electronics. MITS received a flood of orders and the Altair galvanized a new community of hardware and software suppliers. The makings of a small computer system were readily available from component suppliers, with the microprocessor at the heart of the system. The personal computer market was soon a crowded mix of small start-up firms constructing machines from off-the-shelf components, and the Altair quickly vanished from the scene. The rise of Microsoft and Apple Computer out of this market has been the subject of much fascination. Although these firms developed interesting technologies, their survival during the tumultuous birth of personal computing probably results from a combination of technical abilities, perseverance, timing, and good luck. Of course, many established computer firms would enter the personal computing market once it was established, with IBM and Intel defining the standard and Apple Computer supplying the spice. Probably the most sophisticated vision of personal computing emerged from Xerox PARC, with its Alto computer and graphical desktop [14]. This machine would directly influence Apple Computer, as well as Microsoft, using the now classic model of technology transfer through personnel recruitment.
However, the breakthrough was more a matter of cost than sophistication. IBM's 1981 introduction of an inexpensive machine with an open architecture transformed and legitimized the market. IBM followed with successive machines, such as the IBM PC/XT and IBM PC/AT, introducing the new PS/2 product line in 1987. Apple Computer introduced the Macintosh in 1984, raising personal computing standards with a powerful and intuitive user interface [48]. The emergence of desktop computers put in place the client computer, capable of interacting with minicomputers and mainframes, as well as doing local computations. A wide range of potential information systems architectures were now possible, with small-scale servers, intelligent clients, and centralized data processing facilities all contributing necessary components. All that was required were the interconnection mechanisms being supplied by the evolving computer networking and communications industry.
3.6.3 High-Performance Workstations

In addition to the personal computer, the Era of Decentralization saw the introduction of high-performance desktop computers (i.e., workstations) in the early 1980s, initially targeting the scientific and engineering markets. These were sophisticated desktop computers introduced by start-up firms such as Apollo Computer, Sun Microsystems, and Silicon Graphics, as well as more established companies like Hewlett-Packard. Three important factors set these early workstations apart from the low-cost personal computers exemplified by the IBM PC. Workstations were based on powerful new microprocessors such as the Motorola 68000, utilized sophisticated operating systems with UNIX being a natural choice, and embraced networking technology such as Ethernet from the outset. Whereas microcomputers evolved from a hobbyist culture and stressed low cost, workstations emerged from sophisticated computer firms and were purchased by equally sophisticated customers. As has been the case throughout the business computing eras, the relentless and rapid progress in technology continually redefines the industry. The distinction between workstations and personal computers has blurred, with high-end Intel-based personal computers offering performance levels appropriate for the scientific and engineering tasks that defined the workstation class. To meet the need for a sophisticated operating system, Microsoft developed Windows NT to take advantage of these powerful new machines. There is a great deal of development activity in the high-end desktop market, and the only sure trend is that price and performance will continue to improve dramatically.
One of the most successful and influential workstation firms is Sun Microsystems, founded in 1982 by a group of business people and researchers associated with Stanford University in a now familiar tale of laboratory-to-market technology transfer [49]. The company commercialized workstation hardware and the UNIX operating system, taking its name from the Stanford University Network (SUN) workstation project and building high-performance desktop computers at a reasonable cost for the engineering market. Two key researchers who joined the firm were Andy Bechtolsheim on the hardware side and Bill Joy on the software side. With a strong commitment to UNIX, Sun Microsystems has gone on to market a series of successful UNIX versions (first SunOS and now SOLARIS) derived from the original Berkeley UNIX system that Bill Joy had helped to develop. In addition to support for local area networking in the form of Ethernet, the implementation of the TCP/IP protocol in UNIX linked the expanding workstation market and the development of the Internet [50].

Another important development associated with the rise of high-performance workstations is the adoption of reduced instruction set computer (RISC) technology [51]. Most computers developed throughout the previous business computing eras provided increasingly large sets of instructions, pursuing what has come to be known as complex instruction set computer (CISC) technology. Two factors initially favored CISC technology: the fairly primitive state of compiler optimization technology, and the cost or speed of memory access. If the machine provided a rich set of instructions, compilers could easily generate target code that used those high-level instructions without having to apply involved optimization strategies, reducing the "semantic gap" between programming and machine languages [52]. Secondly, fairly complex operations could be performed on processor-resident data, avoiding memory input/output. However, faster memory technology and improving compiler technology changed these underlying assumptions and led to work by IBM and others that focused on building simpler, but extremely fast, processors. Essentially, if the processor implemented a simpler and more consistent set of instructions, great care could be taken to ensure that these instructions executed quickly, and with more possibilities for exploiting parallelism. John Cocke pioneered the approach at IBM, where the experimental IBM 801 was used to test the approach [53, 54]. Additional early research initiatives were begun at Berkeley [51] and Stanford [55]. The approach generated considerable debate, but the experimental results were promising and RISC processors would rapidly evolve within the high-performance workstation market. For example, Sun Microsystems commercialized the technology as SPARC (Scalable Processor ARChitecture) and licensed the technology to other companies. MIPS Computer Systems also produced a RISC
processor, derived from the Stanford MIPS (Microprocessor without Interlocked Pipeline Stages) project, which was used by many workstation vendors. IBM produced the RT and the more successful RS/6000. Finally, Motorola joined with IBM and Apple to design the PowerPC as a direct competitor to the long-lived Intel product line [52].
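The RISC argument above is easiest to see in how a single high-level statement is lowered to machine operations. The following toy interpreter uses invented instruction names (not the SPARC, MIPS, or POWER instruction sets) to contrast one complex memory-to-memory instruction with the equivalent load/operate/store sequence of a load-store RISC machine.

    # Toy illustration of CISC-style vs RISC-style code for "mem[c] = mem[a] + mem[b]".
    # Instruction names and encodings are invented for the example.

    memory = {"a": 2, "b": 3, "c": 0}
    regs = {}

    # CISC style: one instruction reads two memory operands, adds, and writes memory.
    cisc_program = [("ADDM", "a", "b", "c")]

    # RISC style: only loads and stores touch memory; arithmetic uses registers.
    risc_program = [
        ("LOAD", "r1", "a"),         # r1 <- mem[a]
        ("LOAD", "r2", "b"),         # r2 <- mem[b]
        ("ADD",  "r3", "r1", "r2"),  # r3 <- r1 + r2
        ("STORE", "r3", "c"),        # mem[c] <- r3
    ]

    def execute(program):
        for instr in program:
            op = instr[0]
            if op == "ADDM":                 # complex memory-to-memory add
                _, a, b, dest = instr
                memory[dest] = memory[a] + memory[b]
            elif op == "LOAD":
                _, reg, addr = instr
                regs[reg] = memory[addr]
            elif op == "ADD":
                _, dest, r1, r2 = instr
                regs[dest] = regs[r1] + regs[r2]
            elif op == "STORE":
                _, reg, addr = instr
                memory[addr] = regs[reg]

    execute(cisc_program)
    print(memory["c"])   # 5
    memory["c"] = 0
    execute(risc_program)
    print(memory["c"])   # 5: same result from four simple, uniform instructions

Each RISC instruction does less work, but its simplicity and uniformity are what allow the hardware to pipeline and overlap instructions aggressively, which is the trade-off described above.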
3.7 Operating Systems

Though they are certainly software systems, operating systems are so closely associated with the underlying hardware that we often consider the combination as comprising the computing platform. Therefore, we have chosen to consider a brief history of operating systems in conjunction with the computing hardware discussed in this section. There is a rich history of operating systems, but we concentrate on a few systems that provided innovative business computing capabilities. The truly malleable nature of software has enabled operating system designers to borrow important ideas from previously successful systems, leading to an operating system history marked by cross-fertilization and common threads. The discussion will follow the eras outlined in Section 2, concentrating on the first four eras as did the hardware discussion. It is during the Eras of Calculation, Automation, Integration and Innovation, and Decentralization that mainstream operating system technologies matured.
3.7.1 Personal Calculators
The machines that developed in the Era of Calculation were the handcrafted ancestors of the modern digital computer. These machines were fragile, room-sized behemoths that routinely crashed as vacuum tubes failed during computations. The typical applications of these machines were well-understood numerical algorithms for computing important tables or solving engineering equations. These machines were mostly used by a close-knit group of experts that included the designers and developers, scientific colleagues, and other team members who maintained and operated the hardware. In fact, most users were highly skilled and very familiar with the one-of-a-kind computing environment they often helped create. These experts used the machine much like a personal calculator, signing up for a time slot during which the machine was completely dedicated to their tasks. Programming was usually accomplished by hand-wiring a plugboard, a task that could take hours or even days. However, the plugboard was one of the few early technologies that could deliver instructions at electronic speeds to the processor.
3.7.2 Batch Processing

The Era of Automation ushered in a separation between the designers and developers of the computer and the commercial customers who would be programming the machines for specific tasks. These first commercial machines were housed in special rooms and maintained by a group of trained operators. Though the operators added a level of indirection, the dedicated model of computing was maintained. Typically, the programmers would write programs on paper using COBOL, FORTRAN, or assembly language, transfer them to punched cards, and submit them to the operators for processing. Once a computation or "job" was complete, the output would be printed and placed in an output bin for later collection by the programmer. Though this style of interaction was a vast improvement over patch cords and plugboards, the machine was often idle, despite the bustling activities of operators, as the jobs were ushered around the computer room. The high cost of early commercial computers meant that any idle time was viewed as a serious waste of an expensive resource. Computer users quickly investigated alternative strategies to reduce idle time and manage their computing resources with more efficiency.

The strategy adopted by most computer installations was batch processing. A group or "batch" of jobs was collected and then written to magnetic tape, the tape was then transferred to the main computer where all the jobs were processed, and the output was also written to tape for later processing. A rudimentary operating system was used to read the next job off the tape and execute it on the processor. One of the first such operating systems was developed in the mid-1950s at General Motors for an IBM 701 [56]. The successful use of these early batch operating systems led to more in-house development by large customers and vendor offerings, including IBSYS from IBM, one of the most influential early operating systems for the IBM 7090/7094. In order to communicate with the computer operators to make requests such as mounting data tapes, messages were associated with a job, and eventually a complex Job Control Language (JCL) developed to communicate with operators and their software counterpart--operating systems. Desirable hardware support for emerging operating systems included memory protection, a hardware timer, privileged instructions, and interrupt handling [57].

Often relatively inexpensive machines were used to write the batch to an input tape and print the results from an output tape, thereby using the expensive mainframe only for running the jobs. A typical installation used a machine like the IBM 1401 to read card decks and produce input tapes, while an IBM 7094 might be used for the actual computations [28]. The IBM 1401 would again be used to print the output tapes, often using the
innovative high-speed IBM 1403 chain printer. Though the turnaround time was still measured in hours, at least each machine was used for the most appropriate tasks and the expensive mainframes were kept busy.
3.7.3 Multiprogramming and Time Sharing

With the IBM 360 series, IBM embarked on one of the most ambitious software engineering projects to build the operating system for this formidable product line--OS/360. The challenge facing OS/360 designers was to produce an operating system that would be appropriate for the low-end business market, high-end scientific computers, and everything in between. This wide spectrum of the computer market was characterized by very different applications, user communities, peripheral equipment requirements, and cost structures. In meeting these conflicting demands, OS/360 became an incredibly complicated software system, with millions of lines of assembly language written by thousands of programmers. The story of this massive system development effort has been told in many forms, including the classic book The Mythical Man-Month by Frederick Brooks, one of the OS/360 designers [58]. In order to interact with OS/360, programmers faced an equally complex JCL, with book-length descriptions required [59]. Despite these obstacles, OS/360 managed to work and contribute to the success of System/360, even though bugs and new releases remained permanent fixtures.

Three important operating system innovations are associated with OS/360 and other operating systems from the classic third generation of computers (i.e., integrated circuits): multiprogramming, spooling, and time sharing [28]. The execution of a program entails many activities including both arithmetic computations and input/output (I/O) to secondary storage devices. The order-of-magnitude difference in secondary storage access times makes I/O a much more costly operation than an arithmetic computation and may impose long waiting times during which the processor remains idle. Different types of computing tasks often have very different profiles with respect to I/O activity. Heavy computational demands on the processor typify scientific computing, whereas I/O operations often predominate during business data-processing tasks. Long I/O wait times and idle CPUs are factors that lead to inefficient resource usage, much like the human operator activities that first led to batch processing. For instance, on earlier systems, such as the IBM 7094, the processor remained idle during I/O operations. Multiprogramming or multitasking is a technique in which several jobs are kept in main memory simultaneously, and while one job is waiting for I/O operations to complete, another job can be using the processor. The operating system is responsible for juggling the set of jobs,
quickly moving from one job to another (i.e., a context switch) so that the processor is kept busy. This capability greatly increases job throughput and efficient processor utilization. The second innovation was aimed at the need for supporting machines, such as the IBM 1401, for card reading and tape handling. Spooling (Simultaneous Peripheral Operation On Line) was used to read card decks directly to disk in preparation for execution, as well as for printing output results. This technique provided efficient job management without the need for supporting machines and a team of operators. Though multiprogramming and spooling certainly contributed to greater job throughput and processor utilization, it was still often hours from the time a programmer submitted a job until output was in hand. One of the benefits of the dedicated processing model was the immediate feedback. In order to provide a more interactive experience and meet response time goals, the concept of time-sharing was explored [60]. Essentially, time-sharing uses the same techniques as multiprogramming, switching quickly between jobs to efficiently use the processor. However, time-sharing users were connected via on-line terminals and the processor would switch between them giving the illusion that the machine was dedicated to each of them. Since many of the interactive tasks included human thought time (or coffee breaks) with sporadic demands for processor time, the processor could switch quickly enough to keep a whole set of interactive users happy and even manage to complete batch jobs in the background. Among the first time-sharing systems to be developed was the Compatible Time-Sharing System (CTSS) at MIT, running on an IBM 7094 [61]. At first CTSS only supported a few simultaneous users but it was a successful demonstration of a time-sharing system and provided the foundation for a larger project. Project MAC (Man and Computer) at MIT, with Bell Labs and General Electric as industrial partners, included development of a "computer utility" for Boston. The basic idea was to provide computing services using a model much like electric power distribution, where you plugged into any convenient outlet for computer time. Essentially, this was time-sharing on a grand scale with hundreds of simultaneous users on large-scale systems. At the center of this effort was the construction of the MULTICS (MULTiplexed Information and Computing Service) operating system, which eventually worked well enough to be used at MIT and a few other sites [62-65]. The MULTICS system was implemented on a GE 645 rather than an IBM System/360, which did not support time-sharing well, causing concern at IBM and giving GE an edge in the emerging time-sharing market. However, implementing MULTICS was a huge endeavor and both Bell Labs and GE dropped out of the project along the way. Though
MULTICS never did become widely adopted and the idea of a computer utility withered, MULTICS was an influential test-bed for many operating system concepts that would find a place in future systems. IBM would also develop a time-sharing capability for its System/360, but it would be the later System/370 using integrated circuit technology that handled time sharing using the Conversational Monitor System (CMS) and Time Sharing Option (TSO). The System/360 architecture has endured through successive generations of IBM products, and increasingly sophisticated operating systems were developed, including OS/SVS (Single Virtual Storage) to take advantage of virtual memory (16 MB) on the upgraded System/370. The interim SVS was replaced with MVS (Multiple Virtual Storage) to satisfy growing memory requirements, providing a 16 MB address space for each job. When the underlying hardware was upgraded to handle 31-bit addresses, MVS/XA (Extended Addressing) provided 2 GB per job. This was later extended further in MVS/ESA (Enterprise System Architecture) to allow up to 32 GB per job to be assembled from 2 GB address spaces. MVS is one of the most complex operating systems ever developed and has been a long-lived software component of the System 360/370 computing platform [57].
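These address-space sizes follow directly from the width of the addresses. As a quick check (the 24-bit width of the original System/370 addressing is an assumption added here for the arithmetic; it is not stated above):

2^24 bytes = 16,777,216 bytes = 16 MB
2^31 bytes = 2,147,483,648 bytes = 2 GB
16 address spaces x 2 GB each = 32 GB per job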
3.7.4 Desktop Computing and Network Operating Systems
The Era of Decentralization is the period when advances in microprocessors and computer networking provided the missing components in our information technology tool kit that enabled desktop computing, high-speed LANs, and global connectivity via internetworking. Most of these technologies are discussed elsewhere, but there were several related operating system developments during this era. Two of the most widely used operating systems, UNIX and MS-DOS, flourished in this period. In addition, both of these operating systems would eventually support networking, completing the connectivity picture and allowing individual desktops access to the Internet. Though UNIX was developed somewhat earlier on a DEC PDP-7 minicomputer (the First Edition dates from 1969), the early versions were intended for internal Bell Labs use [66]. It was not until Ritchie and Thompson published an influential paper on UNIX in 1974 that AT&T began shipping UNIX, along with the source code, to interested universities and research organizations [67]. The nearly free distribution of the UNIX operating system and source code led to a most interesting software development project. An informal worldwide community of developers
adapted and extended the UNIX system, creating one of the most powerful and influential operating systems ever written [68]. The UNIX system began when Bell Labs pulled out of the MULTICS project and Ken Thompson set out to re-create some of the functionality in a programming environment for use at Bell Labs. The UNICS (UNIplexed Information and Computing Service) system, a pun on the MULTICS acronym, was later called UNIX [28]. Thompson designed UNIX to be a lean system that provided only the essential elements of a good programming environment without the grand-scale baggage required to implement a metropolitan computer utility. The strengths of the UNIX system have influenced subsequent operating systems and include the following points.

• The system was designed with simplicity in mind. It had the ability to connect components together to construct more complex programs via pipes that carry streams of data.
• The system and source code were freely distributed and became the focus of decentralized development efforts. One of the most influential versions of UNIX was developed by graduate students and researchers at Berkeley, released as a series of Berkeley Software Distributions (BSD) [69, 70].
• UNIX was implemented using a high-level system programming language, C. The C implementation of UNIX meant that the operating system was more easily ported to other hardware platforms, as long as there was a C compiler. Of course, C has become one of the most widely used programming languages.
• Later versions of UNIX included support for networking, with Ethernet in the local area environment and the TCP/IP protocol suite for the Internet. The adoption of TCP/IP in Berkeley UNIX made the protocol suite a de facto standard. Once TCP/IP became freely available, both UNIX and the Internet developed together in many ways.

Probably the most significant shortcoming of UNIX has been the lack of a standard, a flaw due to the loosely organized worldwide community that was responsible for so many of its strengths. The UNIX system was adopted by many computer makers and subsequently modified to include incompatible extensions that have contributed to divisions within the UNIX community. There have been several attempts to standardize the UNIX operating system, including AT&T standards such as the System V Interface Definition (SVID). The IEEE Standards Board began an effort to reconcile the two main branches of the UNIX family tree under the POSIX
(Portable Operating System) project.3 The POSIX committee eventually produced a set of standards, including the 1003.1 standard that defines a set of core system calls which must be supported. Unfortunately, a subsequent division between vendors such as IBM, DEC, HP, and others that formed the Open Software Foundation (OSF) and AT&T with its own UNIX International (UI) consortium exacerbated the problem of creating a unified UNIX system. Many vendors developed their own UNIX versions such as IBM's AIX, DEC's ULTRIX, Hewlett-Packard's HP-UX, and Sun Microsystems' SOLARIS. The UNIX system has dominated in the non-Intel workstation market, especially RISC-based systems, and has more recently been a force in the database and web server arena [71]. In addition, there have been several freely distributed versions such as the LINUX system, which has recently become a popular UNIX system for personal computer platforms. The Intel-based personal computer market has been dominated by Microsoft's MS-DOS since the introduction of the first IBM PC. Though the first MS-DOS versions were primitive single-user systems lacking many of the features that had evolved since the early batch processing systems, subsequent releases added functionality, drawing heavily on UNIX.4 For the introduction of the PC, IBM went to Microsoft to license the BASIC interpreter and planned to use the CP/M-86 operating system developed by Gary Kildall at Digital Research. When the CP/M schedule slipped, Microsoft was asked to supply an operating system as well. In what would have been an interesting alternative computing history, Microsoft might have turned to UNIX, but the requirements for at least 100 K of memory and a hard disk prevented its initial use. Microsoft bought 86-DOS (Disk Operating System), internally code named QDOS for Quick and Dirty Operating System, from Seattle Computer Products and hired its original developer, Tim Paterson, to enhance it for the IBM PC launch [9]. The renamed MS-DOS was to become the software counterpart to IBM PC hardware and the Intel-based microcomputers. IBM called the system PC-DOS and Microsoft retained the rights to market it to the clone market as MS-DOS. If anyone had been able to predict the explosive growth in the microcomputer market, more thought would have been given to both the hardware and software components of this revolutionary computer platform. For instance, the IBM PC did not incorporate hardware protection or privileged instructions to prevent programs from bypassing

3 The "IX" in POSIX was added to give the Portable Operating System project a sort of UNIX-like sound [28].
4 Originally, Microsoft was licensed by AT&T to distribute UNIX.
MS-DOS, leading to a huge number of bug-infested and non-portable software applications. Within a decade there were over 50 million PC-compatible platforms using versions of MS-DOS [9]. IBM found itself in the strange position of leading the personal computer industry with a product composed of components designed by others and using an operating system over which it had only indirect control. MS-DOS version 2.0 was a major rewrite that was delivered with the IBM PC/XT. This version drew heavily on UNIX for the file system structure, shell I/O redirection and interaction, as well as system calls, though it still remained a single-user system without any of the time-sharing or protection features. The 1984 introduction of the IBM PC/AT brought version 3.0. Though the Intel 80286-based system included 16 MB of memory (versus 640 K on earlier systems), kernel and user modes, hardware protection features, and support for multiprogramming, MS-DOS used none of these features and ran in an 8086 emulation mode [28]. However, version 3.0 provided networking support, a critical event that contributed to the Era of Decentralization. In 1987, IBM introduced the PS/2 family and planned to produce a more robust operating system in partnership with Microsoft--OS/2. OS/2 was one of the few new operating systems to be commercially engineered without concern for backward compatibility, providing many advanced features (e.g., extended memory addressing, protected modes, and multiprogramming) in a clean and elegant design. Unfortunately, the market did not respond. Although technically inferior, MS-DOS with its huge collection of applications held the superior market position. Microsoft abandoned OS/2 in 1991 and in response IBM formed a software development alliance with Apple Computer. Under market pressure, IBM itself developed MS-DOS version 4.0, which Microsoft reverse-engineered for the clone market. Finally, Microsoft released MS-DOS 5.0 in 1991. This major new version provided strong support for extended memory, a separate command shell, help facilities, a screen editor, and a simple user-initiated multiprogramming feature. It was also the first version to be sold directly to users, rather than supplied solely to computer manufacturers. MS-DOS remained an idiosyncratic operating system that provided a very difficult programming environment, but the inertia provided by millions of personal computers made it one of the most long-lived operating systems. At the same time, there were several technically superior operating systems available in the personal computer market. UNIX became the system of choice on high-performance workstations, using a window system to support the all-important WIMP interface. IBM continued the development of OS/2. The Apple Macintosh System 7 extended the model pioneered at Xerox PARC and provided one of the most intuitive and
well-engineered operating system environments. Microsoft embarked on a strategy to provide a pleasing interface on MS-DOS by developing Microsoft Windows, a WIMP-style interface in the tradition of Xerox PARC and Apple. In the most recent clean-slate operating system project to be undertaken in the commercial sector, Microsoft began development of Windows NT. NT provides a Windows-like interface on top of a newly implemented and sophisticated operating system [57]. Like OS/2 and Macintosh System 7, Windows NT is an operating system capable of exploiting the tremendous power of current microprocessors, which have blurred the line between workstations and traditional personal computers, providing the performance of the not-too-distant minicomputers and mainframes of the past. Windows NT is based in large part on UNIX, especially a more recent incarnation called Mach, which has been used to explore important new operating system capabilities for threads (i.e., lightweight processes) and multiple processors [72, 73]. It will be interesting to watch the evolution of operating systems as new leaps in performance provide incredibly powerful desktop systems, based on both single and multiple processors.
4. Communication: Computer Networking
Without computer networking technologies, the role of the computer would have remained a simple extension of the original calculating machines. The revolutionary nature of computing and our ability to construct a rich variety of information system architectures are inextricably linked with the growth of computer networking. Indeed, as we enter the age of a wired world, new forms of communication and a vast new digital economy are arising from the synergy of computing and networking technologies. The rapid growth of computing technology during the postwar years and the unprecedented leaps in performance enabled by large-scale integrated circuits overshadowed emerging networking technologies. Although the early steps in networking technology may appear to have lagged behind growth in pure processing power, recent gains in networking protocols and raw speed have been dramatic. A paper by several leading database researchers discussed these trends in computing and communications, noting that each of the following measures of computing performance had improved by an order of magnitude (i.e., a factor of 10) or more every decade [74]:

• the number of machine instructions per second
• the cost of a typical processor
• the amount of secondary storage per unit cost
• the amount of main memory per unit cost.

The expectation was that these measures would continue to improve and that two new measures would join the march: the number of bits transmitted per unit cost as well as bits per second. Networking has clearly joined computing in the climb toward ever-higher levels of performance. Most textbooks on computer networking draw a coarse division between Wide Area Network (WAN) and Local Area Network (LAN) technologies [75]. Though this distinction is blurred somewhat by switched technologies that can play a role in both situations, such as Asynchronous Transfer Mode (ATM) or even mobile networking, the essential characteristics and historical development are highlighted by this separation. WAN and internetworking technologies arose through efforts to connect geographically dispersed centers of computing excellence at university and military research centers. Although the proprietary interconnection schemes for peripherals and terminal devices could be considered the earliest form of local area networking, LAN technologies, such as Ethernet, developed in response to the demands for interconnecting large numbers of desktop machines. At an abstract level the twin themes of resource sharing and communication underlie both WANs and LANs, but under closer scrutiny the details reinforce the distinction. WANs developed to share the expensive computing resources and expertise of the mainframe, with the somewhat unexpected side effect of spawning the now ubiquitous electronic mail (email) technology. In fact, email would become the largest component of traffic on the early research networks. LANs developed to allow office users to share both resources, such as printers and servers, and information or documents for collaborative work. Essentially, LANs made economic sense when inexpensive computers became the norm. Therefore, local area networking required a cost-effective technology, without the reliance on expensive dedicated networking equipment (e.g., routers) that characterized wide area networking. For example, Ethernet provides a simple protocol and continually improving levels of performance, where all the machines on a particular network share the interconnection media. The use of shared media, rather than the dedicated point-to-point links that are used in WANs, is an artifact of the different economics of the environments. In the sections that follow, the growth of wide area networking and the fundamental technology of packet switching are discussed. The use of packet switching, enabled by digital computing technologies, is at the heart of current WANs and the most publicly visible manifestation, the Internet. In the local area network arena, Ethernet has become a dominant technology. With the more recent introductions of Fast Ethernet and
Gigabit Ethernet, the technology promises to be an important factor in the years ahead. The phenomenon of global networks is made possible by both technologies working in concert. Like a tree, the wide area packet switched networks provide the trunk and major branches, but the millions of leaves or personal computers are connected using LANs. The story begins, as does so much of computing history, with the interaction of military funding and university research.
4.1 ARPA: Advanced Research Projects Agency
On the heels of the 1957 launch of the Soviet Sputnik satellite, President Eisenhower formed the Advanced Research Projects Agency (ARPA) as an agile organization, free of the normal armed services bureaucracy and with the mission of ensuring continued American leadership in scientific research [76]. Military funding fueled the evolution of computing hardware. ARPA's mission was to foster similar innovations outside the individual branches of the armed services. As a general during World War II, Eisenhower was clearly familiar with the military and had little patience for the bickering and redundancies that sometimes plagued the armed services. Therefore, he sought civilian leadership for the newly formed ARPA, funded it well, and had the Director of ARPA report directly to the Secretary of Defense [77]. If America was going to invest heavily in a scientific Cold War, an independent agency with the necessary contractual power and nearly unlimited research scope seemed an appropriate vehicle. Initially, a large part of ARPA's agenda was focused on developing the space program and exploring the military uses of space. However, Eisenhower had already planned for an independent space agency and the National Aeronautics and Space Administration (NASA) was formed in 1958. Responsibility for space programs moved to NASA, with military applications being once again the province of the individual armed services. Although still a fledgling agency, ARPA was at a crossroads and would either redefine itself or disband. The staff at ARPA developed a new plan for the agency to foster American basic research and high-risk projects that stood little chance of being funded by the armed services. The focus of ARPA changed from the traditional military-industrial complex to the nation's research universities. Under successive directors the agency remained small, but both the budget and scope of research projects grew [77]. ARPA's support of computing research started when time-sharing emerged as a promising technology that might allow expensive computing resources to be used for both military and civilian research. In addition, the late 1950s saw the emergence of the minicomputer. ARPA management
recognized the increasing importance of computing and Jack Ruina, the first scientist to lead ARPA, decided to form a group charged with supporting research in computing. J.C.R. Licklider, a psychologist with a growing interest in computing, became the first director of what would eventually become ARPA's Information Processing Techniques Office (IPTO) [76]. Licklider had a vision of the computer as a powerful tool, working in synergy with the human mind, to leverage the potential of the human intellect. His vision stood in contrast to the more traditional scientific computing focus and after only a few years of experience with computing, he authored the influential paper, "Man-computer symbiosis" [78]. Under his enthusiastic leadership ARPA would invest heavily in computing research and begin pursuing projects in the newly emerging field of computer networking. Licklider's handpicked successor in 1964 was Ivan Sutherland, a leading computer graphics researcher, who would continue to emphasize similar research in computing. Sutherland would hire Bob Taylor, who would then fund and directly administer the first efforts to interconnect the isolated islands of high-performance computers that existed in the military and research universities. He would eventually convince Larry Roberts from the Lincoln Laboratory to administer the project that would evolve into the Internet.
4.2 Packet Switched Networking
Paul Baran, who started his career as a technician at the Eckert-Mauchly Computer Corporation, went on to pursue a graduate engineering degree at UCLA and became a researcher at the RAND Corporation in 1959 [77]. RAND was formed in 1946 to continue developing the nation's wartime operations research capabilities and evolved into a leading research institution. At RAND, Baran became interested in the survivability of communication systems during nuclear attack at a time when both the US and the Soviet Union were assembling nuclear ballistic missile arsenals. Baran recognized that digital computing technology might serve as the basis for a survivable communications system and began refining his ideas, authoring a series of RAND technical reports.5 Two of the most powerful ideas that Baran developed form the basis of what we now call packet switching networks [79].

• The first idea was to introduce redundancy into the network, moving away from centralized or decentralized topologies to what he called distributed networks. The mesh-like distributed network loosely modeled the interconnections between neurons in the brain.
• The second idea was to divide a message up into small message blocks or packets that could take different paths through the network and be reassembled at the destination. In the face of network failures, this would allow message blocks to continue to be delivered over surviving portions of the network.

5 Baran's original RAND reports are available at the Internet Society web site www.isoc.org under the sections on Internet history.
These two fundamental ideas are among the defining characteristics of packet switched networks and were explored by Baran as he developed a model for a distributed communications network. By dividing up the message into pieces or blocks, a postal service model could be used to deliver electronic data and voice communications using separately addressed blocks. A key insight was that digital computers could disassemble, route, and reassemble the messages fast enough to meet the communication requirements. Survivability would be introduced by building redundant links and forming a distributed network. After running simulation experiments, Baran estimated that the surprisingly low "redundancy level" of between 3 and 4 connections per node would provide very high reliability. Essentially his experiments showed that low-cost and somewhat unreliable links could be used to build a highly reliable network by introducing a modest amount of redundancy. Baran also pictured each node in the network as a digital computer that would route the message blocks at each juncture over the best surviving links. He outlined a scheme that called for a "self-learning policy at each node, without need for a central, and possibly vulnerable, control point" [79]. The routing decisions would be made dynamically in what he called "hot potato routing," building a high-speed store-and-forward network with reliability through redundancy. Much like the early telegraph system, the nodes would be capable of storing the message blocks and forwarding them along the most appropriate path, depending on the state of network links or connections. Each node would contain a routing table that described the current network topology, indicating the number of links and preferred routes to other nodes. The uniform message blocks would allow for efficient handling of all types of data. These ideas, taken together, constituted a revolutionary new communications model enabled by the digital computer. Baran had difficulty convincing both RAND colleagues and later AT&T management that his ideas represented a practical approach to designing a digital communications network. AT&T never did recognize the value in Baran's early outline of packet switching technology, despite his considerable efforts in trying to convince the dominant communications company
48
ALAN R. HEVNER AND DONALD J. BERNDT
of the benefits. AT&T was firmly entrenched as the monopoly telephone company across the nation and viewed communications technology from a circuit switching perspective. The telephone system relied on switching equipment that would form a dedicated physical circuit over which voice communications would travel. The circuit exists for the duration of the call and is dedicated to the communicating parties, whether they are speaking or not. The idea that you could break up communications into little pieces, route them along various paths, and reassemble them at their destination was a foreign concept to telephone company management. In fact, packet switching networks would turn out to be flexible enough to provide circuit-switching-like behavior using virtual circuits. Virtual circuits are negotiated paths that allow the bulk of the packets to follow the same route, providing sequenced delivery and simplified routing decisions [75]. Baran's fundamental ideas would await rediscovery by ARPA and ultimately form the basis for one of the greatest breakthroughs in computing history.
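Baran's message-block idea translates naturally into code. The sketch below is a hypothetical, minimal illustration written for this discussion and is not drawn from any historical system: a message is cut into fixed-size blocks, each stamped with a destination and a sequence number, so that blocks arriving out of order over different paths can still be reassembled at the destination.

#include <stdio.h>
#include <string.h>

#define BLOCK_DATA 8                    /* tiny payload size, for illustration */

struct block {                          /* one "message block" (packet)        */
    int  dest;                          /* destination node address            */
    int  seq;                           /* sequence number for reassembly      */
    int  len;                           /* payload bytes actually used         */
    char data[BLOCK_DATA];
};

/* Cut a message into blocks; returns the number of blocks produced. */
int disassemble(const char *msg, int dest, struct block out[], int max_blocks)
{
    int n = 0;
    size_t remaining = strlen(msg);
    while (remaining > 0 && n < max_blocks) {
        int chunk = remaining < BLOCK_DATA ? (int)remaining : BLOCK_DATA;
        out[n].dest = dest;
        out[n].seq  = n;
        out[n].len  = chunk;
        memcpy(out[n].data, msg, chunk);
        msg += chunk;
        remaining -= chunk;
        n++;
    }
    return n;
}

/* Rebuild the message from blocks, regardless of their arrival order. */
void reassemble(const struct block in[], int n, char *out)
{
    int total = 0;
    for (int i = 0; i < n; i++) {
        memcpy(out + in[i].seq * BLOCK_DATA, in[i].data, in[i].len);
        total += in[i].len;
    }
    out[total] = '\0';
}

int main(void)
{
    struct block net[16];
    char rebuilt[16 * BLOCK_DATA + 1];
    int n = disassemble("DISTRIBUTED ADAPTIVE MESSAGE BLOCK SWITCHING", 7, net, 16);
    /* In a real network the blocks would travel over different surviving
       links and could arrive in any order; reassembly relies only on the
       sequence numbers carried in each block.                               */
    reassemble(net, n, rebuilt);
    printf("%d blocks -> \"%s\"\n", n, rebuilt);
    return 0;
}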
4.3 ARPANET
Robert Taylor served as a deputy director under Sutherland and became the third director of the IPTO in 1966. Thus, he was already familiar with many of ARPA's activities in computing research. Mainframe computing was the dominant model, with the leading research universities and laboratories each committing substantial resources to remain at the leading edge of this rapidly growing field. ARPA was funding a lot of these initiatives and Taylor was struck by the amount of duplication that was developing throughout the research centers. Computers were large and expensive, with each ARPA-sponsored investigator aspiring to own a state-of-the-art machine. Each new investment created an advanced, but isolated computing center. Taylor was able to quickly convince the current ARPA director Charles Herzfeld that an electronic network linking the computing centers would make financial sense and enable a new level of collaborative work. So, Taylor now needed to assemble a technical team to administer and implement the project [77]. In 1966, Lawrence Roberts left Lincoln Laboratory to head the networking project that Taylor had initiated. Roberts had many connections within the computing research community and began to identify key contributors to the project. Essentially, ARPA needed to identify principal investigators to actually conduct the research and build the prototype network. Taylor convened a meeting with many of the leading researchers, but there was not a tremendous amount of enthusiasm for resource sharing among these well-equipped centers, and there was some hesitation to commit valuable computing resources to the network itself. It was an important idea
put forward by Wesley Clark after the meeting that would change Roberts' conception of the network [76]. Clark suggested a network of small identical computers be used to provide the actual network infrastructure, with simple links connecting the computing centers to the network computers. This arrangement would simplify the network compatibility problems and lighten the burden on host computers. He also suggested Frank Heart as a computer engineer capable of building the system. Roberts knew Frank Heart from Lincoln Laboratory, but Heart had moved to the consulting firm of Bolt, Beranek, and Newman (BBN) in Cambridge, Massachusetts. Also among the early advisors was Leonard Kleinrock, Roberts' long-time friend from Lincoln Laboratory who would head the Network Measurement Center at UCLA. At a subsequent meeting, Roberts would learn of the on-going work of Donald Davies, as well as the initial work of Paul Baran. The pieces were in place to try a packet switching network on a grand scale. Roberts drafted a request for proposals and circulated it among the informal advisors that had attended the early meetings. The proposal outlined a network of dedicated Interface Message Processors (IMPs) that would manage the network links, routing packets and handling transmission errors. The newly emerging minicomputers would be small and inexpensive enough to serve as IMP computer platforms, making the use of dedicated network computers a revolutionary, yet practical idea. Proposals were submitted by dozens of leading computer companies, including IBM, Control Data Corporation (CDC), and Raytheon. The small consulting firm of BBN would draft a very detailed proposal, exploring many of the issues that would be faced in constructing the network. BBN was often called "the third university" among Boston's formidable collection of research universities and was well connected within the computing research community. BBN brought an innovative research focus to this novel project and would succeed brilliantly in designing and implementing the early network.
4.3.1 The IMPs: Interface Message Processors
BBN assembled a small, talented staff and began the process of designing the Interface Message Processors, the heart of the network. These machines would have to handle the packet traffic and make all the routing decisions, while still providing performance that would make the network a viable real-time store-and-forward system. BBN decided to base the IMPs on "ruggedized" versions of Honeywell DDP-516 minicomputers. Since these computers were made to withstand battle conditions, the theory was that they could also be safely installed in the laboratories of research universities.
Once implemented, the IMPs, the first packet switching routers, exceeded performance expectations. The first IMPs began arriving at the research centers in 1969, with IMP 1 going to Kleinrock's group at UCLA, IMP 2 going to SRI, IMP 3 going to UC Santa Barbara, and IMP 4 going to the University of Utah [77]. The ARPANET was taking shape. BBN would get IMP 5 and the first cross-country connection. Using IMP 5, BBN would go on to develop remote network management features that would lay the foundation for modern network operations centers. Once the IMPs were in place, the research centers were charged with writing host-to-IMP connections for their specific machines. At BBN, Bob Kahn would write the specification for host-to-IMP interconnections that would serve as a blueprint for sites to connect to the network. BBN would provide the IMPs and the basic packet handling facilities, but the host-to-IMP and later host-to-host connections would be left to the research centers to design.
4.3.2 Network Protocols
An informal group of research center members, mostly graduate students, evolved into the Network Working Group (NWG) [80]. The NWG began debating the type of host-to-host services that should use the underlying packet switching network being implemented by BBN. The NWG became an effective, yet democratic organization that started the immense job of charting a long-term course for the network. Among the early participants were three graduate students from Kleinrock's group at UCLA: Steve Crocker, Vint Cerf, and Jon Postel. In 1969, the NWG issued its first Request for Comment, or RFC 1, setting the inclusive tone for the organization that still permeates many of the Internet's governing committees. RFC 1 addressed the basic host-to-host handshake that would be used to initiate communications. The NWG continued to issue RFCs and would arrive at the notion of a layered protocol, where new more advanced services are built upon lower-level common services. In 1971, BBN took a more active role in the NWG hoping to accelerate the work on network protocols. The ability to log in to remote machines and a mechanism for transferring files were identified as two important services that were necessary for resource sharing. The relatively simple TELNET protocol was the first higher level service that supported remote logins across the network, and it is still in widespread use today. The second protocol was the file transfer protocol (FTP), again a protocol that has remained an important service. These two protocols were the nucleus for resource sharing and made the ARPANET a success on many levels. At the first International Conference on Computer Communication in 1972, the
team of ARPANET investigators hosted a large-scale interactive display of the network, generating widespread interest and galvanizing a host of new participants in the rapidly growing network field [76].
4.3.3 Electronic Mail: Communication not Computation
Though resource sharing was the initial motivation for the ARPANET, electronic mail would quickly account for the majority of traffic. On many of the large time-sharing systems, ad-hoc facilities for depositing electronic messages were developed. For example, the MAILBOX system was available on the Compatible Time-Sharing System at MIT in the early 1960s. However, in the post-ARPANET world the problem was to scale up these simple electronic mailboxes to operate in a true network environment. Electronic mail started out as a simple program written by Ray Tomlinson at BBN to provide simple mailbox facilities on PDP-10s running the BBN-developed operating system TENEX [77]. Tomlinson enhanced his locally oriented programs to allow delivery between two BBN PDP-10s, changing the focus of electronic mail from the local machine to the world of networking. The use of FTP to carry messages provided the bridge for Tomlinson's programs to provide electronic mail over the ARPANET. These early communication programs soon spawned a host of increasingly sophisticated electronic mail handlers. Tomlinson's initial program may have been simple, but electronic mail offered a new type of communication that has remained at the center of network usage. In fact, the rather unexpected growth in email traffic was among the first pieces of evidence that communication, not computation, may be the most important benefit of the digital computer.
4.4 Xerox PARC: The Office of the Future
Xerox's famed Palo Alto Research Center (PARC) opened in 1970 near Stanford University. (A detailed history of this important center is fascinating reading [14].) Several alternative locations were explored, including New Haven and Yale University for its proximity to the Stamford, Connecticut headquarters of Xerox. George Pake, PARC's first director, persuaded other executives that the dynamic Palo Alto area would be the perfect location. Unlike many universities at the time, Stanford University was eager to build relationships with industry and viewed PARC as an important model of cooperation. Xerox PARC would assemble one of the most impressive groups of researchers and generate advances on multiple technological fronts. Though the story of how Xerox itself failed to capitalize on PARC technologies has become legend, the innovative work of
the research center helped to guide many of the key technologies of business computing. In fact Xerox did find valuable products, such as the laser printer, among the technologies pursued under the PARC creative umbrella. The commercialization of computing technology has been unpredictable and the rewards have not always gone to the technological leaders. However, within Xerox PARC several key technologies would come together to form a lasting vision of the office of the future and the role of personal computing. Xerox PARC was able to attract an incredible array of talented individuals owing to its location, unrestricted research agenda, and attractive salary levels. Cutbacks in research funding by the military, formalized in part by the Mansfield Amendment, and general economic conditions led to a buyers' market for research and development talent [14]. In this environment, Xerox PARC was able to hire many of the top computer science researchers. In fact, PARC almost directly inherited the computing research mantle from ARPA as federal funding diminished. ARPA had concentrated its funding in a few universities, such as CMU, MIT, Stanford, UC Berkeley, UCLA, and the University of Utah. Researchers from almost all these laboratories as well as Robert Taylor, the ARPA Information Processing Techniques Office (IPTO) director, would end up at PARC. PARC would lead the way on many fronts, but in particular four important technologies would come together and offer a glimpse of the office of the future: the laser printer, the personal computer, the graphical user interface, and the LAN.
4.4.1 The Laser Printer: Images not Characters
Gary Starkweather, a Xerox optical engineer, began using lasers to paint the surface of xerographic drums on an experimental basis at the traditional Rochester-based Xerox research facility. His initial work did not find acceptance at the more product-focused laboratory and he managed to get transferred to Xerox PARC where he was free to pursue the ideas that would lead to the laser printer. In building the earlier prototypes, Starkweather used a spinning disk with mirrored facets that redirected the laser beam across the surface of the xerographic drum. By modulating the laser, millions of dots could be etched on the drum surface to form a complete image. These early experiments relied on technologies that would be too expensive to produce commercially, such as extremely precise mirrors. A return to simpler technologies would provide the breakthrough. Starkweather used a lens to passively correct any imperfections in the mirrored surfaces and the whole arrangement worked. The newly developed Scanning Laser Output Terminal (SLOT) was a flexible output device capable of producing images, not simply
the character output of mechanical printers [14]. The translation from a computer representation to SLOT input was a difficult task at the time as well. It is solved easily now with abundant and inexpensive memory, but then it required Butler Lampson and Ron Rider to develop the Research Character Generator (RCG) at PARC using wire-wrapped memory cards, essentially a high-performance print buffer. With the RCG in place, the laser printer would become an operational technology within PARC, even though Xerox would not market a commercial version (the Xerox 9700) until 1977. Just as the IBM 1403 "chain printer" made the IBM 1401 a successful business computing solution, the laser printer would provide the bridge from the digital to the physical world in the office of the future.
4.4.2 The Alto: Computing Gets Personal
Charles Thacker and Butler Lampson would start a somewhat unofficial project to implement Alan Kay's vision of a small personal computer with a high-resolution display for graphical interaction. The tight timetable and unofficial nature of the project forced Thacker and fellow designer Edward McCreight to avoid unnecessary complexity. The final design incorporated an innovative technique in which the processor would play many roles, mediating between all the peripheral devices and the memory while implementing a "microparallel processing" strategy that simplified other control functions for the disk drive, keyboard, and graphical display. Another innovation was to use memory to store and manipulate a bitmap for the high-resolution display. The first Alto was built in 1973 and made its now famous public debut by animating Cookie Monster, a Sesame Street favorite, for a gathering of computer researchers. The final design was so simple that sometimes people would simply requisition the parts and assemble their own machine. Somewhere in the neighborhood of 2000 Altos were eventually produced, far surpassing the original plan for 30 machines [14]. Although at over $12 000 an Alto was quite expensive, the march from current medium-scale integration (MSI) to large-scale integration (LSI), and then to very large-scale integration (VLSI) would make the technology affordable in a few short years. These machines provided a model for the future and became essential equipment throughout PARC, forming the nexus for the other three critical technologies.
4.4.3 The Graphical User Interface: A Digital Desktop
The desktop metaphor and overlapping windows that characterize virtually all the personal computers and workstations of today were first
fully realized at Xerox PARC. Much of the software for the Alto would be implemented using Smalltalk, an object-oriented programming language developed by Alan Kay and other researchers at PARC. The first "killer application" would be Bravo, a "What You See is What You Get" (WYSIWYG) text processor. Documents could be readily composed on the Alto and sent to the laser printer for publication. Indeed, Lynn Conway and Carver Mead would use the internal Xerox text processing system to quickly publish their influential textbook that ushered in the widespread use of very large-scale integration (VLSI) and the microprocessor. Today the windows, icons, mouse, and pull-down menus (WIMP) style interface has become familiar to all. These technologies would directly influence developments at two fledgling personal computer companies, Apple and Microsoft, as employees left PARC to work at these firms, and, in the case of Apple, through a series of demonstrations at PARC itself authorized by Xerox headquarters [14].
4.4.4 Ethernet: The Network is the Office
Robert Metcalfe was called on to develop networking technology for the Altos and, together with David Boggs, implemented an elegant shared media solution for which they coined the name Ethernet [81]. Though it began as an option on the Altos, it quickly became another crucial technology, allowing the Alto to serve as a communication device as well as to make effective use of the developing laser printer technology. This essential piece allowed Xerox PARC to use the office of the future and demonstrate it to others.
4.4.5 The Xerox Star
These four technologies, working in concert, created the networked office environment populated by personal computers that offer an intuitive desktop metaphor--a lasting standard. Xerox PARC faced many challenges in trying to commercialize these technologies within the corporate framework. Eventually, a complete "office of the future" was constructed for the 1977 Xerox World Conference in Boca Raton, Florida and demonstrated to all of the corporate executives [14]. Xerox formed a new research division to commercialize the technology and the resulting Xerox Star was a dazzling system that made its debut at the National Computer Conference in 1981. It was an impressive embodiment of all the PARC technologies, but finally reached the market at a cost of more than $16 000 in a year when IBM introduced its low-cost personal computer with an open architecture that invited third-party development. The model was sound, but the technologies would be exploited by others.
4.5 LANs
4.5.1 University of Hawaii and ALOHANET
Norm Abramson was one of the principal designers of an innovative network at the University of Hawaii [82]. In 1969 ARPA funded the ALOHANET, which used small radio transmitters to exchange data between computers. Two fundamental ideas were developed during the ALOHANET project. The first was that wireless networks were possible; they remain of growing interest today. The second fundamental insight was that the network could use the identical transmission frequency for all the nodes. Rather than construct point-to-point or particular links between each pair of computers, a shared radio frequency would be used. However, a shared frequency approach meant that collisions would occur occasionally and that some means of recovery would be necessary. If a collision did occur, the message would be undecipherable and no acknowledgement would be received. So, a node would simply re-transmit after a random interval, hoping to avoid any further collisions. This shared media approach would form the basis of a new hardwired LAN at Xerox PARC [75].
4.5.2 Xerox PARC and Ethernet
Robert Metcalfe, a PARC researcher specializing in computer networking, got involved in the effort to build an effective way of connecting computers. He had been part of the ARPANET team at MIT and had become a network facilitator, an experienced member of the early ARPANET sites who assisted new sites. After being introduced to the ALOHANET project and incorporating an analysis in his thesis, Metcalfe drew upon several ideas to develop a short-distance network or LAN at PARC. The fundamental insight was to use a shared media approach, replacing ALOHANET radio transmitters with network interfaces that would allow the machines to broadcast on a shared piece of cable [81]. A shared cable meant the network was inexpensive to set up and no complex and expensive routers were required. Cost was an important design constraint since the machines themselves were inexpensive and would hopefully become a fixture on everyone's desk. A shared media approach implies that there may be collisions when two or more nodes try to transmit simultaneously. The Ethernet protocol involves listening for an idle line before attempting to transmit a message. If a collision is detected during the transmission, the node waits a random interval and begins re-transmitting the message. Random wait times mean
that nodes are unlikely to continue to demand the line at the same time [75]. The protocol resembles a human conversation when two people try to speak at once, except the events are measured in microseconds. Xerox developed the Ethernet standard along with DEC and Intel for commercial release in 1980, licensing the technology for a nominal fee. Ethernet is still the dominant LAN technology and has kept pace in the demand for faster speed with the introduction of Fast Ethernet (100 Mbps) and Gigabit Ethernet. The LAN provided the missing link that allowed massive numbers of personal computers to join large computers on the long-distance packet switching networks.
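The listen-then-back-off discipline described above can be sketched in a few lines of code. The fragment below is a simplified, hypothetical simulation written for this discussion; a real Ethernet controller performs these steps in hardware, measures delay in slot times on the medium, and uses truncated binary exponential backoff, which the retry loop here only approximates. The channel functions are toy stand-ins for the shared cable.

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins for the shared cable: the channel is busy or collides at
   random so the backoff logic gets exercised.  A real controller senses
   the actual medium and detects collisions electrically.                 */
static bool channel_idle(void) { return rand() % 4 != 0; }   /* 75% idle  */
static bool transmit_ok(void)  { return rand() % 3 != 0; }   /* 67% clean */

#define MAX_ATTEMPTS 16

/* Send one frame using listen-before-transmit and randomized backoff. */
static bool ethernet_send(int frame_id)
{
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
        while (!channel_idle())
            ;                             /* defer until the cable is quiet */

        if (transmit_ok()) {
            printf("frame %d sent after %d attempt(s)\n", frame_id, attempt + 1);
            return true;                  /* no collision detected          */
        }

        /* Collision: back off for a random number of slot times.  Doubling
           the range on each attempt makes a repeat collision between the
           same stations increasingly unlikely.                             */
        int limit = 1 << (attempt < 10 ? attempt + 1 : 10);
        int slots = rand() % limit;
        printf("frame %d collided, backing off %d slot(s)\n", frame_id, slots);
        /* in this simulation the delay is only reported, not slept through */
    }
    return false;                         /* give up and report an error    */
}

int main(void)
{
    srand(1969);                          /* any fixed seed will do          */
    for (int i = 1; i <= 3; i++)
        ethernet_send(i);
    return 0;
}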
4.6 Internetworking
Bob Kahn and Vint Cerf collaborated on one of the most important protocol developments, the Transmission Control Protocol and the Internet Protocol (TCP/IP) suite. The protocol was intended to support internetworking, the interconnection of independent networks--a network of networks, or what we now call an Internet. The most important constraint was that the underlying packet switched networks should remain unchanged, so their scheme was based on the idea of a gateway that would provide a bridge between dissimilar networks [80]. Their 1974 paper laid out a design for an internetworking protocol [83]. Xerox PARC had an influence here as well since Metcalfe was both a PARC employee and an ARPANET facilitator. Since PARC had large networks of Altos already running, an interconnection scheme was being pursued in a fully operational setting. Through the open ARPANET forum, ideas from PARC and other research groups contributed to the development of TCP/IP. The layered protocol approach allowed important capabilities to be situated at an appropriate level. Common functions were implemented lower in the protocol "stack," and more specialized functions were implemented at a higher level. Therefore, some of the more difficult debates revolved around where to locate specific capabilities. TELNET and FTP, along with SMTP for electronic mail and the newer HTTP protocol for the WWW, all implement special functions that rely on the TCP/IP foundation. The IP provides the bedrock and, as a common service used by all higher level components, represents a level from which any unnecessary functionality must be removed. The IP protocol provides a "best-effort" delivery mechanism that does not offer any guarantees. The internetworking scheme called for gateways to be able to speak two network dialects that would provide a route from one network to the next. The individually addressed packets or datagrams are propagated from gateway to gateway
using only the IP header or address information, leaving the interpretation of the contents to the destination. This simple hop-by-hop propagation is a lightweight approach without costly error handling techniques since delivery is not guaranteed. The TCP provides a reliable service for those applications willing to pay for the overhead. The TCP protocol is the responsibility of the source and destination, giving the protocol an end-to-end orientation. That is, any problems are resolved over the entire path by sender and receiver. This makes errors more costly, but the redundancy of the packet switched approach and vast improvements in reliability provided by fiber optics have made this a powerful approach. Direct access to the inexpensive "best-effort" service is provided through the User Datagram Protocol (UDP), allowing higher level applications to choose the appropriate level of service [84]. The official ARPANET transition to TCP/IP took place in 1983 and the growing network was split, with MILNET formed to handle classified military traffic [80].
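The split between best-effort datagrams and end-to-end reliability is visible directly in the Berkeley sockets interface that spread with BSD UNIX. The sketch below sends a single UDP datagram; it is an illustrative fragment only, and the loopback address and port number are placeholder values chosen for the example.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* A datagram socket: connectionless, best-effort delivery via UDP/IP. */
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* Destination address and port are placeholders for the example. */
    struct sockaddr_in dest;
    memset(&dest, 0, sizeof dest);
    dest.sin_family = AF_INET;
    dest.sin_port   = htons(9999);
    inet_pton(AF_INET, "127.0.0.1", &dest.sin_addr);

    const char *msg = "hello, internet";

    /* sendto() hands the datagram to IP; there is no acknowledgement and
       no retransmission -- reliability, if needed, is the end-to-end
       responsibility of the application or of TCP.                        */
    if (sendto(fd, msg, strlen(msg), 0,
               (struct sockaddr *)&dest, sizeof dest) < 0)
        perror("sendto");

    close(fd);
    return 0;
}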
4.6.1 NSFNET
By the late 1970s, the ARPANET was ingrained in the research community, but remained restricted to research groups receiving Department of Defense (DoD) funding. The small close-knit computing research community that existed when ARPA first started funding research operated informally on many levels. However, in the new environment of rapidly growing computing research, ARPANET membership threatened to become a divisive factor. The National Science Foundation (NSF) stepped in to create a virtual network, CSNET, based on dial-up lines hosted by a BBN machine that provided inexpensive connectivity to the ARPANET and other networks [77]. This would suffice until a more permanent open-access network could be constructed. The NSF had already funded six supercomputing centers at universities across the country, so a network connecting these centers could form the backbone for regional networks. The NSFNET was patterned after the ARPANET with small, dedicated computers handling the networking as well as links to the supercomputers. The NSFNET was a spectacular success, calling for several backbone upgrades to accommodate the rapidly growing traffic. The MERIT consortium took over the administration of the network. In 1990, NSF began the process of transferring the network to the commercial sector, starting with Advanced Network Services (ANS), a non-profit corporation formed by MERIT, MCI, and IBM. The unprecedented growth of the Internet from these beginnings surpassed all expectations.
4.7 LANs, WANs, and the Desktop
The theme of the communication section is that three important technologies converged in the mid-1970s, providing the complementary pieces that together allowed the computer to become a tool for communication, not just computation.

• The first technology was the packet switched networks outlined by Baran around 1960 and successfully implemented in the ARPANET.
• The second important technology is local area networking, which has allowed countless desktops to be inexpensively connected to an increasingly networked world.
• The third technological piece is the personal computer and the model of interactive use that was so successfully demonstrated by the Alto at Xerox PARC, and quickly made affordable by Apple and IBM.

These three technologies combined to form the nascent computing environment that we see evolving today: one in which you sit at a personal computer interacting with windows, icons, mouse, and pull-down menus (i.e., the WIMP interface), sharing laser printers and servers across a LAN, yet capable of communication across the world via interconnected packet switched networks.
5. Software
The third component of the business computing system is the application software. From its fairly primitive beginnings, software has come to dominate the cost of a typical business system. Yearly software sales run in the tens of billions of dollars, growing rapidly every year. This section presents a brief overview of the progress made in business software development during the business computing eras. The presentation is structured based on the software triangle shown in Fig. 4. Software is composed of three basic parts--the algorithmic procedure in a programming language; the data in a structured format; and the human-computer interaction interface. Bringing these parts together in a functional business system requires well-defined software development processes and development methods.

FIG. 4. Components of business software (a triangle linking algorithms, data, and human-computer interaction).
5.1 Algorithmic Programming
Solving a business problem first requires the creation of an algorithm that provides a step-by-step procedure for accepting inputs, processing data, and
producing outputs. Algorithms are typically represented in natural language or some form of structured format such as pseudocode or flowcharts. The objective of business programming is to code this algorithm in a form such that an important business problem can be solved via the use of a computer system. The history of computer programming languages has been documented in several excellent sources [85, 86].
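As a concrete (and entirely made-up) illustration of the input-process-output pattern just described, the short program below codes a simple invoicing algorithm: it accepts line-item quantities and prices as input, processes them by extending and totaling the amounts and applying a tax rate, and produces the totals as output. The items and the tax rate are hypothetical figures chosen for the example.

#include <stdio.h>

int main(void)
{
    /* Input: line-item quantities and unit prices (illustrative values). */
    double quantity[]   = { 3.0, 10.0, 1.0 };
    double unit_price[] = { 19.95, 2.50, 149.00 };
    int    items        = 3;
    double tax_rate     = 0.06;

    /* Processing: extend each line, accumulate a subtotal, apply tax. */
    double subtotal = 0.0;
    for (int i = 0; i < items; i++)
        subtotal += quantity[i] * unit_price[i];
    double total = subtotal * (1.0 + tax_rate);

    /* Output: print the computed results. */
    printf("Subtotal:       %.2f\n", subtotal);
    printf("Total with tax: %.2f\n", total);
    return 0;
}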
5.1.1 Early Business Programming Languages
Programming in the Era of Calculation usually involved wiring plugboards, setting toggle switches on the side of the computer, or punching holes in paper tape. The wiring, switch settings, and holes represented instructions that were interpreted and executed by the computer. Each time a program was run the boards had to be rewired, switches had to be reset, or the paper tape rerun. The stored-program computer changed this onerous task by storing the program in internal memory and executing it on command. The Era of Automation brought the major advances of program compilers and assemblers. Grace Murray Hopper worked with Howard Aiken on programming the Harvard Mark I via paper tape. She applied this knowledge to the development of the first program compiler, the A-0, for the UNIVAC. The role of a compiler is to translate a high-level language in which humans can write algorithms into a computer's internal set of instructions. The assembler then translates the computer instructions into binary machine code for placement in the computer's memory. The research and development of compilers and assemblers led rapidly to the creation of the first computer programming languages.
IBM developed FORTRAN (FORmula TRANslation) for the 704 computer in early 1957, contributing to the popularity of the product line. John Backus and his team defined the language to support engineering applications that required fast execution of mathematical algorithms [87]. The handling of large data files was a secondary consideration. FORTRAN has gone through many generations and remains popular today for engineering applications.

Business computing had different requirements. Efficient data handling was critical, and business programmers needed a more user-friendly, English-like language. In 1959, a team of Department of Defense developers, including Grace Murray Hopper, defined a business-oriented programming language, COBOL (COmmon Business Oriented Language). COBOL is a highly structured, verbose language with well-defined file handling facilities. The English-like syntax makes programs more readable and self-documenting. It was also the first language to be standardized so that its programs could run on different hardware platforms. COBOL has survived as the primary business programming language through all the eras of business computing. More legacy business systems run on COBOL than on any other programming language. Even today, COBOL advocates extol its advantages over more recent languages. COBOL provides sophisticated features for heterogeneous data structures, decimal arithmetic, powerful report generators, and specialized file and database manipulation [88]. Its most important advantage may be its impressive portability across nearly all hardware and software platforms. A new COBOL 2000 standard is being prepared with features for object-orientation, component-based development, web programming, and other state-of-the-art language features. It appears that COBOL will be around for a long while yet.

IBM introduced RPG (Report Program Generator) as a report definition language in the early 1960s. With RPG the programmer defined a business form with data fields. RPG then produced reports by executing the forms based on underlying data files in the computer system.
5.1.2 Structured Programming Languages

As business programming moved into the Era of Integration and Innovation (circa 1965), a crisis in software development was beginning to be noticed. Large application programs (e.g., 50 000-100 000 lines of code) were being developed. These programs were very difficult to read, debug, and maintain. Software routinely failed, and repairs were difficult and time-consuming. The worldwide nature of the software problem was reflected in the NATO software engineering conferences held in 1968 (Garmisch, Germany)
and 1969 (Rome) [89, 90]. The term "software engineering" was coined to generate discussion as to whether the development of software was truly an engineering discipline. The issues of software development and how to solve the software crisis debated at these early conferences remain relevant.

Edsger Dijkstra's influential 1968 paper, "Go To Statement Considered Harmful" [91], addressed a major problem in existing programming languages. Flow of logical control through a program was often haphazard, leading to "spaghetti code" programs. Software researchers, such as Dijkstra and Harlan Mills, proposed structured programming as the answer to out-of-control program flow [92]. Only three simple control structures (sequence, selection, and iteration) are needed to express the control flow of any algorithm [93]. This understanding led to the development of new structured programming languages.

The languages Pascal and ALGOL-68 introduced some of the principal structured programming concepts. However, they had little commercial impact. Meanwhile, new versions of FORTRAN-IV and COBOL integrated new structured features. IBM attempted to combine the best features of FORTRAN and COBOL in PL/I (Programming Language One). Although some business systems were written in PL/I, the language never really caught on in the business world.

The effectiveness of structured programming was clearly demonstrated on IBM's New York Times project, delivered in 1971. Structured programming techniques were used to build an on-line storage and retrieval system for the newspaper's archives. The completed system contained around 85 000 lines of code and was of a complexity well beyond previous IBM projects. Structured programming techniques gave the program team enhanced control of the system development process and produced a highly reliable system that crashed only once during the first year and reported only 25 errors during that year [94].
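As a small illustration of the idea (written in Java rather than in the structured languages of the period; the invoice figures are invented for the example), the control flow below uses only the three structured forms of sequence, selection, and iteration:

```java
import java.util.List;

public class InvoiceTotals {
    public static void main(String[] args) {
        // Sequence: simple statements executed in order.
        List<Double> invoiceAmounts = List.of(120.0, 45.5, 980.0, 15.25);
        double total = 0.0;
        int largeInvoices = 0;

        // Iteration: loop over the records.
        for (double amount : invoiceAmounts) {
            total += amount;
            // Selection: branch on a condition.
            if (amount > 500.0) {
                largeInvoices++;
            }
        }

        System.out.println("Total billed: " + total);
        System.out.println("Invoices over 500: " + largeInvoices);
    }
}
```

Any business algorithm, however large, can be expressed by nesting and composing these three forms, which is what makes structured programs easier to read, verify, and maintain.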
5.1.3 Recent Programming Languages

In the early 1970s, Ken Thompson and Dennis Ritchie developed a new systems programming language called C, using the language to implement UNIX. By allowing access to low-level machine operations, this language defied many of the tenets of structured programming. Nevertheless, C has become a very popular programming language, particularly within the last two eras of business programming as personal computers have come to dominate the desktop. The language C++ evolved from C for the programming of object-oriented business applications [95]. Visual programming languages, such as Visual Basic (VB), incorporate facilities to develop graphical user interfaces (GUIs) for business applications.
Such languages are particularly effective in the development of client-server distributed applications where the end-user at the client site needs an efficient, friendly interface. The advent of the Internet during the 1990s has provided the impetus for efficient, platform-independent programs that can be distributed rapidly to Internet sites and executed. The Java programming language was developed at Sun Microsystems to fit this need [96]. A consortium of industry proponents is working toward the standardization of Java for the programming of business applications on the Internet.
5.2 Data: File Systems and Database Systems
The management of data predates the computer by millennia. The earliest known writing, on Sumerian clay tablets, consisted largely of records of royal assets and taxes. Writing on papyrus and eventually on paper was the predominant manner of manual data management up to the beginning of the 20th century. First mechanical and then electronic machinery rapidly changed the ways in which businesses managed data.
5.2.1 Punched-card Data Management

Although automated looms and player pianos used punched cards to hold information, the first true automated data manager was the punched-card system designed by Herman Hollerith to tabulate the 1890 US census. Automated equipment for handling and storing punched cards was the primary means of managing business data until the Era of Automation. An entire data management industry, whose leader was IBM, grew up around punched cards.
5.2.2 Computerized File Management

The use of the UNIVAC computer system for the 1950 US census heralded the Era of Automation in business computing. To replace punched cards, magnetic drums and tapes were developed to store data records. Without the constraints of an 80-column card format, new, innovative data structures were devised to organize information for fast retrieval and processing. Common business applications for general ledger, payroll, banking, inventory, accounts receivable, shipping invoices, contact management, human resources, etc. were developed in COBOL. All of these programs were centered on the handling of large files of data records. The prevailing system architecture during this era was that of batch-oriented
processing of individual transactions. Files of transactions were run against a master file of data records, usually once a day or once a week. The problems with this architecture were the inherent delays of finding and correcting errors in the transactions and the lack of up-to-date information in the master file at any point in time [97].
5.2.3 On-Line Data Processing
Direct access storage devices, such as magnetic disks, and improved terminal connection devices opened the way for more effective business processes based on on-line processing. Users of the business system could conduct a complete transaction, reading and writing data records, from an on-line terminal connected directly to the mainframe computer. The computer was able to handle many terminals simultaneously via multiprogramming operating system controls. On-line transaction processing dominated the Era of Integration and Innovation from 1965 to the mid-1970s.

Direct access to data in magnetic storage led to improved structures for rapidly locating a desired data record in a large data file while still allowing efficient sequential processing of all records. Hierarchical data models presented data in hierarchies of one-to-many relationships. For example, a Department record is related to many Employee records and an Employee record is related to many Project records. Sophisticated indexing techniques provided efficient access to individual data records in the file structure. Popular commercial file systems, such as IBM's Indexed Sequential Access Method (ISAM) and Virtual Storage Access Method (VSAM), provided very effective support for complex business applications based on large data sets.

Hierarchical data models lacked a desired flexibility for querying data in different ways. In the above example, the hierarchy as designed would not efficiently support the production of a report listing all employees working on a given project. A more general method of modeling data was needed. An industrial consortium formed the Data Base Task Group (DBTG) to develop a standard data model. Led by Charles Bachman, who had performed research and development of data models at General Electric, the group proposed a network data model. The DBTG network model was based on the insightful concepts of data independence and three levels of data schemas:

• External Subschema: Each business application had its own subset view of the complete database schema. The application subschema was optimized for efficient processing.
• Conceptual Schema: The global database schema represented the logical design of all data entities and the relationships among the entities.
• Physical Schema: This schema described the mapping of the conceptual schema onto the physical storage devices. File organizations and indexes were constructed to support application processing requirements.
Data independence between the schema levels allowed a developer to work at a higher level while remaining independent of the details at lower levels. This was a major intellectual advance that allowed many different business applications, with different sub-schemas, to run on a single common database platform. Business architectures centered on large hierarchical and network databases were prevalent throughout the 1970s and well into the 1980s. Many such systems still perform effectively today.
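Returning to the Department-Employee-Project example above, the following minimal Java sketch (the class and field names are hypothetical, invented purely for illustration) shows the navigational, one-to-many style that hierarchical and network systems imposed on application code: to answer "which employees work on a given project?" the program must walk down from every Department through every Employee.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical records mirroring a Department -> Employee -> Project hierarchy.
class Project {
    String name;
    Project(String name) { this.name = name; }
}

class Employee {
    String name;
    List<Project> projects = new ArrayList<>();
    Employee(String name) { this.name = name; }
}

class Department {
    String name;
    List<Employee> employees = new ArrayList<>();
    Department(String name) { this.name = name; }
}

public class HierarchyQuery {
    // Answering a query that cuts across the hierarchy requires a full traversal.
    static List<String> employeesOnProject(List<Department> departments, String projectName) {
        List<String> result = new ArrayList<>();
        for (Department d : departments) {
            for (Employee e : d.employees) {
                for (Project p : e.projects) {
                    if (p.name.equals(projectName)) {
                        result.add(e.name);
                    }
                }
            }
        }
        return result;
    }
}
```

The relational model described next replaced this record-at-a-time navigation with declarative queries over tables.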
5.2.4 Relational Databases
E.F. Codd, working at IBM Research Laboratory, proposed a simpler way of viewing data based on relational mathematics [98]. Two-dimensional relations are used to model both data entities and the relationships among entities based upon the matching of common attribute values. The mathematical underpinnings of the relational data model provided formal methods of relational calculus and relational algebra for the manipulation and querying of the relations. A standard data definition and query language, Structured Query Language (SQL), was developed from these foundations. Commercialization of the relational model was a painstaking process. Issues of performance and scalability offset the advantages of easier conceptual modeling and standard SQL programming. Businesses were reluctant to abandon their mission-critical mainframe database systems for the new relational technology. Advances in query optimization led to relational systems that could meet reasonable performance goals. An impetus to change came with the Era of Reengineering and Alignment. The relational model fit nicely with new client-server architectures. The migration of processing power to distributed client sites called for more user-friendly GUIs and end-user query capabilities. At the same time, more powerful processors for the servers boosted performance for relational processing. Oracle and IBM are the leaders in providing the commercial relational database systems that are at the heart of many of today's most interesting business applications.
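To make the contrast with navigational access concrete, the sketch below is a minimal example only: the EMPLOYEE and DEPARTMENT table names, the connection URL, the credentials, and the presence of a suitable JDBC driver on the classpath are all assumptions, not details drawn from the systems discussed in this chapter. It expresses the "all employees on a given project" request as a single declarative SQL query issued from Java.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ProjectRoster {
    public static void main(String[] args) throws SQLException {
        // Hypothetical connection string and schema, for illustration only.
        String url = "jdbc:postgresql://localhost/business";
        String sql = "SELECT e.name, d.dept_name "
                   + "FROM employee e JOIN department d ON e.dept_id = d.dept_id "
                   + "WHERE e.project_id = ?";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setInt(1, 42);                       // list everyone on project 42
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name")
                            + " (" + rs.getString("dept_name") + ")");
                }
            }
        }
    }
}
```

The application states what data it wants; the database's query optimizer decides how to find it, which is precisely the separation of concerns that the relational model's mathematical foundation made possible.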
5.2.5 Future Directions in Data Management
Effective management of data has always been and will remain the center of business computing systems. The digital revolution has drastically expanded our definition and understanding of data. Multimedia data includes audio, pictures, video, documents, touch (e.g., virtual reality), and maybe even smells. Business applications will need to find effective ways of managing and using multimedia data.

Object-oriented methods of systems development attempt to break the boundary between algorithmic procedures and data [99]. Objects are real world entities that encapsulate internal data states and provide procedures that interact with the environment and modify internal states. Two main themes characterize the use of object technology in database management systems: object-relational technology integrates the relational model and support for objects, whereas object-oriented systems take a more purist approach. Commercial object-oriented database systems have had limited success but have demonstrated promise in several important business applications [100].

A major challenge for the future of data management will be how to manage the huge amounts of information flowing over the WWW. It is estimated that a majority of business (e.g., marketing, sales, distribution, and service) will be conducted over the Internet in the near future. New structures for web databases must support real-time data capture, on-going analyses of data trends and anomalies (e.g., data mining), multimedia data, and high levels of security. In addition, web-enabled business will require very large databases, huge numbers of simultaneous users, and new ways to manage transactions.
5.3 Human-Computer Interaction (HCI)
The effectiveness of any business computer system is determined by the quality of its interfaces with the external world. This area of research and development has been termed Human-Computer Interaction (HCI).
5.3.1 Early Computer Interfaces

The first computer interfaces were toggle switches, blinking lights, and primitive cathode-ray tubes. Quickly, the need for more effective input/output devices brought about the use of paper tape drives and Teletype printers. Up to 1950, however, interaction with the computer was the domain of computer specialists who were trained to handle these arcane and unwieldy interfaces.
The use of computers in business systems required more usable, standard HCIs. During the Era of Automation, the standard input medium was the Hollerith card. Both the program and data were keypunched on cards in standardized formats. The cards were organized into card decks, batched with other card decks, and read into the computer memory for execution. Output was printed on oversized, fan-fold computer paper. The IBM 1401 computer system, introduced in 1959, was a major business success primarily because of its high-speed (600 lines per minute) 1403 chain printer. Businesses were required to hire computer operations staff to maintain the computer systems and to control access to the computer interfaces. End-user computing was rare during this era.
5.3.2 Text-Based Command Interfaces
As business uses of the computer grew during the 1970s, the demand from end-users for more effective, direct interaction with the business computer systems grew correspondingly. Moving from batch computer architectures to on-line distributed architectures necessitated new terminal-based interfaces for the application users. Computer terminals were designed to combine a typewriter input interface with a cathode-ray tube output interface. Terminals were connected to the mainframe computer via direct communication lines. The design of the computer terminal was amazingly successful and remains with us today as the primary HCI device. HCI interfaces for on-line applications were either based on scrolling lines of text or on predefined bit-mapped forms with fields for text or data entry. Standard business applications began to proliferate in the environment of on-line computing. For example:
• Text Editing and Word Processing: The creation, storage, and manipulation of textual documents rapidly became a dominant use of business computers. Early text editors were developed at Stanford, MIT, and Xerox PARC. Commercial WYSIWYG word processing packages came along in the early 1980s with LisaWrite, a predecessor to MacWrite, and WordStar.
• Spreadsheets: Accounting applications are cornerstone business activities. Commercial accounting packages have been available for computers since the 1950s. The spreadsheet package VisiCalc became a breakthrough product for business computing when it was introduced in 1979. Lotus 1-2-3 and Microsoft Excel followed as successful spreadsheet packages.
• Computer-Aided Design: The use of computers for computer-aided design (CAD) and computer-aided manufacturing (CAM) began
during the 1960s and continues today with sophisticated application packages.
• Presentation and Graphics: Research and development on drawing programs began with the Sketchpad system of Ivan Sutherland in 1963. Computer graphics and paint programs have been integrated into business applications via presentation packages, such as Microsoft's PowerPoint.

Text-based command languages were the principal forms of HCI during the 1970s and 1980s for the majority of operating systems, such as UNIX, IBM's MVS and CICS, and DEC's VAX VMS. The users of these systems required a thorough knowledge of many system commands and formats. This type of text-based command language carried over to the first operating systems for personal computers. CP/M and MS-DOS constrained users to a small set of predefined commands that frustrated end-users and limited widespread use of personal computers.
5.3.3 The WIMP Interface
Many years of research and development on computer GUIs have led to today's WIMP HCI standards. Seminal research and development by J.C.R. Licklider at ARPA, Douglas Engelbart at the Stanford Research Institute (SRI), and the renowned group at Xerox PARC led to the many innovative ideas found in the WIMP interface [101]. The first commercial computer systems popularizing WIMP features were the Xerox Star, the Apple Lisa, and the Apple Macintosh. The X Window system and the Microsoft Windows versions made the WIMP interface a standard for current business computer systems. More than any other technology, the WIMP interface and its ease of use brought the personal computer into the home and made computing accessible to everyone. Advantages to businesses included an increase in computer-literate employees, standard application interfaces across the organization, decreased training time for new applications, and a more productive workforce.
5.3.4 Web Browser Interfaces and Future Directions
As with all computer technologies, the Internet has brought many changes and new challenges to HCI. The WWW is based on the concept of hypertext, whereby documents are linked to related documents in efficient ways. Documents on the web use a standard markup language (HTML) and standard addresses (URLs) to identify and locate the linked documents. Specialized web
browsers provide the interfaces for viewing documents on the web. Mosaic from the University of Illinois was the first popular web browser. Currently, Netscape and Microsoft provide the most widely used web browsers. Improvements, such as XML, the successor to HTML, will support new browsing features and capabilities for the future of the WWW. There are numerous future directions in the field of HCI. For example
[101]:

• Gesture Recognition: The recognition of human gestures began with light pens and touch-sensitive screens. The recording and recognition of handwriting is a subject of on-going research.
• Three-dimensional Graphics: Research and development on 3D interfaces has been an active area, particularly in CAD-CAM systems. 3D visualization of the human body has the potential to revolutionize surgery and healthcare.
• Virtual Reality: Scientific and business uses of virtual reality are just now being explored. Head-mounted displays and data gloves, funded by NASA research, will become commercially viable in the near future for marketing demonstrations and virtual design walkthroughs.
• Voice Recognition and Speech: Audio interfaces to computer systems have been available for the past decade. However, the limited vocabulary and requirements for specific voice pattern recognition remain problems to overcome before widespread use.
5.4 Software Development Processes and Methods
The three software technologies of algorithmic programming, data, and HCI are brought together in the design and implementation of a business application via software development processes and methods. A software development process is a pattern of activities, practices, and transformations that support managers and engineers in the use of technology to develop and maintain software systems. A software development method is a set of principles, models, and techniques for effectively creating software artifacts at different stages of development (e.g., requirements, design, and implementation). Thus, the process dictates the order of development phases and the criteria for transitioning from one phase to the next, while the method defines what is to be done in each phase and how the artifacts of the phase are represented. The history of business computing has seen important advances in both software processes and software methods. We briefly track the evolution of these advances in this section.
5.4.1 Software Development Processes

Throughout the Era of Automation very little attention was paid to organizing the development of software systems into stages. Programmers were given a problem to solve and were expected to program the solution for computer execution. The process was essentially "code and fix." As problems became more complex and the software grew in size, this approach was no longer feasible.

The basic "waterfall process" was defined around 1970 [102]. A well-defined set of successive development stages (e.g., requirements analysis, detailed design, coding, testing, implementation, and operations) provided enhanced management control of the software development project. Each stage had strict entrance and exit criteria. Although it had several conceptual weaknesses, such as limited feedback loops and overly demanding documentation requirements, the waterfall process model served the industry well for over 20 years into the 1990s. The principal Department of Defense process standard for the development of software systems during this period, DOD-STD-2167A, was based on the waterfall approach.

Innovative ideas for modeling software development processes include the spiral model [103] and incremental development [104]. The spiral model shows the development project as a series of spiraling activity loops. Each loop contains steps of objective setting, risk management, development/verification, and planning. Incremental development emphasizes the importance of building the software system in well-defined and well-planned increments. Each increment is implemented and certified correct before the next increment is started. The system is thus grown in increments under intellectual and management control.

A standard, flexible software development process is essential for management control of development projects. Recent efforts to evaluate the quality of software development organizations have focused on the software development process. The Software Engineering Institute (SEI) has proposed the Capability Maturity Model (CMM) to assess the maturity of an organization based on how well key process areas are performed [105]. The CMM rates five levels of process maturity:

• initial: ad hoc process
• repeatable: stable process with a repeatable level of control
• defined: effective process with a foundation for major and continuing progress
• managed: mature process containing substantial quality improvements
• optimized: optimized process customized for each development project.
The principal goal of the CMM is for organizations to understand their current process maturity and to work toward continuous process improvement. The international ISO-9000 standards contain similar provisions to evaluate the effectiveness of the process in software development organizations.
5.4.2 Early Development Methods

Early methods of program design were essentially ad hoc sketches of logic flow leading to the primary task of writing machine code. These design sketches evolved into flowcharting methods for designing program logic. Basic techniques also evolved for the development activities of requirements analysis, software design, and program testing. The creation of software was initially considered more of a creative art than a science. As business system requirements became more complex into the 1960s, development organizations quickly lost the ability to manage software development in a predictable way. Defined development processes and methods brought some controls to software construction.

Structured methods for the analysis and design of software systems appeared during the late 1960s and early 1970s. Two primary approaches were defined: procedure-oriented methods and data-oriented methods.

The development of procedure-oriented methods was strongly influenced by the sequential flow of computation supported by the dominant programming languages, COBOL and FORTRAN. The focus of software development under this paradigm is to identify the principal functions (i.e., procedures) of the business system and the data flows among these functions. Persistent data stores are identified along with the data flows among functions and data stores. The system functions are hierarchically decomposed into more detailed descriptions of sub-functions and data flows. After sufficient description and analysis, the resulting functions and data stores are designed and implemented as software modules with input-output interfaces. Primary examples of procedure-oriented system development methods include the structured analysis and structured design methods [106, 107] and the Jackson development methods [108].

Data-oriented system development places the focus on the required data. The data-centric paradigm is based on the importance of data files and databases in large business applications. System data models are developed and analyzed. System procedures are designed to support the data processing needs of the application. The design and implementation of the application software is constructed around the database and file systems. Primary data-oriented methods included the Warnier-Orr methods [109] and information engineering [110].
5.4.3 Object-Oriented Methods

In the early 1980s, object-oriented (OO) methods of software development were proposed for building complex software systems. Object-orientation is a fundamentally different view of a system as a set of perceptible objects and the relationships among the objects. Each object in the application domain has a state, a set of behaviors, and an identity. A business enterprise can be viewed as a set of persistent objects. Business applications are developed by designing the relationships and interactions among the objects. Advocates point out several significant advantages of object-oriented system development, including increased control of enterprise data, support for reuse, and enhanced adaptability to system change. Risks of object-oriented development include the potential for degraded system performance and the startup costs of training and gaining object-oriented experience.

Early object-oriented languages included Simula-67 and Smalltalk. Today, C++ and Java are the most popular object-oriented languages. The plethora of object-oriented development methods of the 1980s has converged into the Unified Modeling Language (UML) standards [111]. A significant percentage of new business applications are being developed as object-oriented software systems.
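A minimal Java sketch of these ideas follows; the Account class, its fields, and its methods are hypothetical and invented for illustration, not drawn from any system discussed above. The object has an identity, encapsulates its state, and exposes behavior only through its methods.

```java
import java.math.BigDecimal;

// A business object: identity (accountId), encapsulated state (balance),
// and behavior (deposit, withdraw) that guards that state.
public class Account {
    private final String accountId;   // identity
    private BigDecimal balance;       // internal state, hidden from other objects

    public Account(String accountId, BigDecimal openingBalance) {
        this.accountId = accountId;
        this.balance = openingBalance;
    }

    public void deposit(BigDecimal amount) {
        balance = balance.add(amount);
    }

    public void withdraw(BigDecimal amount) {
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient funds in " + accountId);
        }
        balance = balance.subtract(amount);
    }

    public BigDecimal getBalance() {
        return balance;               // state is read only through the interface
    }

    public static void main(String[] args) {
        Account acct = new Account("ACCT-001", new BigDecimal("100.00"));
        acct.deposit(new BigDecimal("25.00"));
        acct.withdraw(new BigDecimal("40.00"));
        System.out.println(acct.getBalance());   // prints 85.00
    }
}
```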
5.4.4 Formal Development Methods

The requirement for highly reliable, safety-critical systems in business, industry, and the public sector has increased interest in formal software development methods. Formal methods are based on rigorous, mathematics-based theories of system behavior [112, 113]. Formal methods, such as the Cleanroom methods [114], support greater levels of correctness verification on all artifacts in the development process: requirements, design, implementation, and testing. Formal methods require mathematical representations and analysis techniques entailing significant discipline and training in the development team. Although anecdotal evidence of the positive effects (e.g., improved quality and increased productivity) of formal methods is often reported [115, 116], more careful study is needed [117, 118]. Nevertheless, several countries and standards bodies are moving to require the use of formal methods on all safety-critical software systems [119].
TABLE II. SOFTWARE TECHNOLOGIES IN THE BUSINESS COMPUTING ERAS

Era of Calculation (< 1950). Algorithmic programming: switches, paper tapes, punched cards. Data: punched cards, paper documents. HCI: human-mechanical interaction. Processes and methods: none.
Era of Automation (1950-64). Algorithmic programming: FORTRAN, COBOL, RPG. Data: magnetic tapes, data file organization. HCI: punched-card input, printer output. Processes and methods: code and fix.
Era of Integration and Innovation (1965-74). Algorithmic programming: structured programming, programming teams. Data: magnetic drums and disks, hierarchical databases, network databases (DBTG). HCI: on-line terminals, text command languages. Processes and methods: waterfall model.
Era of Decentralization (1975-84). Algorithmic programming: visual programming. Data: relational databases. HCI: WIMP interfaces. Processes and methods: prototyping, spiral model, incremental development.
Era of Reengineering and Alignment (1985-94). Algorithmic programming: object-oriented programming. Data: optical disks, object-oriented databases. HCI: extended WIMP interfaces. Processes and methods: CMM and ISO 9000.
Era of the Internet and Ubiquitous Computing (1995- ). Algorithmic programming: Internet programming, Java, XML. Data: multimedia data, data warehousing, data mining. HCI: web browsers. Processes and methods: optimized, adaptive process models.

5.4.5 Component-Based Development (CBD) Methods

The latest emerging trend for the development of business systems is component-based development (CBD). CBD extends the ideas of software
reuse into a full-scale development process whereby complete business applications are delivered based upon the interaction and integration of software components. Wojtek Kozaczynski gives a business-oriented definition of a component in [120]:

A business component represents the software implementation of an autonomous business concept or business process. It consists of the software artifacts necessary to express, implement, and deploy the concept as a reusable element of a larger business system.

The component is essentially a black box with well-defined interfaces to the external world. Thus, each component provides a service to the business system. New products and standards for middleware provide the "glue" for building systems from individual components. The technologies of DCOM, CORBA, and Enterprise JavaBeans are a start for enabling CBD processes and methods. Object-oriented concepts, such as encapsulation and class libraries, and emphasis on system architectures, such as n-tier client-server, support the realization of CBD in business environments. As the underlying technologies mature and the software development industry accepts and adopts enabling standards, component-based development will become a dominant paradigm for building business software systems [121].
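As a minimal sketch of the black-box component idea (the interface, class names, and order-entry service are hypothetical and are not tied to any particular middleware standard named above), a business component exposes a well-defined interface and hides its implementation:

```java
// The published interface is the component's only point of contact
// with the rest of the business system.
public interface OrderEntryService {
    String placeOrder(String customerId, String productCode, int quantity);
}

// The implementation can be replaced or upgraded without affecting clients,
// as long as the interface contract is honored.
class SimpleOrderEntryService implements OrderEntryService {
    private int nextOrderNumber = 1000;

    @Override
    public String placeOrder(String customerId, String productCode, int quantity) {
        // Internal details (pricing, inventory checks, persistence) stay hidden.
        return "ORD-" + (nextOrderNumber++) + " for " + customerId
                + ": " + quantity + " x " + productCode;
    }

    public static void main(String[] args) {
        OrderEntryService service = new SimpleOrderEntryService();  // client sees only the interface
        System.out.println(service.placeOrder("CUST-42", "WIDGET-7", 3));
    }
}
```

In a middleware setting such as those named above, the interface would typically be registered with the middleware layer, which then handles concerns such as location, transactions, and security on the component's behalf.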
5.5 Software Summary

Table II summarizes the mapping of the software technologies to the eras of business computing.
6. Business System Architectures
The three technology components of a business computer system (the computational platform, communications, and software) are integrated via a system architecture into a functional, effective business system. We explore the evolution of business system architectures by bringing together two separate streams of research and practice: computer system and software architectures, and the management of information systems (MIS). Strong synergies can be seen in the interplay of computer technology architectures and the MIS models of business computing over the eras of the past 50 years.

A classic paper by Zachman [122] presented an information systems architecture framework made up of the elements of data, process, and networking. The Zachman IS architecture combines representations of these elements into an architectural blueprint for a business application.
The framework of this chapter differs by including the computational platform (hardware and OS) as a fundamental element and by combining algorithmic procedure (i.e., process) and data into the software element. However, our framework objectives are similar to Zachman's: to provide a basis for understanding how information technology elements are brought together into effective business systems via architectural descriptions.

The importance of software architectures for the development of complex systems has been emphasized in the software engineering literature [123, 124]. Architectural styles are identified based on their organization of software components and the connectors for the transmission of data and control among components. We structure our presentation of business system architectures by discussing major computer architecture milestones. We find a close correspondence among these milestones, the eras of business computing in which they occurred, and the MIS focus prevalent during each era. Figure 5 summarizes the presentation.

The business system architectures reflect the cumulative effect of these MIS foci over the years. Business systems have evolved and become more complex due to the requirements for meeting many, sometimes conflicting, business objectives. We will see how the architectural solutions attempt to satisfy these multiple objectives.
FIG. 5. Business system architectures. The figure maps the progression of architectures (manual processes; mainframe, data-flow architectures; on-line, real-time architectures; distributed, client-server architectures; event-driven, component-based architectures; and web-based architectures) and the corresponding MIS foci (business process, technology, performance, customer, alignment, and WWW) onto the six eras of business computing.
6.1 Manual Business Processes
In the centuries leading up to the invention of the computer, businesses focused their creative energies on the development of effective business processes for production, personnel management, accounting, marketing, sales, etc. Standard operating procedures (SOPs) and workflow processes were widely used throughout business history. The concept of a "General Systems Theory" guided the structure and application of these business processes. The following passage from Barnard [125] shows organizational systems thinking:

A cooperative system is a complex of physical, biological, personal, and social components which are in a specific systematic relationship by reason of the cooperation of two or more persons for at least one definite end. Such a system is evidently a subordinate unit of larger systems from one point of view; and itself embraces subsidiary systems--physical, biological, etc.--from another point of view. One of the systems within a cooperative system, the one which is implicit in the phrase "cooperation of two or more persons" is called an "organization".

Even before the advent of computers, intellectual leaders such as Herbert Simon and C. West Churchman were extending the ideas of systems thinking into business organizations [126]. Such systemic business processes were performed manually during the Era of Calculation up to 1950. However, the business focus of getting the critical business processes right before automation remains an underlying tenet of all successful organizations today and for the foreseeable future.
6.2 Mainframe Architectures
The automation of business processes with the original large mainframe computer systems occurred slowly at first. Management focus was on how computer technology could best be introduced into the organization. Gorry and Scott Morton [127] suggested a framework for the development of management information systems. Nolan [128] proposed a widely cited six-stage model of data processing (DP) growth within an organization. His six stages of growth, with suggested technology benchmarks, are:

• Stage 1 (Initiation): 100% batch processing
• Stage 2 (Contagion): 80% batch processing, 20% remote job entry
• Stage 3 (Control): 70% batch processing, 25% database processing, 5% time-sharing
• Stage 4 (Integration): 50% batch and remote job entry, 40% database/data communications processing, 10% minicomputer and microcomputer processing
• Stage 5 (Data Administration): 20% batch and remote job entry, 60% database/data communications processing, 15% minicomputer and microcomputer processing, 5% personal computers
• Stage 6 (Maturity): 10% batch and remote job entry, 60% database/data communications processing, 25% minicomputer and microcomputer processing, 5% personal computers.

Growth benchmarks are also provided for the applications portfolio, planning and control, organizational structure, and user awareness. Although other researchers have pointed out weaknesses in Nolan's stage model (e.g., [129]), the technology benchmarks cited above clearly demonstrate a management focus on the evolution of business system architectures.

By the early 1960s, the vast majority of business application programs were written in COBOL and based on basic data flow architectures. During this Era of Automation, computer systems consisted primarily of the computational platform (e.g., mainframe and operating system) and early application software systems. In a data flow architecture, data in the form of variables, records, or files move from one computer system application to the next until the required business process is completed. The simplest form of a data flow architecture is known as a batch sequential architecture. Data is batched into large files and the application programs are batched for sequential runs on the data files. The classic Master File-Transaction File applications are based on batch sequential processing.

The pipe and filter architecture is a more general model of data flow. Pipes carry data from one filter to the next in a network data flow. A filter accepts streams of data as input and produces streams of data as output, performing some local transformation of its input into its output on a continuing basis. Each filter is independent of all other filters in the data flow architecture. The pipe and filter structure provided the underlying computational model for the UNIX operating system [68].
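A small sketch of the pipe-and-filter idea in Java follows (the transaction records and the individual filters are hypothetical, chosen only to illustrate the style): each stage consumes a stream of records and produces a transformed stream, independent of the other stages.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PipeAndFilterSketch {
    public static void main(String[] args) {
        // Hypothetical raw transaction records: id, customer, amount.
        List<String> rawTransactions = List.of(
                "1001,ACME,250.00", "1002,GLOBEX, -75.50", "1003,ACME,410.25");

        // Each stage below acts as an independent filter in the pipeline.
        List<String> report = rawTransactions.stream()
                .map(String::trim)                                  // filter 1: normalize
                .map(line -> line.split(","))                       // filter 2: parse fields
                .filter(f -> Double.parseDouble(f[2].trim()) > 0)   // filter 3: keep credits only
                .map(f -> f[1] + " owes " + f[2].trim())            // filter 4: format output
                .collect(Collectors.toList());

        report.forEach(System.out::println);
    }
}
```

Because each filter knows nothing about its neighbors, stages can be added, removed, or reordered without rewriting the rest of the pipeline, which is the same property that made the style attractive for batch business processing and for UNIX.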
6.3 On-Line, Real-Time Architectures
During the Era of Integration and Innovation from 1965 to 1974, businesses began to realize the competitive advantages of on-line, real-time processing. The evolving technologies of databases, data communications, and the computational platform (e.g., minicomputers and real-time operating systems) enabled sophisticated real-time business applications to
be developed. Business computer systems moved from strictly operational, back-office systems to initial forays into truly strategic information systems [130]. Examples of seminal real-time business applications included:

• American Hospital Supply placing on-line order entry terminals in hospitals
• Merrill Lynch introducing its Cash Management Account
• American Airlines developing the Sabre computerized reservation system.

Recognizing the overwhelming success of these ventures, businesses moved to new computer system architectures that would provide the performance to support on-line, real-time applications. On-line processing required important new advances in data communications (e.g., remote job entry, real-time data queries and updates), database repositories (e.g., hierarchical and network databases), and operating systems (e.g., multiprogramming, real-time interrupts, resource allocation). The critical need was to integrate these new technologies into a computer architecture with sufficient performance to meet rigorous response time and data capacity requirements.

The principal business system architecture used to meet these requirements was a repository architecture. A central repository of business data in file and database formats represents the current state of the application system. Multiple sources of independent transactions (i.e., agents) perform operations on the repository. The interactions between the central repository and the external agents can vary in complexity, but in the early days of on-line applications they consisted mostly of simple queries or updates against the repository. The real-time operating system provides integrity and concurrency control as multiple transactions attempt to access the business data in real time. The data-centric nature of most business applications has made the repository architecture with real-time requirements a staple of business application development.
6.4 Distributed, Client-Server Architectures
The data communication inflexion point occurring around 1975 ushered in the Era of Decentralization. For businesses, the ability now to decentralize the organization and move processing closer to the customer brought about major changes in thinking about business processes and the supporting business computer systems. The new customer focus was exemplified by the critical success factor (CSF) method for determining business requirements. Critical success factors are the limited number of
areas in which results, if they are satisfactory, will ensure successful competitive performance for the organization [131]. Key executives from the business are interviewed and CSFs are identified from four prime sources:

• structure of the particular industry
• competitive strategy, industry position, and geographical location
• environmental factors
• temporal factors.
The CSFs are collected, organized, and prioritized based on feedback from the executives. Business processes are then developed to support realization of the organization's critical success factors. The results of CSF studies invariably found that the decentralization of information and processing closer to the customer location is an effective business strategy.

The technology components to support true distributed processing were available during this era to support these decentralized business strategies. Networks of communicating computers consisted of mainframes, minicomputers, and increasingly popular microcomputers. Distributed architectures became the norm for building new business computer systems [132, 133].

Distributed computing provided a number of important advantages for business systems. Partitioning the workload among several processors at different locations enhanced performance. System availability was increased due to redundancy of hardware, software, and data in the system. Response time to customer requests was improved since customer information was located closer to the customer site. The ability to integrate minicomputers and microcomputers into the distributed architecture provided significant price-performance advantages to the business system. The potential disadvantages of the distributed system were loss of centralized control of data and applications and the performance costs of updating redundant data across the network.

An important variant of the distributed architecture is the client-server architecture. A server process, typically installed in a larger computer, provides services to client processes, typically distributed on a network. A server can be independent of the number and type of its clients, while a client must know the identity of the server and the correct calling sequence to obtain service. Examples of services include database systems and specialized business applications.

To aid in handling the complexity of the new distributed business systems, the architectural style of layering was applied. Layered architectures were proposed for managing data communication protocols on networks. The
three most popular communication architectures are:

• the International Standards Organization (ISO) Open Systems Interconnection (OSI) Seven Layer Architecture [75]
• IBM's Systems Network Architecture (SNA) [134]
• the TCP/IP Protocol Suite [84].

These layered architectures allowed distributed applications to be written at high levels of abstraction (i.e., higher layers) based on services provided by lower layers in the architecture.
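A minimal sketch of the client-server pattern using TCP sockets follows (the port number, the one-line protocol, and the "order status" request are arbitrary choices for illustration, not a real business protocol): the server offers a service without knowing who its clients are, while each client must know the server's address and calling convention.

```java
// File: StatusServer.java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class StatusServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(9090)) {   // arbitrary port
            while (true) {
                // The server is independent of the number and identity of its clients.
                try (Socket client = listener.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();              // one request per connection
                    out.println("STATUS OF " + request + ": SHIPPED");
                }
            }
        }
    }
}

// File: StatusClient.java
class StatusClient {
    public static void main(String[] args) throws IOException {
        // The client must know the server's identity (host, port) and calling sequence.
        try (Socket socket = new Socket("localhost", 9090);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("ORD-1001");
            System.out.println(in.readLine());
        }
    }
}
```

In a layered architecture, both sides sit on top of the TCP/IP stack: the application code deals only with readable requests and replies, while the lower layers handle routing, retransmission, and delivery.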
6.5 Component-Based Architectures

The Era of Reengineering and Alignment was predicated on the principles of total quality management (TQM) as discussed in Section 2. Longstanding business processes are reengineered to take greatest advantage of new information technologies and computer system architectures. An equally important management focus that occurred during this era was the alignment of business strategy with information technology (IT) strategy in the organization. A strategic alignment model proposed by Henderson and Venkatraman [20] posits four basic alignment perspectives:
1. Strategy Execution: The organization's business strategy is well defined and determines the organizational infrastructure and the IT infrastructure. This is the most common alignment perspective.
2. Technology Transformation: The business strategy is again well defined, but in this perspective it drives the organization's IT strategy. Thus, the strategies are in alignment before the IT infrastructure is implemented.
3. Competitive Potential: The organization's IT strategy is well defined based upon innovative IT usage to gain competitive advantage in the marketplace. The IT strategy drives the business strategy, which in turn determines the organizational infrastructure. The strategies are aligned to take advantage of the IT strengths of the organization.
4. Service Level: The IT strategy is well defined and drives the implementation of the IT infrastructure. The organizational infrastructure is formed around the IT infrastructure. The business strategy does not directly impact the IT strategy.

Although all four perspectives have distinct pros and cons, the alignment of the business strategy and the IT strategy before the development of the organizational and IT infrastructures in perspectives 2 and 3 provides a consistent vision of the organization's business objectives.
The period from 1985 to 1994 was a period of continuous and rapid change for business, from the proliferation of desktop workstations throughout the organization to the globalization of the marketplace [135]. The fundamental business strategy of "make and sell" was transformed into a strategy of "sense and respond" [136]. Two new computer system architectures were devised to meet these changing environmental demands.

Event-driven architectures have become prevalent in business systems that must react to events that occur in the business environment [137]. When an important event occurs, a signal is broadcast by the originating component. Other components in the system that have registered an interest in the event are notified and perform appropriate actions. This architecture clearly performs well in a "sense and respond" business environment. Note that announcers of events are unaware of which other components are notified and what actions are generated by the event. Thus, this architecture supports implicit invocation of activity in the system, and it provides great flexibility in that actions can be added, deleted, or changed easily for a given event.

The important new development ideas of component-based development have led naturally to the design and implementation of component-based architectures. Business systems are composed of functional components glued together by middleware standards such as CORBA, DCOM, and Enterprise JavaBeans. In many cases the components are commercial off-the-shelf (COTS) products. Thus, organizations are able to build complex, high-performance systems by integrating COTS components via industry-standard middleware protocols. This minimizes development risk while allowing the business organization to effectively align its IT strategy with its business strategy via judicious selection of best-practice functional components. Enterprise Resource Planning (ERP) business systems from vendors like SAP, Baan, PeopleSoft, and Oracle utilize component-based architectures to allow clients to customize their business systems to their organization's requirements.
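A compact Java sketch of implicit invocation follows (the event bus, the order event, and the two subscribing components are hypothetical, not taken from any product named above): the announcer broadcasts an event without knowing which components react or what actions result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Implicit invocation: components register interest in an event;
// the announcer does not know who is listening or what actions follow.
public class OrderEventBus {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    public void register(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    public void announce(String orderId) {
        subscribers.forEach(s -> s.accept(orderId));   // broadcast to all registered components
    }

    public static void main(String[] args) {
        OrderEventBus bus = new OrderEventBus();
        bus.register(id -> System.out.println("Billing: invoice created for order " + id));
        bus.register(id -> System.out.println("Warehouse: pick list queued for order " + id));

        // New reactions can be added or removed without changing the announcer.
        bus.announce("ORD-1001");
    }
}
```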
6.6 Web-Based Architectures
The influence of the WWW has required businesses to rethink their business and IT strategies to take greatest advantage of its revolutionary impact [138]. This Era of the Internet and Ubiquitous Computing will generate new web-based architectures for integrating the Internet into business functions of marketing, sales, distribution, and funds transfer. The rapid exchange of information via push and pull technologies to any point of the globe will eliminate most boundaries and constraints on international commerce. However, critical issues of security, privacy, cultural differences
(e.g., language), intellectual property, and political sensitivities will take many years to be resolved.
7. Conclusions and Future Directions
The past half-century has seen amazing progress in the use of information technology and computer systems in business. Computerization has truly revolutionized the business organization of today. This chapter has presented a structured overview of the evolution of business computing systems through six distinct eras:

• Era of Calculation
• Era of Automation
• Era of Integration and Innovation
• Era of Decentralization
• Era of Reengineering and Alignment
• Era of the Internet and Ubiquitous Computing.
Advances in each of the major computing technologies--Computational Platform, Communications, Software, and System Architecture--have been surveyed and placed in the context of the computing eras. We close the chapter by presenting a summary of the major conclusions we draw from this survey of business computing. Within each conclusion we identify key future directions in business computing that we believe will have a profound impact.
7.1 Computers as Tools of Business
Businesses have been most successful in their use of computing technologies when they have recognized the appropriate roles of computers as tools of business. The underlying business strategies and the effective business processes that implement the strategies are the critical success factors. Computing systems provide levels of performance, capacity, and reach that are essential for competitive advantage in the business environment. Recent popular management theories of business process reengineering and business strategy-information systems strategy alignment have emphasized the close, mutually dependent relationships between business goals and computer systems support for these goals. The expansion and application of these theories will be important future directions for forward moving business organizations. Smart businesses realize that innovative technologies, such as the Internet, have the potential to enable new forms of business.
Thus, new business processes must be defined that take greatest advantage of the capabilities of the new technologies. However, it is important to keep in mind that it is the horse (business strategy and processes) that pulls the cart (computer systems support).
7.2 Miniaturization of the Computational Platform
One of the most astonishing trends over the past 50 years is the consistent growth in computer chip performance governed by Moore's law, which predicts a doubling of chip capacity every 18 months. Just as important for business applications, as performance has skyrocketed, the size of the computing platform has grown smaller and smaller. Miniaturization of the computing platform has essentially erased physical boundaries to computer proliferation. Embedded computing systems now reside in nearly every appliance we own: automobiles, washing machines, microwaves, and audio/video equipment. Current research on wearable computers is expanding the range of computerization to the human body [139]. Miniature information processing devices can be embedded in eyeglasses, wristwatches, rings, clothing, and even within the body itself.

The future research direction of biocomputing holds intriguing possibilities for business applications. We still have a great deal to learn about how the brain processes information and makes decisions [140]. New computer architectures based on biological brain patterns have the potential to revolutionize the future of computing platforms. Connecting these miniature computer brains with the Internet as a worldwide brain could result in quite interesting business application scenarios.
7.3 Communications Inflexion Point
A major trend throughout the eras of business computing has been the increasing dominance of communications technology in business systems. We have noted the progression from on-line systems to distributed systems to pervasive networking (e.g., WANs and LANs) to the WWW. We observed that a communications inflexion point occurred around 1975, when computing systems changed from being defined by their hardware (i.e., computing platform) architecture to being defined by their communications architecture. Since that point, businesses have increasingly relied upon their telecommunications infrastructure to support essential business applications. Even basic office applications such as email and FTP have dramatically enhanced business processes by speeding the communication paths among employees [141]. It is clear that electronic commerce will have a pervasive role in the future marketplace of products and services. It is
projected that sales of business-to-business electronic commerce applications will grow to $93 billion in the year 2000. Thus, we believe that communications technologies will continue to grow in importance as a critical component of business computing systems.
7.4 Growth of Business Information and Knowledge
The amount of information in the world is growing at an exponential rate. In parallel, the means for accessing this information is expanding rapidly via the Internet and other digital media. Businesses are faced with numerous challenges and opportunities in deciding how to access, filter, and use this massive amount of information most effectively in critical business processes. The study of business information and of the rules for applying that information makes up the emerging research area of knowledge management. The inherent knowledge and acquired intelligence of the business organization are being recognized as its most important asset. The history of business computing systems has seen many forms of knowledge management support systems.
• Decision support systems support the transformation of raw data into business information that is used for managerial decision making. Decision models are used to structure and transform data into information.
• Expert systems utilize expert business rules and inference engines to generate predictions, diagnoses, and expert decisions.
• More recently, intelligent agents have been applied to the collection, filtering, and presentation of business information. An intelligent agent is an autonomous software component that can react to events in business environments and deliver information results to a decision-maker.

The requirements for storing and manipulating huge amounts of business information have led to the exciting fields of data warehousing and knowledge discovery (i.e., data mining). A data warehouse is distinctly different from an operational business database system. The data warehouse is a repository of information drawn from many operational databases to provide a more comprehensive view of the business enterprise. The information is time-stamped to allow trend analyses over periods of interest. Thus, information is not updated once it is in the data warehouse. Instead, new data values with a timestamp are added to the warehouse on top of previous data values. Sophisticated querying and reporting capabilities allow users efficient access to the business information.
Data mining strategies are applied to the data warehouse in order to discover new and interesting relationships (i.e., knowledge) among the information entities. Business organizations are just now beginning to reap the benefits of data warehousing and data mining. This future direction provides great potential for businesses to capitalize on their business knowledge and to enhance their business strategies.
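A minimal sketch in Python of the append-only, time-stamped loading style described above; the class, method, and column names (FactTable, load, history, region, month, revenue) are hypothetical and purely for illustration. The point is that new values are added alongside older ones rather than overwriting them, so trend queries over any period of interest remain possible.

```python
from datetime import datetime, timezone

class FactTable:
    """Illustrative append-only, time-stamped warehouse table."""

    def __init__(self):
        self._rows = []

    def load(self, **fact):
        # New values are appended with a load timestamp; nothing is overwritten.
        self._rows.append({"loaded_at": datetime.now(timezone.utc), **fact})

    def history(self, **criteria):
        """All matching rows in load order, e.g. for trend analysis."""
        return [r for r in self._rows
                if all(r.get(k) == v for k, v in criteria.items())]

# Hypothetical usage: monthly revenue snapshots accumulate side by side,
# supporting trend analysis instead of replacing earlier figures.
sales = FactTable()
sales.load(region="EMEA", month="1999-11", revenue=1.2e6)
sales.load(region="EMEA", month="1999-12", revenue=1.4e6)
trend = [r["revenue"] for r in sales.history(region="EMEA")]
```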
7.5 Component-Based Software Development

The size and complexity of business application software have grown at an exponential rate over the past 50 years. Maintaining intellectual control over the development of such complex systems is difficult at best, and nearly impossible without disciplined development approaches. Underlying principles of system hierarchies, data encapsulation, referential transparency [104], and structured processes and methods support evolving best practices to control the development of complex business systems. An important future direction of software development is component-based software development as discussed in Section 5.4.5. The effective use of software architecture concepts provides a blueprint for the construction of large systems made up of smaller, pre-developed system components. The components are selected, connected, and verified to provide required business system functionality; a simple sketch of this idea follows the list below. A goal of component-based software development is to have an open marketplace for vendors to develop state-of-the-art components that will plug-and-play in open system architectures. Although it is still something of a utopian dream, several research and development activities portend this future direction of software development:
• Open System Standards: Software development organizations are becoming more amenable to the need for open standards in order to support the integration of software components. Software engineering standards come from international (e.g., ISO) and national (e.g., ANSI) regulatory bodies [142], from industrial consortia (e.g., the Object Management Group), and from organizations with dominant market presence (e.g., Microsoft). The principle of natural selection allows the best standards to survive and flourish in the marketplace.
• Open Source Software: The appeal of open source software, such as the LINUX operating system, lies in the continuity of support across a large user base. Since no one vendor "owns" the source code, business applications are shared and supported by the open source software community and culture.
• Commercial Off-The-Shelf Software (COTS): The majority of businesses cannot afford to staff and support an internal software
development organization. An effective information systems strategy for such businesses is to evaluate, select, and purchase COTS application systems. A COTS strategy requires an enterprise architecture foundation for the integration of the various systems into an effective whole.
• Enterprise Resource Planning (ERP) Systems: The promise of ERP systems is that a single, integrated software system can provide total control of an organization's processes, personnel, and inventory. Moreover, the integration of business information across multiple functions will produce new business intelligence and will support new business applications. ERP vendors, such as SAP, Baan, Oracle, and PeopleSoft, provide a component-based strategy where clients can customize their business systems by selecting needed business functions for integration into the vendor's ERP architecture. The ERP market is estimated to grow to just under $30 billion in the year 2000.
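The select-connect-verify idea behind component assembly can be sketched as follows in Python. The `BusinessComponent` protocol, the `Assembly` class, and the example `OrderEntry` component are hypothetical names standing in for the far richer component models of the period (such as CORBA or COM); this is an illustrative toy, not a description of any vendor's architecture.

```python
from typing import Protocol

class BusinessComponent(Protocol):
    """Hypothetical plug-in contract a vendor component must satisfy."""
    def provides(self) -> set[str]: ...   # business functions offered
    def start(self) -> None: ...

class Assembly:
    """Selects, connects, and verifies pre-developed components."""

    def __init__(self) -> None:
        self._components: list[BusinessComponent] = []

    def plug_in(self, component: BusinessComponent) -> None:
        self._components.append(component)

    def verify(self, required: set[str]) -> bool:
        # Verification step: do the selected components cover every
        # business function the system architecture calls for?
        offered: set[str] = set()
        for component in self._components:
            offered |= component.provides()
        return required <= offered

    def start(self) -> None:
        for component in self._components:
            component.start()

class OrderEntry:
    """Example vendor component satisfying the protocol structurally."""
    def provides(self) -> set[str]:
        return {"order entry", "invoicing"}
    def start(self) -> None:
        print("order entry component running")

system = Assembly()
system.plug_in(OrderEntry())
assert system.verify({"order entry"})
system.start()
```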
7.6 MIS and Business System Architecture Synergies
An intriguing contribution of this survey is the relationship found between the management of information systems literature and the prevalent business system architectures during the eras. As shown in Fig. 5 and described in Section 6, a new management focus in each era created expanded systems requirements that were met by innovative architectural designs. The current management foci on electronic commerce and ubiquitous computing will lead to future business computing architectures that will fully integrate organizational intranet and WWW components.

REFERENCES
[1] Swade, D. (1991). Charles Babbage and His Calculating Engines, Science Museum, London.
[2] Campbell-Kelly, M. and Aspray, W. (1996). Computer: A History of the Information Machine, Basic Books, New York.
[3] Aspray, W. (ed.) (1990). Computing Before Computers, Iowa State University Press, Ames, IA.
[4] Cortada, J. (1993). Before the Computer: IBM, NCR, Burroughs, and Remington Rand and the Industry They Created, 1865-1956, Princeton University Press, Princeton, NJ.
[5] Williams, F. (1985). A History of Computing Technology, Prentice-Hall, Englewood Cliffs, NJ.
[6] Austrian, G. (1982). Herman Hollerith: Forgotten Giant of Information Processing, Columbia University Press, New York.
[7] Rodgers, W. (1969). THINK: A Biography of the Watsons and IBM, Stein and Day, New York.
[8] Gotlieb, C. (1961). General-purpose programming for business applications. Advances in Computers 1, Academic Press, Boston, MA.
[9] Ceruzzi, P. (1998). A History of Modern Computing, MIT Press, Cambridge, MA.
[10] Diebold, J. (1952). Automation, Van Nostrand, New York.
[11] Gifford, D. and Spector, A. (1987). Case study: IBM's System/360-370 architecture. Communications of the ACM 30(4).
[12] Grosch, H. (1991). Computer: Bit Slices from a Life, Third Millennium Books, Novato, CA.
[13] Ein-Dor, P. (1985). Grosch's law re-revisited. Communications of the ACM 28(2).
[14] Hiltzik, M. (1999). Dealers of Lightning: Xerox PARC and the Dawn of the Computer Age, HarperCollins, New York.
[15] Hammer, M. and Champy, J. (1993). Reengineering the Corporation: A Manifesto for Business Revolution, HarperCollins, New York.
[16] Crosby, P. (1979). Quality is Free: The Art of Making Quality Certain, McGraw-Hill, New York.
[17] Gabor, A. (1990). The Man Who Discovered Quality: How W. Edwards Deming Brought the Quality Revolution to America, Penguin, London.
[18] Davenport, T. (1992). Process Innovation: Reengineering Work Through Information Technology, Harvard Business School Press, Cambridge, MA.
[19] Hammer, M. (1990). Reengineering work: don't automate, obliterate. Harvard Business Review, July-August.
[20] Henderson, J. and Venkatraman, N. (1993). Strategic alignment: leveraging information technology for transforming organizations. IBM Systems Journal 32(1).
[21] Luftman, J. (ed.) (1996). Competing in the Information Age: Strategic Alignment in Practice, Oxford University Press, Oxford.
[22] Crocker, D. (1997). An unaffiliated view of internet commerce, in Readings in Electronic Commerce, R. Kalakota and A. Whinston (eds.), Addison-Wesley, Reading, MA.
[23] Berners-Lee, T. (1997). World-wide computer. Communications of the ACM 40(2).
[24] Bird, P. (1994). LEO: The First Business Computer, Hasler Publishing, Berkshire, UK.
[25] Caminer, D. (ed.) (1997). LEO: The Incredible Story of the World's First Business Computer, McGraw-Hill, New York.
[26] Cortada, J. (1997). Economic preconditions that made possible application of commercial computing in the United States. IEEE Annals of the History of Computing 19(3).
[27] Cortada, J. (1996). Commercial applications of the digital computer in American corporations 1945-1995. IEEE Annals of the History of Computing 18(2).
[28] Tanenbaum, A. (1992). Modern Operating Systems, Prentice Hall, Englewood Cliffs, NJ.
[29] Eckert, W. (1940). Punched Card Methods in Scientific Computation, Thomas J. Watson Astronomical Computing Bureau, Columbia University, New York.
[30] Comrie, L. (1946). Babbage's dream comes true. Nature 158, October.
[31] Eckert, J. P. (1976). Thoughts on the history of computing. IEEE Computer, December.
[32] McCartney, S. (1999). ENIAC: The Triumphs and Tragedies of the World's First Computer, Walker and Co., New York.
[33] Winegrad, D. (1996). Celebrating the birth of modern computing: the fiftieth anniversary of a discovery at the Moore School of Engineering of the University of Pennsylvania. IEEE Annals of the History of Computing 18(1) [an introductory article from a special issue on the ENIAC].
[34] Eckert, J. P. (1998). A survey of digital computer memory systems. IEEE Annals of the History of Computing 20(4). Originally published in the Proceedings of the IRE, October 1953.
[35] Feigenbaum, E. and Feldman, J. (eds.) (1963). Computers and Thought, McGraw-Hill, New York.
[36] Goldstine, H. (1972). The Computer from Pascal to von Neumann, Princeton University Press, Princeton, NJ.
[37] Aspray, W. (1990). John von Neumann and the Origins of Modern Computing, MIT Press, Cambridge, MA.
[38] von Neumann, J. (1993). A first draft of a report on the EDVAC. IEEE Annals of the History of Computing 15(4). An edited version of the original 1945 report.
[39] Rojas, R. (1997). Konrad Zuse's legacy: the architecture of the Z1 and Z3. IEEE Annals of the History of Computing 19(2).
[40] Pugh, E. (1995). Building IBM: Shaping an Industry and Its Technology, MIT Press, Cambridge, MA.
[41] Pugh, E., Johnson, L. and Palmer, J. (1991). IBM's 360 and Early 370 Systems, MIT Press, Cambridge, MA.
[42] Amdahl, G. (1964). The structure of the SYSTEM/360. IBM Systems Journal 3(2).
[43] Flynn, M. (1998). Computer engineering 30 years after the IBM Model 91. IEEE Computer, April.
[44] Case, R. and Padegs, A. (1978). Architecture of the System/370. Communications of the ACM 21(1).
[45] Wolff, M. (1976). The genesis of the IC. IEEE Spectrum, August.
[46] Bell, G. (1984). The mini and micro industries. IEEE Computer 17(10).
[47] Goldberg, A. (1988). A History of Personal Workstations, ACM Press, New York.
[48] Guterl, F. (1984). Design case history: Apple's Macintosh. IEEE Spectrum, December.
[49] Hall, M. and Barry, J. (1990). Sunburst: The Ascent of Sun Microsystems, Contemporary Books, Chicago.
[50] Segaller, S. (1998). Nerds 2.0.1: A Brief History of the Internet, TV Books, New York.
[51] Patterson, D. (1985). Reduced instruction set computers. Communications of the ACM 28.
[52] Diefendorff, K. (1994). History of the PowerPC architecture. Communications of the ACM 37(6).
[53] Radin, G. (1983). The 801 minicomputer. IBM Journal of Research and Development 27(5).
[54] Agerwala, T. and Cocke, J. (1987). High performance reduced instruction set processors. IBM Technical Report, March.
[55] Hennessy, J. and Jouppi, N. (1991). Computer technology and architecture: an evolving interaction. IEEE Computer, September.
[56] Weizer, N. (1981). A history of operating systems. Datamation, January.
[57] Stallings, W. (1995). Operating Systems, 2nd edn, Prentice Hall, Englewood Cliffs, NJ.
[58] Brooks, F. (1975). The Mythical Man-Month: Essays on Software Engineering, Addison-Wesley, Reading, MA.
[59] Cadow, H. (1970). OS/360 Job Control Language, Prentice Hall, Englewood Cliffs, NJ.
[60] Watson, R. (1970). Timesharing System Design Concepts, McGraw-Hill, New York.
[61] Corbato, F., Merwin-Daggett, M. and Daley, R. (1962). An experimental time-sharing system. Proceedings of the AFIPS Fall Joint Computer Conference.
[62] Corbato, F. and Vyssotsky, V. (1965). Introduction and overview of the MULTICS system. Proceedings of the AFIPS Fall Joint Computer Conference.
[63] Corbato, F., Saltzer, J. and Clingen, C. (1972). MULTICS--the first seven years. Proceedings of the AFIPS Spring Joint Computer Conference.
[64] Organick, E. (1972). The Multics System, MIT Press, Cambridge, MA.
[65] Saltzer, J. (1974). Protection and control of information sharing in MULTICS. Communications of the ACM 17(7).
[66] Ritchie, D. and Thompson, K. (1978). The UNIX time-sharing system. Bell System Technical Journal 57(6). This special issue included a series of articles describing aspects of the UNIX time-sharing system.
[67] Ritchie, D. and Thompson, K. (1974). The UNIX time-sharing system. Communications of the ACM 17(7).
[68] Bach, M. (1987). The Design of the UNIX Operating System, Prentice Hall, Englewood Cliffs, NJ.
[69] Quarterman, J., Silberschatz, A., and Peterson, J. (1985). 4.2BSD and 4.3BSD as examples of the UNIX system. Computing Surveys 17, December.
[70] Leffler, S., McKusick, M., Karels, M. and Quarterman, J. (1989). The Design and Implementation of the 4.3BSD UNIX Operating System, Addison-Wesley, Reading, MA.
[71] Salus, P. (1994). A Quarter Century of UNIX, Addison-Wesley, Reading, MA.
[72] Black, D. (1990). Scheduling support for concurrency and parallelism in the Mach operating system. IEEE Computer, May.
[73] Tanenbaum, A. (1995). Distributed Operating Systems, Prentice Hall, Englewood Cliffs, NJ.
[74] Silberschatz, A., Stonebraker, M., and Ullman, J. (1996). Database systems: achievements and opportunities. SIGMOD Record 25(1).
[75] Stallings, W. (1997). Data and Computer Communications, 5th edn, Prentice-Hall, Englewood Cliffs, NJ.
[76] O'Neill, J. (1995). The role of ARPA in the development of the ARPANET 1961-1972. IEEE Annals of the History of Computing 17(4).
[77] Hafner, K. and Lyon, M. (1996). Where Wizards Stay Up Late: The Origins of the Internet, Simon & Schuster, New York.
[78] Licklider, J. (1965). Man-computer symbiosis. IRE Transactions on Human Factors in Electronics, March. Reprinted in [47].
[79] Baran, P. (1964). On distributed communications networks. IEEE Transactions on Communications Systems, March.
[80] Leiner, B., Cerf, V., Clark, D. et al. (1997). The past and future history of the Internet. Communications of the ACM 40(2). A more detailed version of the paper is available at the Internet Society web site (www.isoc.org) in the section on Internet history.
[81] Metcalfe, R. and Boggs, D. (1976). Ethernet: distributed packet switching for local computer networks. Communications of the ACM 19(7).
[82] Abramson, N. (1985). Development of the ALOHANET. IEEE Transactions on Information Theory 31, March.
[83] Cerf, V. and Kahn, R. (1974). A protocol for packet-network intercommunication. IEEE Transactions on Communications, May.
[84] Comer, D. (1995). Internetworking with TCP/IP, 3rd edn, Prentice-Hall, Englewood Cliffs, NJ.
[85] Wexelblat, R. (ed.) (1981). History of Programming Languages I, Academic Press, Boston, MA.
[86] Bergin, T. and Gibson, R. (eds.) (1996). History of Programming Languages II, ACM Press, Addison-Wesley, Reading, MA.
[87] Backus, J. (1979). The history of FORTRAN I, II, and III. IEEE Annals of the History of Computing 1(1).
[88] Glass, R. (1999). Cobol: a historic past, a vital future? IEEE Software 16(4).
[89] Naur, P. and Randell, B. (1969). Software Engineering: Report on a Conference Sponsored by the NATO Science Committee, Garmisch, Germany, 1968.
[90] Buxton, J. and Randell, B. (1970). Software Engineering Techniques: Report on a Conference Sponsored by the NATO Science Committee, Rome, Italy, 1969.
[91] Dijkstra, E. (1968). The go-to statement considered harmful. Communications of the ACM 11(3).
[92] Mills, H. (1986). Structured programming: retrospect and prospect. IEEE Software 3(6).
[93] Boehm, C. and Jacopini, G. (1966). Flow diagrams, Turing machines, and languages with only two formation rules. Communications of the ACM 9(5).
[94] Baker, T. (1972). System quality through structured programming. Proceedings of the AFIPS Conference, Part 1.
[95] Stroustrup, B. (1994). The Design and Evolution of C++, Addison-Wesley, Reading, MA.
[96] Gosling, J., Yellin, F. and the Java Team (1996). The Java Application Programming Interface, Addison-Wesley, Reading, MA.
[97] Gray, J. (1996). Evolution of data management. IEEE Computer 29(10).
[98] Codd, E. (1970). A relational model of data for large shared data banks. Communications of the ACM 13(6).
[99] Stonebraker, M. (1996). Object-Relational DBMSs: The Next Great Wave, Morgan Kaufmann, San Francisco, CA.
[100] Chaudhri, A. and Loomis, M. (eds.) (1998). Object Databases in Practice, Prentice-Hall PTR, Englewood Cliffs, NJ.
[101] Myers, B. (1998). A brief history of human-computer interaction technology. Interactions, March/April.
[102] Royce, W. (1970). Managing the development of large software systems: concepts and techniques. Proceedings of WESTCON, August.
[103] Boehm, B. (1988). A spiral model of software development and enhancement. IEEE Computer, May.
[104] Trammell, C., Pleszkoch, M., Linger, R., and Hevner, A. (1996). The incremental development process in cleanroom software engineering. Decision Support Systems 17(1).
[105] Paulk, M. (ed.) (1994). The Capability Maturity Model: Guidelines for Improving the Software Process, Addison-Wesley, Reading, MA.
[106] Stevens, W., Myers, G. and Constantine, L. (1974). Structured design. IBM Systems Journal 13(2).
[107] Yourdon, E. (1989). Modern Structured Analysis, Yourdon Press/Prentice-Hall, Englewood Cliffs, NJ.
[108] Cameron, J. (1989). JSP & JSD: The Jackson Approach to Software Development, IEEE Computer Society Press, Washington DC.
[109] Orr, K. (1977). Structured Systems Development, Yourdon Press/Prentice-Hall, Englewood Cliffs, NJ.
[110] Martin, J. (1989). Information Engineering, Books 1-3, Prentice-Hall, Englewood Cliffs, NJ.
[111] Booch, G., Rumbaugh, J., and Jacobson, I. (1999). The Unified Modeling Language User Guide, Addison Wesley Longman, Reading, MA.
[112] Wing, J. (1990). A specifier's introduction to formal methods. IEEE Computer 23(9).
[113] Luqi and Goguen, J. (1997). Formal methods: promises and problems. IEEE Software 14(1).
[114] Linger, R. (1994). Cleanroom process model. IEEE Software 11(2).
[115] Gerhart, S., Craigen, D., and Ralston, A. (1993). Observations on industrial practice using formal methods. Proceedings of the 15th International Conference on Software Engineering, Computer Society Press, Los Alamitos, CA.
[116] Sherer, S., Kouchakdjian, A., and Arnold, P. (1996). Experience using cleanroom software engineering. IEEE Software 13(3).
[117] Pfleeger, S. and Hatton, L. (1997). Investigating the influence of formal methods. IEEE Computer 30(2).
[118] Glass, R. (1999). The realities of software technology payoffs. Communications of the ACM 42(2).
[119] Bowen, J. and Hinchey, M. (1994). Formal methods and safety-critical standards. IEEE Computer 27(8).
[120] Brown, A. and Wallnau, K. (1998). The current state of CBSE. IEEE Software 15(5).
[121] Butler Group (1998). Component-Based Development: Application Delivery and Integration Using Componentised Software, UK, September.
[122] Zachman, J. (1987). A framework for information systems architecture. IBM Systems Journal 26(3).
[123] Shaw, M. and Garlan, D. (1996). Software Architecture: Perspectives on an Emerging Discipline, Prentice-Hall, Englewood Cliffs, NJ.
[124] Bass, L., Clements, P., and Kazman, R. (1998). Software Architecture in Practice, Addison-Wesley, Reading, MA.
[125] Barnard, C. (1938). The Functions of the Executive, Harvard University Press, Cambridge, MA.
[126] Kast, F. and Rosenzweig, J. (1972). General systems theory: applications for organization and management. Academy of Management Journal, December.
[127] Gorry, G. and Scott Morton, M. (1971). A framework for management information systems. Sloan Management Review, Fall.
[128] Nolan, R. (1979). Managing the crises in data processing. Harvard Business Review, March-April.
[129] Benbasat, I., Dexter, A., Drury, D. and Goldstein, R. (1984). A critique of the stage hypothesis: theory and empirical evidence. Communications of the ACM 27(5).
[130] Rackoff, N., Wiseman, C. and Ullrich, W. (1985). Information systems for competitive advantage: implementation of a planning process. MIS Quarterly 9(4).
[131] Rockart, J. (1979). Chief executives define their own data needs. Harvard Business Review, March/April.
[132] Peebles, R. and Manning, E. (1978). System architecture for distributed data management. IEEE Computer 11(1).
[133] Scherr, A. (1978). Distributed data processing. IBM Systems Journal 17(4).
[134] McFadyen, J. (1976). Systems network architecture: an overview. IBM Systems Journal 15(1).
[135] Peters, T. (1987). Thriving on Chaos: Handbook for a Management Revolution, Knopf, New York.
[136] Haeckel, S. and Nolan, R. (1996). Managing by wire: using IT to transform a business from make and sell to sense and respond. Chapter 7 in Competing in the Information Age: Strategic Alignment in Practice, ed. J. Luftman, Oxford University Press, Oxford.
[137] Barrett, D., Clarke, L., Tarr, P., and Wise, A. (1996). A framework for event-based software integration. ACM Transactions on Software Engineering and Methodology 5(4).
[138] Whinston, A., Stahl, D., and Choi, J. (1997). The Economics of Electronic Commerce, Macmillan Technical Publishing, New York.
[139] Dertouzos, M. (1997). What Will Be: How the New World of Information Will Change Our Lives, HarperCollins, New York.
[140] Pinker, S. (1997). How the Mind Works, W. W. Norton, New York.
[141] Gates, B. (1999). Business @ the Speed of Thought: Using a Digital Nervous System, Warner Books, San Francisco, CA.
[142] Moore, J. (1998). Software Engineering Standards: A User's Road Map, IEEE Computer Society, New York.
Numerical Weather Prediction

FERDINAND BAER
Department of Meteorology
University of Maryland
College Park, MD 20742, USA
baer@atmos.umd.edu
Abstract

Astounding advances in numerical weather prediction have taken place over the last 40 years. Atmospheric models were rather primitive in the early 1960s and were able to yield modest forecasts at best for one day over a very limited domain at only one to three levels in the atmosphere and for only one or two variables. Today reliable forecasts are produced routinely not only over the entire globe, but over many local regions for periods up to 5 days or longer on as many as 80 vertical levels and for a host of variables. This development is based on dramatic improvements in the models used for prediction, including the numerical methods applied to integrate the prediction equations and a better understanding of the dynamics and physics of the system. Most important is the growth of computing power during this era which allowed the models to expand by more than five orders of magnitude, thus significantly reducing errors and increasing the number and range of variables that can be forecast. Concurrently, the processing of data used by models as initial conditions has also benefited from this explosion in computing resources through the development of highly sophisticated and complex methodology to extract the most information from accessible data (both current and archived). In addition, increased communication speeds allow the use of more data for input to the models and for rapid dissemination of forecast products. Numerous regional models have sprung up to provide numerical forecasts of local weather events with a level of detail unheard of in the past, based on the rapid availability of appropriate data and computational resources. New modeling techniques and methods for reducing forecast errors still further are on the horizon and will require a continuation of the present acceleration in computer processing speeds.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
2. Computational Methods . . . . . . . . . . . . . . . . . . . . . . . . 102
   2.1 Spectral Methods . . . . . . . . . . . . . . . . . . . . . . . . . 107
   2.2 The Finite-Element Method . . . . . . . . . . . . . . . . . . . . 123
   2.3 Spherical Geodesic Grids . . . . . . . . . . . . . . . . . . . . . 127
   2.4 Time Truncation and the Semi-Lagrange Method . . . . . . . . . . 131
3. Data Analysis, Assimilation, and Initialization . . . . . . . . . . . 137
4. Regional Prediction Modeling . . . . . . . . . . . . . . . . . . . . . 144
5. Ensemble Prediction Techniques . . . . . . . . . . . . . . . . . . . . 148
6. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

1. Introduction
Before the advent of computers, weather prediction was more an art than a science. Forecasters used what little data was available to them to prepare charts (maps) on which they analyzed that data to provide fields of variables, such as temperature or wind, from which they could forecast the evolution of those fields based on intuitive concepts learned subjectively from observations of many such patterns or from heuristic formulae derived from the theory of fluids. In addition, some statistical formulae were available which were derived from limited data archives. In all, although some of these forecasters were brilliant, neither the product nor the prospects were encouraging. The one shining light in this cloudy view was the recommendation of Richardson [1], who clearly saw weather forecasting as a computational problem based on fundamental prediction equations, but did not have the computing resources to prove his point.

Then with the advent of computers in the late 1940s, numerical weather prediction began in earnest. Von Neumann selected the weather prediction problem as his primary choice to demonstrate the feasibility of the digital computer as the tool for solving complex nonlinear problems [2]. Richardson's equations were resurrected and simplified to meet the computer's capabilities, and the march to successful forecasts was at last on a road from which it has not yet deviated, nor is there any indication that the quest will cease. Phillips [3] summarized the state of this development by 1960 in a comprehensive article in this series, and since that article will be referenced frequently in this chapter, further numerical reference to it is omitted. References to other work by Phillips will be cited in the usual way.

Since 1960, numerical weather prediction knowledge and capability have mushroomed. Terminologically, the weather prediction problem is approached by defining a "model," and solutions to this model are sought. Thus Richardson's equations, which described the atmosphere and which he postulated for use in weather prediction, served as one of the earliest models, of which there are now countless numbers. One can find a model to predict almost any aspect of the weather. Thus it seems appropriate to summarize briefly how our atmosphere is constituted and what might be predictable.

The Earth's atmosphere is a thin fluid shell composed of a number of gases that are in constant motion. Since the human population lives at the bottom
of this mass and interacts with it constantly, it is hardly surprising that we are interested in its evolution. The description of the atmosphere is given by a number of variables that represent molecular composites of the gases such as velocity, temperature, water content in all three phases, and aerosols. These variables are distributed continuously in three-dimensional space and vary with time. One can either determine the evolution in time of these variables at each point in space (defined as an Eulerian approach) or follow the particles through time (defined as a Lagrangian approach). Although the Eulerian approach has long been favored, recent developments have suggested advantages to the Lagrangian approach. More will be said on this subsequently.

The model from which the future state of the variables is assessed is based on known physical and dynamical principles. These principles take the form of mathematical equations, including the equations of motion (also known as the Navier-Stokes equations), an equation for conservation of mass, an equation to determine the change in entropy, equations which determine the changes in water substance in its various phases, and chemical equations for movement and changes of aerosols. Phillips presented some of these equations, and they are repeated here for ease of reference. The additional equations are presented only formally since they have many forms depending on the particular model with which they are associated. Details may be given subsequently if relevant. The equations are written in their Eulerian form, so the time derivative is taken locally at a given point in the fluid. To determine the motion of the fluid, the equation of motion for the vector velocity V in all three space dimensions and relative to the rotating Earth is

\[
\frac{\partial \mathbf{V}}{\partial t} = -\mathbf{V}\cdot\nabla\mathbf{V} - 2\boldsymbol{\Omega}\times\mathbf{V} - \frac{1}{\rho}\nabla p - g\mathbf{k} + \mathbf{F}
\tag{1.1}
\]
where Ω is the angular velocity of the Earth, ρ and p are the density and pressure at an atmospheric point respectively, g is the gravitational acceleration in the k (unit vertical vector) direction, and F comprises all the frictional forces per unit mass. Conservation of mass is represented by an equation of continuity:

\[
\frac{\partial \rho}{\partial t} = -\nabla\cdot(\rho\mathbf{V})
\tag{1.2}
\]
The thermodynamics of the system are described by changes in entropy:

\[
\frac{\partial s}{\partial t} = -\mathbf{V}\cdot\nabla s + \frac{1}{T}\, q(q_v, q_l, q_i, a_j, \ldots)
\tag{1.3}
\]
where s denotes specific entropy, q is the rate of heating per unit mass, and T is the temperature. Note that q depends on the heating rates associated with water vapor, q_v, ice, q_i, liquid water, q_l, aerosols, a_j, and other factors depending on the dependent variables, such as radiation. Each of the variables q_k and a_k will have its own prediction equation,
\[
\frac{\partial q_k}{\partial t} = Q_k
\tag{1.4}
\]
where the Q_k represent complex formulae relating some or all of the dependent variables. As noted earlier, these representations vary significantly from model to model. This entire system of equations constitutes the "model" that is integrated in time to predict the future state of the fluid. The complications associated with the solution process of this system are enormous and have spun off countless research endeavors. The model clearly needs boundary conditions both at the top of the atmosphere and at the surface where the model interfaces with either the oceans or the landmasses of the Earth. The model needs initial conditions, which come from data that is observed. But is the data sufficient, is it suitable to represent the model variables, or is it even possible to observe some of the dependent variables? The model represents an infinite range of scales from the global to the microscale; the system must therefore be truncated to represent both meaningful scales and realistically computable ones. Indeed, since it is impossible to encompass too wide a range of scales in one model, separate models have been designed to represent particular scale ranges. Extreme examples might be models to describe hurricanes or tornadoes. The forces that drive the models, in addition to the boundary conditions, are represented by the terms that make up the q_k and a_k, and these functions also vary depending on scale.

It should be evident from this discussion that weather prediction raises enormous problems both physically and computationally. Indeed, computational resources may be the limiting factor in our ability to improve predictions. Shuman [4] proposed a tantalizing hypothesis in this regard. He noted that over the years it has been possible to correlate the improvement in forecast skill with enhanced computing power. From this information he speculates on the possibility of predicting the future improvements of forecasting skill based on estimated increases in computing power. As an example of how computers have evolved over the period since Phillips' 1960 review, Bengtsson [5] discusses the evolution of forecasting in Europe by comparing forecasts made in Sweden in 1954 with a simple prediction model on a Besk machine to a recent product (1998) from the European Center for Medium-Range Weather Forecasts (ECMWF) using
one of the world's most advanced models on a 16-processor Fujitsu VPP 700. He notes that the number of calculations per unit time during this 45 year period has increased by over five orders of magnitude! Comparable increases in computing capability have been noted by Kalnay et al. [6] for predictions produced by the National Centers for Environmental Prediction (NCEP) of NOAA. In 1960 their model ran on an IBM 7090 with a performance speed of 67 Kflops, whereas in 1994 the model (clearly much more sophisticated) ran on a 16-processor Cray C90 at 15 Gflops. The models and their forecasting skill, as presented by Phillips in 1960, were indeed impressive when one considers that hardly a decade had passed since the process of using models in conjunction with computers had begun. However, by today's standards, those models and the features surrounding them appear primitive. Of course computing power itself was then correspondingly primitive. Consider that the models of the time could reasonably represent only the largest significant horizontal weather scales with forecasts limited to 24 h. The model domain ranged from continental regions to hemispheric and had at most two or three levels in the vertical. Moisture was not considered in great detail. Data management for initial conditions was fragile, the observing network was limited except for the heavily populated regions of the northern hemisphere, and the analysis of this data for input into the models was carried out using methods that were elementary and not carefully tuned to the models' needs. Computations were made only with classic finite difference methods, although the numerical limitations of the techniques were well understood. The changes that have occurred in the science of numerical weather prediction over the past 40 years, when measured against what Phillips reported, are staggering. Enormous progress has been achieved in the application of countless research developments that have been incorporated into models, and these models are now flourishing at numerous prediction centers. Whereas only a few numerical forecast centers existed in 1960 and these were primarily experimental, many countries now have their own numerical forecast center. Many of these national centers produce forecasts for local consumption with regional models, using global forecast data created by larger centers. An example of such a large center is the European Center for Medium-Range Weather Forecasts (ECMWF), a center created jointly by some 19 European nations during the 1970s. The E C M W F is one of the primary centers for numerical weather prediction (NWP) in the world today, and is competitive in producing some of the best global forecasts. Of course the US has kept pace with numerous forecast products on many scales generated by the National Centers for Environmental Prediction (NCEP) as part of the National Weather Service of NOAA. Other large centers that provide valuable global predictions as well as additional
forecast products include the United Kingdom Meteorological Office (UKMO), the Japanese, and the Canadians. Although many of the products delivered by these centers share a common purpose, the tools used in the form of models reflect the enormous advances that have taken place. Thus no two centers use exactly the same model for global predictions, although some of the more basic features in the models, which the community has agreed are well understood, do recur. Those aspects of models that are still subject to research evaluation may appear in various representations in the models of these larger centers. Competition for high quality forecasts is keen among the centers and, although their products are uniformly good, the best forecast on any given day will not always come from the same center. For those centers using regional models to forecast smaller scale events, the differences among these models are still pronounced.

Perhaps the most apparent difference amongst models is in the technique selected to convert the basic nonlinear differential equations that describe the forecast system, i.e., (1.1)-(1.4), to a numerical form suitable for computation and integration on a digital computer. As noted from Phillips' discussion, the method of choice at the time was to apply finite differences in both the time and space dimensions. Since the vertical dimension has unique properties when compared to the horizontal dimensions in the atmosphere, advances in representing the equations in these two domains developed apace. For the horizontal representation, the spherical nature of the domain led to the application of a Galerkin approach known as the spectral method. Considering that at any given height in the atmosphere there exists a closed spherical surface on which the dependent variables describing the fluid are prescribed and predicted, the spectral method assigns a set of continuous orthogonal functions over the domain to represent these variables; only the coefficients of these functions vary depending on time and the particular variable represented. When all the variables are described in this way, the resulting equations are integrated over the global domain, leading to a set of ordinary nonlinear differential equations in time and vertical level. Differentiation in the vertical coordinate has continued to be transformed to finite differences. As computational problems with the method were resolved, most global models adopted some form of this spectral method, and until recently, it has remained the most popular method for solving the forecast equations.

With the evolution of models from the basic geostrophic forecast system in which the divergence field is not predicted to the hydrostatic forecast system (primitive equations) where the divergence is predicted (see Phillips for definitions of these systems), the representation of the vertical coordinate became more important. In particular, the inclusion of surface topography
in the models created computational complications that were partly resolved with the introduction of the sigma (σ) coordinate. If pressure is taken as the independent vertical coordinate using hydrostatic considerations, σ is the pressure normalized by the surface pressure. Utilization of this coordinate in the vertical has become very popular and successful and additional variants have been explored. Modelers have developed a system whereby the σ coordinate is used in the lower atmosphere and gradually transforms to pressure with height in the atmosphere. Indeed, some modelers have advocated entropy as a coordinate for the upper atmosphere.

Other methods of representing the prediction equations in their horizontal coordinates have also been explored over the years, but have not gained popularity because they may have been too demanding of computers. These include the finite element method, which allows for variable grid sizes over the domain, and spherical geodesic grids, which provide exceptionally homogeneous gridding over the entire spherical surface and use an integration technique to solve the equations. Both methods have seen a resurgence of interest in the last few years with the advent of more suitable computing hardware, i.e., parallel processors.

Since the forecast equations represent an initial value problem as well as a boundary value problem, the truncation of the time dimension is equally as important as that of the space dimension. The transform to finite differences in time had been thoroughly studied by Phillips' time and he noted many of its properties. The selection of explicit schemes to solve the atmospheric model equations has remained popular to this day, in particular the three-level leapfrog scheme. However, with the ascendancy of the primitive equations, high frequency gravity waves were unleashed in the models, and because of the computational stability requirements inherent in explicit methods (note the CFL criterion [7]--see also Phillips for details on computational stability), very short time-steps were required to perform stable integrations. Indeed these computational requirements relative to available computing resources slowed progress in numerical weather prediction for a significant period of time. This impasse led researchers to develop composite schemes where the implicit and explicit methods were combined, and this approach was called the semi-implicit scheme. The nonlinear terms in the equations, which could be shown to propagate with relatively slow frequencies, were integrated using an explicit scheme whereas the linear terms, which propagated with high frequency, were integrated using an implicit scheme. Since the implicit scheme is always stable, the system of prediction equations in this form could be integrated using the larger time-step chosen from the explicitly integrated terms, thereby saving substantial computing time. Needless to say, the results of forecasts using this method proved satisfactory.
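A minimal sketch of the semi-implicit idea on a single toy equation du/dt = iωu + N(u), in which iωu stands for a fast linear (gravity-wave-like) term and N for a slow nonlinear term. The fast term is averaged between the old and new time levels (trapezoidal rule, stable for any time-step), while the slow term is taken explicitly; forward Euler stands in here for the leapfrog step an actual model would use, and all parameter values are arbitrary.

```python
def semi_implicit_step(u, dt, omega, nonlinear):
    """One step of du/dt = i*omega*u + N(u).

    The fast linear term is treated implicitly (trapezoidal average of the
    old and new values), so the step is stable however large omega*dt is;
    the slow nonlinear term N is evaluated explicitly at the old level.
    """
    rhs = u * (1.0 + 0.5j * omega * dt) + dt * nonlinear(u)
    return rhs / (1.0 - 0.5j * omega * dt)

# Toy integration: a high-frequency oscillation (omega = 50) plus a weak,
# slowly acting nonlinear damping, advanced with a time-step far larger
# than an explicit scheme's stability limit (roughly 1/omega) would allow.
u = 1.0 + 0.0j
for _ in range(200):
    u = semi_implicit_step(u, dt=0.5, omega=50.0,
                           nonlinear=lambda v: -0.01 * abs(v) * v)
print(abs(u))   # the amplitude decays slowly and the solution stays bounded
```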
As computing resources increased over the years, model complexity grew rapidly. In particular, modelers were not content to predict only the largest scales; with increased resources they decreased the horizontal grid length in the models to improve the predictability at more regional scales. Not only did this increase the computational size of the model, it also required a decrease in the time increment to minimize computational error. Thus, despite the advantages of the semi-implicit scheme, numerical weather predictions continued to tax the limits of computing resources on the largest available machines. In this environment researchers reassessed the advantages of using the Lagrangian approach, which had proved disastrous in early studies because particles tended to cumulate in regions of high vorticity [8]. The more recent studies solved this problem by interpolating particles to a fixed grid after each time incremental cycle. Indeed it was found that the method was computationally stable for all time increments (like the implicit method), yet experiments with state-of-the-art models indicated that a significantly longer time-step could be used to achieve the same quality of forecast when compared to other schemes. Thanks to this, the method has become extremely popular and is now incorporated to some extent in the working models of most large centers.

Concurrent with these developments in numerical processing of prediction models, other features of the models that affect predictive ability have been studied in detail and significant improvements have been incorporated into operational models. These features include the assessment and processing of data for establishing model initial conditions, parameterization of physical processes which act as external forces to alter the model dependent variables in time and space, and the modification of the general prediction equations to suit the unique characteristics of particular spatial scales. These issues are deeply interrelated, yet must be investigated independently. Different forces in the prediction system dominate depending on the scales of interest; note that the principal forces which drive the cloud systems on the hurricane scale will not play a major role on the synoptic scale and vice versa. Additionally, the initial conditions must be closely tuned to the selected forecast scales. The interrelationships will become evident from the ensuing discussion.

As the numerical representation of models developed, it became evident that initial conditions also play an equally important role in the quality of predictions. Forecasters realized that it was not sufficient to use simple interpolation to adjust observations as the initial state for integration. Several factors were at play here. First, new observations became available which were not conveniently distributed either in time or space to be simply used, but were exceptionally useful to fill in data gaps when treated appropriately. More important, however, was the gradual recognition that
data analysis as input for initial conditions to a model must be uniquely tuned to the model itself and not analyzed by arbitrary and independent formulae. This awareness arose from the realization that no model is a perfect representation of reality but is merely an approximation. Thus data which represents reality perfectly if used in a model which is merely an approximation to that reality may not produce the optimum prediction from that model. Indeed, data that is suitably tuned to a model may well optimize the output of that model when used as initial conditions. As this awareness took hold, intense effort was applied to the data analysis problem starting with what was known as data initialization. The initial data fields for the relevant prediction variables were adjusted to match the equations from which they would be predicted. For example, if the model imposed geostrophic constraints, the initial conditions were adjusted to satisfy these constraints. Of course one could argue that with time the model would ultimately adjust any dataset, but nonlinear systems are notorious for going awry and are not infrequently unstable if they are perturbed slightly out of equilibrium. Moreover, a model is expected to give an optimum forecast rather than act as an analysis tool. But as research evolved, it became clear that a model could do both. Thus a new process for analysis known as variational analysis took hold and is still being refined. The procedure uses the model to adjust input data by forward integration for a finite time, returning to the original time by backward integration with the adjoint of the model. During this process asynchronous data may also be incorporated; indeed statistical data may be applied in regions where there is a severe paucity of data. This development has been incorporated into the models of most large forecast centers and has substantially improved their forecast products. Additionally it has led to the reanalysis of old data, making available long records of data archives that have consistency built into them and are valuable for testing new methodology. At the time Phillips was writing, numerical weather prediction was the exclusive domain of fluid dynamicists. Given the computational resources available, details on the physics and chemistry of the fluid could not have been effectively incorporated into the models. Thus developments in model improvements remained focused on both the numerics and dynamics of the computational system. Although the fundamentals of radiation were understood, lack of observational data inhibited research progress in that discipline for many years. Studies on the microphysics of clouds were in their infancy and progress was slow. It was many years before clouds were considered in models, and at first only the liquid phase of water substance was incorporated. That clouds grew and dissipated through entrainment, that they passed through freezing levels and changed phase, thereby releasing or gaining heat, was well understood but not added to models for
decades. Concurrently, detailed boundary conditions at the surface of the atmosphere such as the oceans and the biosphere were not applied to the models until their need in climate models was demonstrated. Interactive models in which predictions are made of all systems that interface at the boundary and are appropriately coupled are gradually beginning to surface, primarily in climate modeling efforts, but have not yet been sufficiently studied to be systematically utilized in short-term weather prediction. Atmospheric chemistry, the study of chemical species and aerosols which make up the atmosphere and which interact so as to impact on the evolution of the total fluid, is perhaps the most neglected of the physical processes that may play a role in the forecasting of weather. Even now, no large NWP model interactively involves details of the chemistry of the fluid. During recent decades, studies have begun using the results of prediction models to provide winds to move chemical species and aerosols about the fluid as passive components; only recently have efforts been made to consider actual changes in these tracers with time and how those changes might affect the primary predictive variables of the models. Although it is beyond the scope of this review to discuss the developments in our understanding of these physical processes in detail, it is noteworthy that as interest in prediction products expands to the smaller space scales, more emphasis will fall on the chemistry and physics of the fluid. Indeed when forecasts are made on urban scales of tens of kilometers, people will want to know not only that it might rain, but something of the intensity of the rain and its composition; i.e., whether it might contain hail. Furthermore, the concentration of ozone in the air and its persistence will become one of the primary products of a summertime forecast on these scales. Hypothetically, the prediction equations as they are written should describe all horizontal scales from the global to the microscale, but obviously this is not computationally feasible. In selecting scale ranges, which is done by truncating the computational form of the equations to the desired spatial increment, the planetary scales are the easiest to predict both because they require the fewest initial conditions and because they can be integrated with a larger time increment to maintain computational stability, thus using less computing time. Unfortunately, fully global models are best formulated by the primitive equations, which do not create problems at the equator. Thus the benefit of scaling is offset by the complexity of the model construction. Historically one notes that the earliest models were quasi-geostrophic and not global; they were hemispheric or nearly so, depending on computer limitations. As computers became more powerful, the models became global and used the primitive equations. As demand grew for predictions with more resolution, i.e., on smaller spatial scales, the method for providing
regional and smaller scale predictions became a critical issue. Various approaches emerged. One perspective was to use only the basic system of equations and truncate them at the desired scale even if that scale required the interaction in the model of many scales from the planetary to the regional, say 50-100 km or less. This is not difficult to formulate but presents a formidable computational effort, although now within the capacity of the largest computers. Another approach, which has attracted a large following, is to generate a regional model which focuses on the smaller scales, assuming that neither the interactions with the larger scales (those not incorporated in the model) nor their changes during the integration period have much impact on the forecast scales during the prediction. A variant of this approach is to locally embed the regional model in a global model, thus providing time varying boundary conditions from the global model to the regional model during the integration period. A recent development, not yet in use for routine predictions, is the concept of grid stretching. In this application a variable grid is used which has finer resolution over the region requiring more detailed spatial prediction, and the grid is systematically enlarged away from that region. The approach allows for complete interaction of all scales over the entire global domain, but substantially reduces the computational requirement of integrating the model everywhere with the smallest scales. A finite element modeling approach which can accomplish a similar effect but with several regional domains over the globe is currently under development and is also used in ocean modeling. With reference to developments for predicting weather on various scales, there exist currently a variety of models to forecast tornadoes, hurricanes, dust devils, and convective complexes, and there are others in various stages of construction. Most of these models are nonhydrostatic and can be adapted to give predictions over any selected geographic domain. The output from these models is gradually being entrained into the products distributed by the largest forecast centers. Finally, perhaps the most insightful observation on numerical weather prediction since the time of Phillips' presentation was the theoretical limit on predictability enunciated by Lorenz [9]. Using an elegantly simple model of the atmosphere, Lorenz demonstrated by an error propagation method that all forecasting ability must cease within about 2 weeks after the start of an integration, no matter how small the initial error. Numerical weather prediction experts have taken this assessment to heart. In the last few years they have developed an ensemble forecasting approach. Since errors will grow in any forecast and these errors are random, a suite of forecasts is undertaken with this procedure, each of which has a slight perturbation in the initial state. Rather sophisticated methods have been developed to create
these perturbations. The ultimate forecast is a statistical composite taken from this ensemble of forecasts. The method improves on individual numerical forecasts and is gradually being adopted by the large forecast centers.

Section 2 presents most of the more popular methods that have been developed and exploited to solve the prediction system on available computing systems. As computational resources have improved, these methods have risen or fallen in the popularity stakes. Each has its own advantages and limitations, but without the thorough study of these methods, the level of accuracy in the forecast products now available from numerical prediction models would not exist. In Section 3 the preparation of data needed to activate the prediction models as initial conditions is assessed and the procedures that have been developed to help ensure accurate model input are enumerated. It is evident that if models begin a forecast with less than high quality input data they are doomed to failure, since initial errors will propagate in time during the calculation. Improvements in data assimilation over the last 40 years, as will be shown, have indeed been astounding.

It is unreasonable to expect that all space scales can be accurately predicted with one model. As success with prediction of planetary scales using global models accelerated, some modelers turned their attention to creating regional models to provide more local predictions. This development has been stimulated by the dramatic advances in available computing resources and is discussed in Section 4. Finally, no single model integration can provide a perfect forecast, no matter how accurate the initial state or how good the dynamics and numerics of the model are. This is simply due to the fact that the exact differential equations must be approximated. Numerous model integrations have demonstrated that minuscule perturbations of the initial state lead to identifiable changes in a model's forecast. These differences are known as model variability and apply to all models. Section 5 discusses this issue and suggests that application of recent developments in ensemble forecasting may help to reduce such errors.
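A minimal sketch of the ensemble idea, using the Lorenz [9] equations as a stand-in for a forecast model: each member starts from a slightly perturbed copy of the analysed initial state, and the delivered forecast is a statistical composite (here simply the ensemble mean, with the spread as a crude confidence measure). The Gaussian perturbations are a placeholder for the far more sophisticated perturbation methods mentioned above, and all parameter values are arbitrary.

```python
import random

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz (1963) three-variable system.
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def forecast(state, steps=1000):
    for _ in range(steps):
        state = lorenz_step(state)
    return state

analysis = (1.0, 1.0, 1.05)                    # "analysed" initial state
members = []
for _ in range(20):                            # a 20-member ensemble
    perturbed = tuple(v + random.gauss(0.0, 1e-3) for v in analysis)
    members.append(forecast(perturbed))

ensemble_mean = tuple(sum(m[i] for m in members) / len(members) for i in range(3))
spread = tuple(max(m[i] for m in members) - min(m[i] for m in members) for i in range(3))
print(ensemble_mean, spread)
```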
2. Computational Methods
Although the equations describing the atmosphere presented in the previous section are differential equations, their nonlinear nature implies that a numerical representation is essential to their solution. Indeed, Phillips discussed in detail the numerical features of the finite difference method applied to the system, including computational errors inherent in difference
procedures, stability requirements, and solutions to associated numerical boundary value problems. The success with these techniques to date is astounding insofar as the uniqueness of solutions to the difference systems cannot be established. Tradition led early modelers to use finite differencing, but once predictions with these systems became established, certain inherent problems became apparent. It was Phillips [10] himself who brought the problem of nonlinear instability to the attention of the modeling community; it is an issue requiring careful attention lest the evolving solutions become unstable.

A cursory view of (1.1) shows that nonlinear terms of the form u ∂u/∂x exist and must be calculated. For simplicity let us assume that u is one of the velocity components and is merely a function of one space dimension (x). Since the variables must be represented on a grid of points truncated at some Δx, one could equally well represent u as a Fourier series with the shortest resolvable wavelength of 2Δx and corresponding wave numbers m ≤ π/Δx. If for simplicity only sine functions are considered, u can be written as

\[
u = \sum_{i=1}^{M} u_i \sin m_i x
\tag{2.1}
\]
over M grid points. The nonlinear product of any two waves, mi and mj, becomes
Ou Ox
- mj sin mix cos mjx + . . . .
mj[sin(mi 89 + mj)x + s i n ( m / - mj)x] 4 - . . . (2.2)
Consider now that u can be predicted from this term and others as
u(t + A t ) = u(t) + At(uOu/Ot) + . . .
(2.3)
So long as mi + mj <<.M, u(t + At) can be represented by the Fourier series in (2.1) without error. However, if the sum of the two wave numbers exceeds M, the series limit is exceeded. An expansion of u(t + At) with M terms will not see this interaction. Worse still, it can be shown by trigonometric expansion that terms larger than M A x will fold back into the range ( M - m k ) A x creating an aliasing error that can grow without bound and is known as nonlinear instability. Since this error tends to affect the shortest scales, the solution has been to include a scale-sensitive viscosity in the model equations to damp these scales before they grow significantly; a rather arbitrary and nonphysical, but practical, decision. Models represented by finite differences are often called gridpoint models, and the grids for these models have been selected in a variety of ways.
104
FERDINAND BAER
Perhaps from the tradition of displaying atmospheric data on conformal maps, the prediction equations were in the early years projected onto such maps [11]; examples are polar stereographic for the mid and polar latitudes and Mercator for the equatorial regions. The projection equations were then gridded to establish some uniformity of scale and allow for the necessary differencing. As models grew from limited domains to global, no one projection was adequate to describe the entire system. This led to complex compositing of projections: for example, Phillips [12] overlapped the polar stereographic and Mercator maps to accommodate all latitudes with minimum projection errors. Since the prediction equations can be represented in spherical coordinates directly on a global surface, gridding the surface in latitude and longitude coordinates seemed both more appropriate and simpler, since no mapping was required, and this representation became popular. Unfortunately as one approaches the pole in this representation, the minimum increment of longitude (AA) remains the same but the length of the increment decreases with the cosine of latitude. Since the CFL stability criterion depends on the grid length, shorter time-steps are required to maintain stability near the pole and led to the now classic "pole problem." Various solutions to this problem have been proposed but none are ideal, as the following example suggests. Consider a representation of any model variable given by (2.1) where x describes latitude and M refers to half the number of grid points. If one truncates this series so that the terms following N (where N > M) are removed, the resulting series will represent only larger space scales and a longer time increment will then yield stable results. This process has been used frequently. Unfortunately, arbitrary truncation of terms in the prediction system may alter the ultimate solution, given the fragile nature of nonlinear systems. The potential hazard thus raised is analogous to incorporating arbitrary friction to solve a previously discussed computational problem. One of the most prominent features of the atmospheric prediction system is that in the absence of external energy sources, the total energy of the system must be conserved in time. Despite the sources that come from, say, radiation and surface boundary effects, it is important that the conservation conditions inherent in the differential equations be maintained in the computational equations. Setting up the finite difference equations to conserve energy is a daunting task and has engaged the best researchers for decades. It is particularly difficult because of the three-dimensional nature of the task. To understand the significance of conservation in a truncated framework, it is convenient to use a simple reduction of the complete system as a demonstration. Consider a barotropic fluid, which exists if the thermodynamic variables are uniquely related to one another and are independent of
105
NUMERICAL WEATHER PREDICTION
position in the fluid (see Phillips for details). Since the vertical variation becomes irrelevant to the solution of the fluid motions in this setting, the motions need consideration in only one horizontal surface. In this context it is convenient to assume that the fluid is incompressible. As can be seen from (1.2), this leads to three-dimensional non-divergence in the fluid. On the assumption that no divergence is introduced at the upper and lower boundaries, the continuity equation can be integrated over the vertical domain to demonstrate that no divergence exists in any horizontal surface. Finally, if the fluid is considered to be in hydrostatic equilibrium, it is only necessary to establish the evolution of the horizontal velocity in any such surface and the vertical component of velocity can be ignored. This velocity is represented by two scalar variables, which may be transformed to any other two scalar functions. Because rotation plays such a major role in atmospheric motions, vorticity and divergence are the popular choices for this transformation. Note that for the approximations stated above the divergence vanishes, hence the velocity may be represented uniquely by the vorticity alone. Moreover, a prediction equation for the vorticity may be derived from (1.1) and under the conditions specified, is known as the This equation, although representing a very simplified version of the atmosphere, contains many features of the full atmospheric system and is thus a convenient tool for evaluating prediction techniques. Applying the approximations stated above to (1.1) and in addition ignoring friction, the following modified equation of motion may be written;
barotropicvorticity
equation.
OV2_-V2 VV2 Ot
2~2 X V2
p(p)
V2p
(2.4)
where the subscript 2 denotes two-dimensionality. The Earth's vorticity may be expressed here as 2 ~ - f k where f - 2 5 2 sin ~ is the Coriolis parameter, and ~ is latitude. If the velocity is now transformed to rotation and divergence by the definitions V2 = k x V~ + VX V- V2 = V 2X = 6 - divergence
(2.5)
k. V • V2 = V 2f _ r _ relative vorticity an equation for predicting the vorticity may be established by applying the operator k. V x to (2.4); k-Vx
[OV20t= - V 2 - V V z
-fk
• V2 ]
(2.6)
106
FERDINAND BAER
Noting that the divergence vanishes, and substituting the definitions from (2.5), the barotropic vorticity equation emerges as o~
Ot
= -1/2. V q - - k • V ~ . V r / - - J ( ~ , q)
(2.7)
r / - ( + f = absolute vorticity It is an easy matter to demonstrate that from (2.7) the absolute vorticity or any function of it must be conserved both following a particle and on integration over the surface of all particles. When this equation is converted to finite difference form, the condition of conservation should also be met. Arbitrary application of a finite difference operator to (2.7) will not achieve this required result. Arakawa [13] presented a procedure that assures conservation not only of vorticity, but kinetic energy and enstrophy (mean squared vorticity) as well. The technique involves multiple representations of the Jacobian operator; Arakawa used the following three, with x and y as the independent variables and r/as absolute vorticity; J( ~, ~) -
=
Ox oy
Oy Ox
o[~(o~/oy)] Ox
-
=
oy
o[~(a~/Ox)] Oy
_
Ox (2.8)
Applying centered second-order finite differencing to these three terms he generated three Jacobian operators and he combined them linearly with arbitrary coefficients. Considering enstrophy conservation by using r/J(0, q), he discovered that the sum of all the grid point values over the domain vanished if he weighted the Jacobians equally and gave each a value of one-third. This computational method also assured the conservation of vorticity and energy. The method was ultimately extended to the primitive equations, which represent general three-dimensional flows with divergence [14]. Interestingly, this procedure also prevents nonlinear instability. Yet despite its success, the method implies the solution to an integral by adjustment of the integrand to satisfy a few given constraints, and this may be accomplished in many different ways. Since no proof of optimization has been established, significant truncation errors may occur using this technique. In the face of the complexities associated with finite-difference techniques it is little wonder that alternate computational representations might be explored, and several useful ones have indeed surfaced. The first and perhaps most popular was the spectral method already mentioned, which
NUMERICAL WEATHER PREDICTION
107
arose as a natural consequence of the simple lateral boundary conditions of the global atmosphere. This method transforms the representation of the dependent variables from horizontal grid point space into continuous functions of the coordinates and allows for integration of the resulting equations. As noted earlier, the technique has proved to be very popular and prediction results using it have been uniformly successful for fully global models. However, when a prediction domain is selected which is not fully global and includes lateral boundary conditions, the benefits of the spectral method diminish and other methods are preferable. It is in this context that the finite element method, which is also a derivative of the general set of Galerkin methods, shows its advantages. In particular, the finite element method has distinct benefits when regional scales are desired in local domains. This process will be discussed in a later section. To reduce the impact on predictions of errors associated with different grid lengths over the domain, spherical geodesic grids have been developed. With this technique, the smallest two-dimensional grid areas selected are very uniform and do not vary significantly from one another over the entire global surface. With the advent of advanced computing power in the 1990s, earlier experiments with this method that had been set aside because of computing restrictions have stimulated new efforts that are now achieving promising prediction results. The methods referred to above are applications exclusively in the horizontal space domain, and may be combined with any discretization in both the vertical and time domains. Because of tradition, these techniques were developed with the prediction equations represented in the Eulerian framework; i.e., all calculations are made locally in the space domain and values of the dependent variables are extrapolated in time at each of the relevant points in space. If one elects to predict using the equations in the Lagrangian framework, the local or point values of the variables move in space during each time-step, and must be interpolated to a grid after each integration level. The procedure can be carried out using any of the spacediscretizing schemes discussed, but the interpolation must be made in physical space, so the spectral method becomes rather more cumbersome and perhaps more time consuming on a computer. Nevertheless, because of the stability benefits inherent in the Lagrangian scheme, it has become the method of choice for many prediction models and its properties will be discussed here. 2.1
Spectral M e t h o d s
Since the 1960s, the spectral method has become by far the most popular technique for converting the prediction equations to a computational form. It
108
FERDINAND BAER
appears to overcome many of the limitations introduced by the finite difference method, and despite new ideas that are drawing modelers to other procedures, it remains attractive to the modeling community. Although the structural character of the method is substantially different from the finite difference method, the two methods can be cast in a similar representational form, and this similarity allows for more systematic comparison. To elucidate this similarity, consider the dependent variables presented in the prediction equations (1.1-1.4) represented as the vector B - (V p s qv qi ql aj .-.)T where T denotes transpose. The dimensions of B are determined by the number of variables in the system; let that be N. As the equations stand, the left-hand side of the set is simply OB/Ot and the right-hand side can be summarized by a vector F with the same dimension as B to yield the following system: 0B
Ot
--- F(B, r, t)
(2.9)
Note that F depends both differentially and nonlinearly on B, the space coordinates r, and time. It is sometimes convenient to alter these equations by a linear transformation, which will be denoted by the linear matrix operator L, leading to the final form for the prediction system: 0B L - - - ~'(B, r, t)
Ot
(2.10)
To convert this system to a computational form, consider first the finite difference process. Selecting a three-dimensional grid with M points to approximate the continuum in space with suitably prescribed boundary conditions, and a difference operator to describe derivatives, B can be represented at each of the points with dimensions (NM); indeed, since the values of B must be available at some initial time, a numerical integration can proceed. The matrix L becomes by virtue of the difference operator a (NM x NM) matrix which can in principle be inverted, and F also becomes a numerical vector with N M elements after utilization of the difference operator at each grid point. The final finite difference system may then be written, using a circumflex to represent the numerical vectors and matrices at the grid points, as ~
Ot
= L - ~ ' ( I ] , r, t)
(2.11)
The solution is thus reduced to a matrix computation provided a numerical scheme is introduced to step the solution forward in time, and the resulting
NUMERICAL WEATHER PREDICTION
109
computational errors and stability issues are explicit in the numerical and physical approximations made. Phillips has discussed some of the finite difference approximations made in converting (2.10)-(2.11). The spectral method uses a different approach. Given a continuous domain over which the model variables are to be evaluated, one selects a set of linearly independent global functions that are continuous over the domain and have at least continuous first and second derivatives. The model variables B,, are then expanded in these functions with unknown time dependent coefficients. Thus instead of a set of values for the B,, at each grid point (iAxl,jAx:, kAx3) one has for
Bn,
Me
B,- ~
!11 --
B,,.,,,(t)Z,,,(r)
(2.12)
]
where Zm are the global expansion functions (with their requisite properties). The choice of these functions is arbitrary but some guidelines may optimize the selection. Ideally it would be desirable to select functions which would fit the observation points of the expanded variables exactly and interpolate between these points, but the distribution of observations is so non-uniform that this is unreasonable. Indeed a complex discipline for data interpolation has evolved to present initial data to a more uniform grid, and will be discussed in a later section. Thus the expansion functions could be chosen to fit statistics of observations interpolated to a more uniform grid such that the least number of functions (Me) are required to describe most of the variance of the variables at those points. Other issues that may play a role in the selection process include the advantage that some functions have in fitting boundary conditions efficiently, and how conveniently they can be orthogonalized. Models using expansion functions that optimize these features could have their computation speed greatly enhanced. For application to the prediction system it is only necessary to introduce (2.12) into (2.10). It is important to recognize however that to maintain the exact form of (2.10) the series given by (2.12) must be infinite. Using the truncated form will cause an error, just as the reduction to a grid does in (2.11). Selecting an optimum truncation is a significant issue and will be discussed subsequently. As noted earlier, modification of the primitive equations to another set of dependent variables by application of the operator L is often done when using the spectral method. However, the system is always linearly decoupled, so that L is represented as a diagonal matrix with L, elements on the diagonal. Utilization of this feature allows for the presentation of the
110
FERDINAND BAER
prediction equation for each variable n in scalar form as follows:
OB,,
L,, ~
Ot
= P,,(B, r, 0
(2.13)
although the variables are still nonlinearly coupled in the functions Fn. Substitution of (2.12) into (2.13) then leads to the error equation,
Z
" L, Zm Ot
m=l
F',, - c,,
(2.14)
The most popular approach to solving this system for the unknown expansion coefficients B,.,,, and denoted as the Galerkin approximation [15] is to multiply (2.14) by suitable test functions Zk(r) and requiring the integral over the space domain to vanish. It can be shown [16] that this is equivalent to minimizing the error in a least squares sense. Note that these test functions must be continuous over the domain, and can be arbitrary. In practice they are often chosen for convenience to be the expansion functions, but this is not required. With the Galerkin approximation, the prediction equations for the expansion coefficients become
Z m=l
,m Ot
I L~Z,,Zk dS ) -
A
F, Zk d S - 0
(2.15)
and yield NMe equations for the unknown quantities, OBn,m/Ot. This system may be solved by choice of a suitable time extrapolation procedure. It is informative at this point to represent (2.15) in a form that is more comparable to the finite difference equations (2.11). Let Bn = (B,,m) and Z - ( Z m ) , both vectors with Me elements. Additionally, assume that the test functions can be similarly represented, i.e., Z - (Zk). Recalling that the functions F, are implicitly functions of (r, t) (see (2.9)), they may be projected onto the expansion functions such that
F~ - Z
F,,. m(t)Zm - Z IF,,
(2.16)
m
where F. (Fn,m). Generating the coefficients F,,.m is a nontrivial operation, resulting from nonlinear combinations of the expansion coefficients Bn,m, and efficient procedures will be discussed subsequently. Using the defined -
NUMERICAL WEATHER PREDICTION
111
vectors, (2.15) becomes
I Z;L,,Z T dS. ~0B,,- I ZZ T dS. Fn Ot
(2.17)
representing Me equations for the expansion coefficients of each dependent variable. To combine the N equations of (2.17) into one expression, it is convenient to define the Me x Me matrices An-J" ZLnZ T dS and A -- f z z T dS, and then create NMe x NMe matrices having these matrices on the diagonals; i.e., Ac - d i a g (An) and AR = diag (A). Extended vectors for the expansion coefficients to include all the variables can be constructed such that Bs = (Bn) and F s - (Fn), leading finally to an equation which is formally identical to the finite difference equation (2.11),
OBs
~-= Ot
AZ1ARF~
(2.18)
The corresponding grid point values from this spectral representation may be calculated at each point (iAxl,jAx2, kAx3) for each dependent variable Bn by use of the expansion (2.12). Efforts with spectral models in three dimensions have generally not been successful, principally because there is no convenient top to the atmosphere, although some representations of the vertical dimension using finite elements have shown promise. Since all significant prediction models represent their dependent variables on a grid of points in the vertical and use finite differencing on that grid, the subsequent discussion of the spectral method will focus on the horizontal domain of the model representation. This requires that the variables B,, be represented on K surfaces in the vertical, with the surfaces separated by the grid intervals, and the variables described by (2.11) in those surfaces. In selecting appropriate spectral functions for the expansion (2.12), it has been noted that in addition to the benefit of fitting observations well, the functions should also be chosen with the properties of the model in mind. To this end, several conditions have been uniformly accepted as requirements for suitable expansion functions. The first condition is to require the functions Zm to satisfy the eigenvalue problem, L,,Zm = -c,,.mZ,,,.
(2.19)
Although this could lead to rather cumbersome functions, in practice the transformation of variables which leads to the selection of Ln almost always represents a conversion of wind components to vorticity and divergence and, as noted from (2.5), is then given by the Laplacian operator.
112
FERDINAND BAER
Application of this operator in (2.19) leads to a variety of useful and simple functions. The second condition is to require the expansion functions to be orthogonal and normal over the domain in a Hermitian sense, thus,
J ziZ~dS-6i,j
(2.20)
This condition is of course reasonably simple to satisfy, since most function sets can be orthogonalized. Finally, it has been shown that the test functions when selected as the expansion functions do not lead to a significant loss of generality, and this condition is also uniformly imposed and is expressed as Z = Z. Utilization of these three conditions greatly simplifies the calculations required to perform each prediction time-step as may be noted from (2.17), since both integrals become diagonal matrices. Various functions have been utilized for the expansion (2.12), most satisfying the conditions just enumerated, with the selection depending on the degree of generality used to approximate the general system (2.18). In some of the early studies with very limited computing resources, the atmosphere was represented on a channel with rigid boundaries at fixed northern and southern latitudes some degrees away from the poles, and double Fourier series in latitude and longitude were found to be convenient expansion functions. They satisfied the boundary conditions easily, and because of the very simple addition rules of these functions, their nonlinear products could be rapidly calculated. This was particularly important in those days, when supercomputers were still in their infancy. For a typical study in this framework, see Baer [17]. For the full global domain approximated by a spherical surface over the Earth, the most obvious expansion functions which satisfy the boundary conditions are the surface spherical harmonics, and their application in a simple model was first presented by Silberman [18]. Subsequently many experiments demonstrated their efficacy, and they became the functions of choice for spectral modeling. Indeed today virtually all global spectral models employed for prediction purposes by large modeling centers use solid harmonics as their basis functions. Other global functions were tested during this period but proved to be less effective. Included among these were Hermite polynomials [19], Hough functions [20], [21], and even some nonorthogonal functions based on composited Fourier series [22], the latter chosen to reduce the computational demands of the complete surface harmonics. The surface spherical harmonics are generally constructed as the product of associated Legendre polynomials and complex exponential functions. Selecting coordinates in spherical surfaces relative to the Earth's rotation ^
NUMERICAL WEATHER PREDICTION
1 13
such that # = sin ~, where ~ is latitude and A is longitude, the normalized Legendre polynomials represent the latitudinal structures and have the following form:
P~'(#)-
[
(n - m)! (2n + 1) (n + m)!
l J2
(1 - # 2"n!
d
2 (#
n 1) (2.21)
These are polynomials of degree n with n - n 7 roots in the domain -7r/2 < ~ < 7r/2 and m roots at the poles. Together with Fourier series in longitude the solid harmonics become Y,. re(A, # ) = P,,.m(#)e i'''~
(2.22)
and these are the complex expansion functions Zm used in (2.12). All functions vanish at the poles except for the zonal ones (m = 0), and these have finite values there. The indices (17,m) define the roots of the functions and thus may be considered scaling elements; i.e., the larger the indices, the smaller the scales represented by the functions. An example is given in Fig. 1, which shows the cellular structure of the function for fixed n and various values of m. Note that the number of cells within the domain remains the same since some of the roots appear at the poles, although the structures of the cells differ. It is convenient to represent the indices as a single complex index, say c~- (n + im). Since the functions are orthogonal over their respective domains, and are also normalized, the orthogonality condition (in a Hermitian sense) for the expansion functions Y, may be expressed as 1
f [ Yo Y*, d S - 6~ ,, 47r J
(2.23)
where integration is taken over the surface of the unit sphere, the asterisk signifies complex conjugation, and ~ is the Kroneker delta. If L , - 272 (the Laplacian operator) as suggested above, substitution of Y~ for Zm in (2.19) and applying the operator in spherical coordinates yields for the eigenvalues, co = n(n + 1)
(2.24)
Thus it is evident that the solid harmonics satisfy the conditions desired for suitable expansion functions. Moreover, numerous studies indicate that most atmospheric variables B, are sufficiently smooth that when expanded in these functions, the series converges rapidly. That expansion takes the
114
FERDINAND BAER
F I G . 1. C e l l u l a r s t r u c t u r e o f s o l i d h a r m o n i c
f u n c t i o n s f o r n = 5 a n d all a l l o w e d v a l u e s o f m.
form, B,(A, #, zk, t) - ~
B,,. ~. ~-(t) Y,~(A, #)
(2.25)
ct
where Zk is some selected vertical level. The series must be truncated as indicated above at Me. The range of the index is n t> 0 whereas, because of the complex nature of Fourier series, m takes on both positive and negative values. However, from the properties of the Legendre functions, it can * o. k thus be shown that B,, , ~ , . k __ ( - ) m R---,,. reducing the computations considerably. The expansion (2.25) can be introduced into (2.18) and the resulting equations will be a set of ordinary nonlinear differential equations in time for the expansion coefficients. To better understand the details of this methodology, it is advantageous to reduce (2.18) to a simpler form by suitable approximations which still contain the elements of the technique. Consider therefore the barotropic vorticity equation given by (2.7), perhaps the simplest form of (2.18) insofar as it contains only one dependent
NUMERICAL WEATHER PREDICTION
1 15
variable, yet has the essential nonlinearity necessary to demonstrate the application of the spectral method utilizing solid harmonics, and in addition, L,, - 272 is satisfied in (2.19). If one nondimensionalizes (2.7) using the Earth's radius (a) for space and its rotation rate (~) for time and noting that the Coriolis parameter is then f = 2#, the vorticity equation (2.7) may be written in terms of the stream function (~) as 0V2~
- 2 - -O~
_
Ot
F(~)
F(~)
Ol
0~, OV2~ O~ OV2~
(2.26)
-
0A
0#
0#
0A
Indeed, ~ = B, the only variable remaining of the set B,, in (2.25) and for only one k level. Note that (2.26) contains a linear term and two quadratic nonlinear terms; these terms constitute F, what remains of F,, in (2.16). The representation in terms of expansion coefficients ~ ( t ) can be attained using (2.19) and (2.23) for the Laplacian operator, (2.25) for the expansion of ~, and (2.17) for expansion of F, to yield
Ot
F~ Yo(A, #) (2.27)
~
The final step is to multiply (2.27) by the test functions and integrate over the unit sphere. In this case the solid harmonics are themselves the test functions and to exploit orthogonality the product is with the conjugate of each harmonic. This results in the prediction equation for each of the expansion coefficients:
Ot
= 2im, c-1 f o ( t ) + F,(t)
(2.28)
F.(t)
f -
I F( >) r . (*A , J
U) ds
It should be apparent how (2.28) can be extended to involve more dependent variables and any number of levels in the vertical. However, if more variables exist in the system, these variables will be coupled nonlinearly through the coefficients F~. Suppose that the series for c~ is truncated at Me as suggested. This implies that all values of f~ for c~> Me vanish. However, on calculating the
116
FERDINAND BAER
nonlinear product F(f~) one could get coefficients Fo for c~ ~<2Me since the quadratic product of a polynomial will yield a polynomial with twice the maximum order. Thus in principle at each time-step, the number of nonvanishing coefficients could double. This complication is uniformly resolved in the spectral method by ignoring all computations for c~ > Me once a calculation has begun. Several properties of the barotropic vorticity equation (BVE) exist that make the comparison of the spectral method with the finite difference method convenient. It is seen from (2.28) that no error in the linear phase speeds of waves propagating in the fluid need be incurred using the spectral method since the linear solution can be computed exactly. Indeed, it is a simple matter to transform the variables so that the linear terms do not appear in the equations [23]. However, many studies have shown that this is not true for finite difference equations. The errors that result from linear wave dispersion can have a dramatic effect when they are included in the nonlinear wave interactions, setting up systematic errors as the time integration evolves. Moreover, for variables such as the stream function that converge rapidly, the truncated representation in spectral form converges to the true solution whereas this is not the case for finite difference truncation. As noted earlier, the BVE conserves several second-order moments including kinetic energy, enstrophy, and momentum which are generally not conserved in a truncated system. However, it has been demonstrated by both Platzman [23] and Lorenz [24] that the truncated form of the spectral equations as represented by (2.28), including the truncation of the nonlinear terms as indicated, does maintain these conservation conditions and does not require a special process as is the case for the finite difference equations [13]. This is of course not true for the more general truncated primitive equations, but research with the shallow water equations indicates that the errors are small [25]. An additional and closely connected consequence of the spectral equations is their freedom from nonlinear instability as discussed by (2.2). Since the products with c~ > Me are discarded, they cannot fold back into the domain of c~ < Me to corrupt the coefficients in that range. This allows for stable computations with no requirement for artificial viscosity, regardless of the complexity of the prediction system. The truncation of c~ = Me is somewhat intricate since two real indices are involved, n and m. Because of the structure of the expansion functions (2.21), n/> 0 and n i> I m I whereas - - l l l m a x ~ m ~<mmax. The set of all allowed indices may best be described as the intersections of integers in a grid on an n,m plane; such a plane is depicted in Fig. 2. Although the allowed points fall on an infinite triangle bounded by the lines n = +m, it is sufficient to present only the triangle for m i> 0. All sequential values of n and m beginning at the origin are generally selected to satisfy convergence
117
NUMERICAL WEATHER PREDICTION
2M
....
%
9 .
]
i//
.
~
n@N
.
.
.
.
.
..
.a = n + l m 9
.
.
.
.
.
.
.
9
.
.
.
.
.
.
.
.
/
.
.
.
/,/i.
.
.
.
.
.
.
.
M
N
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
2M
m
FIG. 2. The domain and allowable range of indices m and n for triangular and rhomboidal truncations.
requirements for the dependent variables that they represent, but a relationship between maximum values must be chosen. Two options have become dominant over the years. The first, rhomboidal truncation, has a specified maximum value of mmax--ZM and allows for all values of n ~ < l m l + M for each ]ml~< M. The corresponding figure (this configuration actually describes a parallelogram) is represented in Fig. 2 and the notation is written as, for example, R30 if M = 30. The advantage of this truncation is that each planetary wave m is represented by the same number of expansion coefficients, thereby allowing equal resolution for all waves. However, since the energy of atmospheric flow decreases rapidly with increasing wave number (m), resolution of the shorter waves may not be as important as for the longer waves. This observation leads to triangular truncation, perhaps the most popular form of truncation, in which n ~
M, a predetermined integer. Usually N is selected equal to M and this option is described as a triangle in Fig. 2 with the notation given as T30 if N = 30 for
118
FERDINAND BAER
example. In terms of scaling, it would be advantageous to truncate the expansion at a fixed scale, but it appears that in two dimensions there are two scales (in the current case these are the indices m and n). However, the eigenvalues of the expansion functions when operated on by the Laplacian represent in some sense this two-dimensional index, but as seen from (2.24) depend on only the one index n, and do so linearly. Thus truncation at fixed N seems appropriate and efficient [26]. Moreover, Baer [26] has analyzed data archives of atmospheric winds and demonstrated that for decreasing scale, the kinetic energy decreases as a function of the n index alone, essentially independent of the m index. This result is an additional justification for selecting triangular truncation. The ultimate choice for truncation is to optimize the resolution of the model in terms of the number of scales included and to minimize the computing requirements by selecting the least degrees of freedom compatible with resolution. The degrees of freedom here are clearly measured by the total number of indices allowed. If one accepts the argument that n is a measure of scale, triangular truncation seems the obvious choice to meet these conditions. Nevertheless, rhomboidal truncation has been utilized successfully in a number of research studies, but may be less efficient in its use of computing resources. Thus it is not often employed in large forecast models which are run every day. Since all prediction models are computationally intensive, despite the apparent advantages of the spectral method in reducing errors it must compete in the efficient utilization of available computing resources if it is to be selected as the method of choice. It is apparent from (2.28) that most of the computing time required involves the calculation of the coefficients F~ and much effort has gone into optimizing this calculation. The earliest attempts [18,27,28] followed the obvious procedure of substituting the expansion series (2.25) for ~b into (2.26) to represent F(~) and calculating F~ from (2.28). This results in the following relation: i
F~(t) - 2 ~
,3
~
~b3(t)~,(t)I~. j. ~, (2.29)
Is a,-Y- (ca - c~,) ' '
m3 Y3 ___Z' _ m~, Y~ 0# ' ' 0#
Yo dS
The indices ~ and 1/go over the same range as c~ which is determined by the selected truncation and the integration is over the unit sphere. The integrals I~, ~, 7 are called interaction coefficients and have exact solutions which were first presented by Gaunt [29]. Applying (2.29) in (2.28) shows that the time
NUMERICAL WEATHER PREDICTION
1 19
change of any expansion coefficient of the set c~ depends on the coupling of all the coefficients allowed in the spectral domain (see Fig. 2) and each couple is weighted by its own interaction coefficient. Since each index consists of two real numbers, the set of interaction coefficients can be as large as the largest allowed index to the sixth power. In practice, because of the simple addition rules of trigonometric functions, the integration over longitude reduces this by one order, requiring mo = m3 + m r for nonvanishing interaction coefficients. The vector of these coefficients can be stored and need be computed only once. However, the number of multiplications that must be performed at each time-step is daunting, especially as the truncation limit, say N, becomes large. Careful study of these coefficients indicates that the number of non-significant or vanishing values is negligible and thus the calculation of (2.29) cannot be reduced significantly [23]. Although (2.28) is a demonstration for using this technique with the BVE, the more complex system (2.18) can be represented identically by simply increasing the number of expansion coefficients to include additional variables. Early attempts to integrate such a system proved prohibitive on the computers available at the time, and consequently application of the procedure with production forecast models languished at the expense of the finite difference method with which computations are considerably more efficient. An additional limitation of this process, and a shortcoming not yet resolved, concerns the convergence rate of a few dependent variables included in the general set B,,. As noted, the expansion of the flow variable included in the BVE converges rapidly and truncation creates no problem for computation. This is also true for most of the variables represented in the primitive equations such as temperature, density, etc. However, liquid water and its related precipitation do not converge rapidly when expanded in a series of global functions. Indeed, if one considers a grid of points on the sphere with typical separation of a few hundred kilometers, a typical grid interval used for many years in global models, it is quite probable that only one point among a sequence of many may have a nonvanishing value of precipitation. Any attempt to represent such a near-singular function by the expansion (2.25) would lead to a serious Gibbs phenomenon where the shortest scales would contain most of the variance of the function. It is evident from the foregoing discussion on nonlinear products, as represented by the example given in (2.29), that significant truncation errors will ensue with time integration utilizing such functions. Until a physically meaningful procedure is identified and employed to smooth functions that oscillate rapidly between observations, the interaction coefficient method will be seriously compromised when using nonsmooth functions as part of the dependent variable set describing a prediction model.
120
FERDINAND BAER
A breakthrough occurred in 1970 when independently, Orszag [30] and Eliasen et al. [31] developed what is known as the t r a n s f o r m m e t h o d , an alternate procedure for calculating the coefficients F~ but yielding the same results as the interaction coefficient method (or better). Their technique involves the transformation of the integrand in (2.28) onto a special grid and solving the integral by quadrature. If the grid is selected appropriately, the integral will be evaluated exactly and at a great reduction in computing cost. In the longitudinal direction the quadrature is most conveniently done by a trapezoidal formula since it is known that
,ll
e imAi
e imAdA - -
27r
(2.30)
J j=l
The summation is taken over an equally spaced grid of points Aj, and uses twice the number of points as the maximum wave number. Since the functions in latitude are Legendre polynomials, a Gaussian quadrature is preferred. In this case the quadrature is such that
j
,1
K
H(#) d # -
-1
~
Gk(#, K)H(#k)
(2.31)
k=l
and is exact if the polynomial H is of degree 2K - 1 or less. The Gk are called Gaussian weightsmtheir definition may be found in [32] or [16]--and the grid points #k are the roots of the Legendre polynomial PK(#). Thus the appropriate grid for this calculation is all allowed values (Aj, #k) as specified. The range of the grid points is determined by the functions of the integrand in (2.28). The derivatives in F(~)msee (2.26)--must be taken before evaluating the function on the grid. Based on (2.22) and (2.21) the differentiation with A is straightforward, but the #-derivative requires more information. The Legendre polynomials can be shown to satisfy the following differential equation: (1 - #2)1/2
dP,
= boP,_
l - b , + 1 P , +l
(2.32)
where the coefficients b~ are constants and this can be used to evaluate the latitudinal derivatives. Finally, (2.19) can be used to evaluate the Laplacian. Following this procedure, F(~) may be reduced to a quadratic series over the indices (/3, 7) in terms of the complex exponential functions in longitude and the associated Legendre polynomials in latitude, both of which can be evaluated on the specified grid from known formulas. The actual calculation
NUMERICAL WEATHER PREDICTION
121
proceeds as follows. First the quadrature over longitude is taken:
Fro~(#, t)--~---~ F( O(A, It,t))e-im"a dA - - Z
J ./= 1
F( f:(A/,#, t))e-im~'x/ (2.33)
where the sum goes over the value J - - 3 M - 1 if triangular truncation is chosen since the quantity under summation in (2.33) is proportional to ei(m~+ m- -,,,~))~j. The calculation is made over those latitudes # specified from the quadrature over latitude, which is
F~(t)
21
i
Fm (#, t)P~(#) d#
K
Z i Gk(#k, K)Fm.(#k, t)P~(#k)
(2.34)
Since the polynomial under summation in (2.34) is H(#) of (2.31) and is the product of three Legendre polynomials less one order, as each has a maximum order of N, it can easily be shown that K - ( 3 N - 1)/2. An analysis of the computing requirements for (2.33) and (2.34) indicate that the maximum number of calculations is proportional to N 3, which is a remarkable reduction from the N 5 needed by the interaction coefficient method. Indeed, as N increases this disparity suggests that there is a dramatic redundancy in the interaction coefficient method since the final integrals from both methods are identical. This demonstration of the reduction in the need for computational resources with the transform method was the basis for the almost universal acceptance of the spectral method for numerical weather prediction. Additionally, when using the method with a number of variables (the primitive equations, for example) some of which have unacceptable convergence properties over the spherical domain yet contribute to the right-hand side of (2.11), their representation in terms of solid harmonics is not essential. Given their distribution on the transform grid, their input may be included directly into the quadrature formulae. Since all the forcing functions may be summed over the grid before quadrature is completed, any singularities from individual terms will be smoothed out and their effects will be minimized. Indeed, this procedure has proved highly successful. As a historical sidelight, Baer [17] first applied the transform method to the BVE over a channel with constant boundaries at latitudes away from the poles and periodic conditions in longitude. Using a latitude and longitude grid, it was possible to expand the stream function in complex exponential functions in both dimensions. Given the simplicity of the trapezoidal quadrature in both dimensions which was employed, however,
122
FERDINAND BAER
no insights were provided for application to the spherical surface. Since the transform method took hold, many advances have taken place. It was soon discovered that if both velocity components were used in the spectral expansion, and if a nonvanishing wind existed at the poles, a discontinuity existed there and the expansion does not converge. This problem was solved by using the transformation of wind to vorticity and divergence (see 2.5) and converting the prediction equations to those variables [33]. Although this procedure is not recommended for the finite difference method because an additional boundary value problem must be solved, that boundary value problem has a trivial solution in the spectral domain. Most spectral models currently in use employ these transformed variables, but other modifications to the wind components that eliminate the polar singularities have been recommended and applications with them in prediction models have been successful. Perhaps the first successful primitive equation spectral model may be attributed to Bourke [34]. Since that time, most prediction centers have adopted the method. The Canadians and Australians implemented the method in 1976, the National Meteorological Center of NOAA did so in 1980, the French in 1982, and the ECMWF in 1983. As an example of how computing power has evolved, production spectral models at ECMWF have grown in resolution from T63 in 1983 to T213 in 1998, with experiments currently running at T319. Before writing off the interaction coefficient scheme permanently, it is worthwhile to relate its potential application to computing hardware. To understand how the machine time needed to compute a model integration might be reduced, define a computing cycle to be that time required to do once all calculations which are systematically repeated to complete the entire calculation. For conventional marching problems of the type discussed here, the computing cycle is one complete time-step. On a serial machine which handles only one computation at a time, the computing cycle is the total time for that operation and can only be reduced by a faster machine. On a massively parallel processor (MPP), the computing cycle is reduced insofar as many computations can be performed simultaneously provided that no calculations depend on others made during the cycle. The time required to complete this cycle could hopefully be made to converge to the time needed by the machine to perform one computation (the machine cycle) as the number of processors is increased. From this viewpoint, the exploitation of MPPs is clearly desirable. Consider a SIMD parallel processor. This machine was one of the first MPP designs and was comparatively simple and economical, with all its processors performing the same operation. Integrations using such a machine with a very large number of processors could approach the ideal
NUMERICAL WEATHER PREDICTION
123
computing cycle discussed above. With reference to (2.29), note that each quadratic product of expansion coefficients is multiplied by an interaction coefficient (IC). Since the vector of ICs is very large, the computation on a serial processor is extremely time consuming. If the ICs are distributed each to a SIMD processor however, each processor can perform the product of the expansion coefficients times their IC provided only that the two expansion coefficients have been delivered to it. Thus all processors perform the same task. A subsequent sweep and sum over all processors yields F~(t). Given a computer containing enough processors, this step can be made to approach one machine cycle. Under such conditions, the significance of the redundancy inherent in the interaction coefficient method disappears. The process, although simple and straightforward, currently has a drawback arising from limitations in communication to and from the processors. At the end of each cycle, new values must be communicated to the processors so that they can produce a new product. Innovative programming is speeding this activity, but the data distribution currently takes significant computer time. In an unpublished experiment, the writer has performed integrations with the BVE on a CM-200 and a CM-5 at Los Alamos National Laboratories (LANL), using both the IC method and the transform method for comparison. For various truncations the IC method has run as fast as or faster than the transform method, despite the limitations of communication. Additional experiments with a baroclinic model on the CM-5 showed the IC method to be comparable to the transform method in speed. Unfortunately, MPP developments have moved in the direction of M I M D machines, and SIMD machines with a large number of processors (the largest had 64 K) are no longer available. Thus this development must be set aside for the future.
2.2
The Finite-Element Method
The finite element method shares some of its features with the finite difference method and some with the spectral method. Although it is based on a selected grid of points in the space domain, it differs from the finite difference method insofar as the functions to be represented on the grid (the dependent variables Bn) are also specified everywhere on the domain as in the spectral method. This is accomplished by representing the variables at each grid point by a basis function which is defined over the entire domain but is nonvanishing only in the domain to include the neighboring points. Thus the basis functions may be considered local to each grid point, unlike the spectral basis functions which are truly global. A different basis function could be defined for each point but, for simplicity, the basis functions
124
FERDINAND BAER
selected are generally the same for each point. Moreover, they are usually low-order polynomials. For variables which may change with time, the local basis functions are weighted by a time dependent coefficient which varies from point to point. A detailed description of the method may be found in Stang and Fix [35] and applications to the atmospheric prediction problem are given by Temperton [36]. Since each of the dependent variables is represented locally by an amplitude and a basis function, (2.12) applies to the finite-element representation provided that m is any gridpoint, Me is the sum of all gridpoints which may include all three dimensions, and Z m are the basis functions. Noting that the basis functions have zero values at all points except m, the sum for each B,, is uniquely defined everywhere in the domain. To solve the prediction system (2.13), the error (2.14) may again be written. A variety of techniques to solve this equation have been suggested [37], including point co-location where c,, is set to zero at each grid point, and a least squares minimization procedure. However, it has been found that the Galerkin technique is most successful and it is generally used. In this instance the prediction equations become identical to (2.15) using the grid point notation given above. Additionally, as with the spectral method, the test functions are generally selected as the basis functions used in the expansion. To elaborate on the application of the method, it is sufficient to apply it to the BVE (2.26) which contains the basic operations of derivatives and products used in the general prediction system (2.15). Although the basis functions may be selected arbitrarily, linear functions are frequently chosen because of their simplicity, their computational efficiency and ironically, their accuracy. The typical linear functions used are denoted as h a t functions in one dimension and pyramids in two dimensions as may be seen in Fig. 3. The one-dimensional functions are defined as
Z,,,(,X) =
I )~ - )~,,,•
I
+ if Am -< A ,< A,,,+l; - if A,,,_ l -< A -< A,,,; 0 elsewhere
(2.35)
where the intervals between points need not be equal. It is convenient to do functional operations sequentially. Thus for derivatives, if W(A) = 0~(A)/0A, expand both W and Z) in basis functions,
Z
m
W,,,Z,,,(A)- ~
m
~,,,
az,,,(~) OA
(2.36)
NUMERICAL WEATHER PREDICTION
125
t(x) 1
0 0
I
e.--
Ax
-.~
I
m-1
m
m+l
(a)
1
2
6
}3 'v
5
(b)
0.0
v/ 4
FIG. 3. Finite-element basis functions: (a) hat functions in one dimension; (b) pyramids on triangles in two dimensions [37].
and using (2.35) for the derivative, multiply by the test functions (2.35), and integrate over the domain. Applying the Galerkin procedure, this yields
J
Z Wm ZmZkdA 117
~
t~,,,
I oz,,,(A) 0A Zk dA
(2.37)
!t1
Note the formal similarity to the spectral method. If higher-order polynomials are selected for the basis functions, it may be more efficient to solve for the weights by quadrature. Assuming linear functions (2.35), the weights given by the integrals lead to a three-point formula on the left-hand side and a two-point formula on the right. In vector terms, the vector W for the derivatives at the points can be calculated by the inversion of a tridiagonal matrix of weights, which is highly efficient on modern computers. It has been shown that for a uniform grid, this solution is of fourth-order accuracy. Higher-order derivatives can be considered in an identical fashion provided that the basis functions are differentiable. This is clearly not the case for the linear functions defined by (2.35). For second order derivatives, however, as needed by the Laplacian operator (2.26), because the Galerkin technique requires multiplication by the test function before integration,
126
FERDINAND BAER
integration by parts will reduce the order of differentiation by 1 and allow the use of linear basis functions. Thus for example
Z IV., Z,,,ZkdA-Z~,,, DI
J
I 0 2Z,,, Zk dA OZk
I1 l
-- Z ~b,,, ,,,
J OZmOZk0A + const 0A
(2.38)
0A
and the solution for W involves the inversion of only a tridiagonal matrix if linear basis functions are used. Unfortunately, the accuracy here is only of second order, similar to the finite difference method. The final operation involved in the solution of (2.26) is multiplication. The terms in the BVE which have this form come from the Jacobian F(~b). Assuming that the derivatives have been calculated such that W1 = &b/0A and W2-= 0V2~b/0# and only variations in A are considered, W = W1W2 must be evaluated as part of the Jacobian. Expanding in the basis functions of (2.35) and using the Galerkin technique, the solution for the vector elements of W on the grid is
Z W,,,JZ,,,ZkdA-Z Z W,,m,WZ,,,2Jz,,,,Zm2Zkd,~(2.39) rll
!111
t112
Note again the formal similarity to the interaction coefficient method. However, whereas most interaction coefficients are nonzero, only a few integrals on the right-hand side of (2.39) are non-zero. With the definition of these operations utilizing finite elements, the BVE may be readily solved provided that a suitable time extrapolation procedure is defined. By analogy, the more complex system of primitive equations may be solved using this method. The general solution again takes the form of (2.11) although, as noted from this foregoing discussion, the matrices are defined differently. As noted, the method allows for arbitrary shapes and sizes of grid domains which may be of value for special problems. However, from a computational point of view, Staniforth [38] has demonstrated that selecting a rectangular grid in two dimensions, as compared for example to a triangular grid, has significant computational advantages. This arises from the fact that the basis functions can be written as product functions in the individual dimensions and the resulting integrations can be performed in one dimension at a time, thereby reducing the computational demands dramatically. The principal advantage of this method is that it allows for an arbitrary distribution of points to accommodate for both nonuniform lateral
NUMERICALWEATHER PREDICTION
127
boundaries and variable grid spacing over the domain. If one elects to model the atmosphere with a vertical coordinate which does not provide a continuous horizontal surface at each level, it has been noted that the spectral method is not suitable. This can come about because near the surface of the Earth the mountains create a barrier. The finite-element method is ideal for such surfaces. Indeed to predict the evolution of the oceans, the domain must have boundaries at the continental coastlines and the finite-element method has been noted as highly suitable for application in such models. But perhaps the most promising use of this method is for generating high-resolution regional forecasts embedded in a lower-resolution global domain. Since this is an area of intense development in current numerical weather prediction, its discussion is deferred to a later section. The method also shows promise as a tool to represent the vertical coordinate in models, although the finite difference method is still most frequently used there. Despite numerous studies with this method and demonstrations indicating its advantages, it has not yet achieved any measure of popularity. However, as will be discussed later, this may change.
2.3
Spherical Geodesic Grids
Various other methods have been tested to solve the atmospheric prediction problem, stimulated by the apparent shortcomings of the more popular methods. The method using spherical geodesic grids was formulated in the late 1960s [39, 40] in an effort to use as uniform a grid as possible over a spherical surface. Finite difference models gradually drifted to using a latitude-longitude grid following the period discussed by Phillips, because that representation avoided the need to transform from Earth-based coordinates to projection coordinates, a process which both created additional computational errors and consumed valuable computer time. However it is evident that in latitude-longitude coordinates the elementary spatial dimension in longitude (distance between neighboring grid points) shrinks significantly as one approaches the pole. This causes dramatic variability in the spatial dimensions of grid boxes from the pole to the equator and sets up the well-known polar problem, where waves in the polar region can propagate with higher frequency than those in lower latitudes a n d r e q u i r e a shorter model integration time-step to avoid linear instability. Using shorter time-steps to solve this problem wastes valuable computer time but does not provide an improved solution since errors from space truncation significantly overshadow those from time truncation [41]. In addition, because it is much more computer cost efficient to predict velocity components in the primitive equation model using finite difference techniques, the polar singularity problem remains. The polar
problem does not occur in the spectral method, although the spatial dimension also shrinks rapidly on approaching the pole from the equator. The polar singularity arising from using velocity components is easily solved in the spectral method by using vorticity and divergence with no significant computational cost. However, the problems arising from the representation of poorly converging functions such as liquid water and topography when using the spectral method, sometimes denoted as spectral ringing, have no obvious solution and can create difficulties with predictions. The spherical geodesic grids approach uniformity of grid elements closely over the entire domain and models using the representation can be formulated to maintain conservation of most integral constraints. Moreover, the method allows for model representation in true scalars (vorticity and divergence), thereby avoiding polar singularities, and very efficient solvers can be employed to minimize the cost of these additionally required computations. The grid is set up in a straightforward manner. A regular icosahedron is selected and is inscribed in a sphere, the surface of which represents a horizontal surface in the atmosphere; normalizing to the unit sphere is customary. The figure has twelve vertices and their connection yields 20 equilateral triangles or faces (see Fig. 4a). Note that each vertex is surrounded by five neighbor vertices creating a pentagonal element or grid box. The connecting lines may be projected onto the spherical surface, creating spherical triangles. To expand the number of elements, each connecting segment between vertices is subdivided and all points created by the subdivisions are connected, forming a new mesh with many more triangles. These new points are projected onto the unit sphere unless spherical triangles are used. Each point in this new mesh except for the original points of the icosahedron is now surrounded by six neighboring points, thus forming hexagonal elements, and the result is often denoted as an icosahedral-hexagonal grid (see Fig. 4b). A systematic way to increase the grid size is to sequentially bisect all lines connecting existing points and draw connecting lines between all the new points. This quadruples the original number of triangles and nearly quadruples the resulting number of grid elements. The process may be continued until the desired number of grid elements is created. Figure 4c shows an expanded mesh. As an example, if the division process is carried out six times, the number of elements becomes 40 962 and the mean distance between elements (the distance between closest neighbor points) is on average 120 km. The distances between grid elements are not uniform over the surface nor are the element areas, but the variations are not extreme and are much more uniform than those of the latitude-longitude grid. More details may be found in Ringler et al. [42] and Thuburn [43].
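As a concrete illustration of this bookkeeping, the short Python sketch below reproduces the counts implied by the bisection procedure; the function names and the assumed mean Earth radius are illustrative choices, and the equal-area estimate is only a rough proxy for the mean spacing between neighboring cell centers quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, for illustration only

def icosahedral_grid_counts(n_bisections):
    """Vertex, triangle, and cell counts after n recursive edge bisections
    of the icosahedron projected onto the sphere."""
    triangles = 20 * 4 ** n_bisections      # each bisection quadruples the triangles
    vertices = 10 * 4 ** n_bisections + 2   # 12 original vertices plus edge midpoints
    cells = vertices                        # one hexagonal (or pentagonal) cell per vertex
    return vertices, triangles, cells

def mean_cell_spacing_km(n_bisections, radius_km=EARTH_RADIUS_KM):
    """Rough mean spacing between neighboring cell centers (equal-area estimate)."""
    _, _, cells = icosahedral_grid_counts(n_bisections)
    area_per_cell = 4.0 * math.pi * radius_km ** 2 / cells
    return math.sqrt(area_per_cell)

if __name__ == "__main__":
    for n in range(4, 8):
        vertices, triangles, cells = icosahedral_grid_counts(n)
        print(n, cells, round(mean_cell_spacing_km(n), 1))
    # n = 6 gives 40 962 cells and a spacing of roughly 110-120 km,
    # consistent with the figures quoted in the text.
```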
FIG. 4. Icosahedral grids: (a) a regular icosahedron with 12 vertices; (b) the regular icosahedron with first subdivision; (c) the regular icosahedron with further subdivisions [46].
The first atmospheric model to be solved on this grid was the BVE (2.7) by Williamson [40] and Sadourny et al. [41]. Following these experiments, subsequent efforts were made with the shallow water equations model (SWE). This system, which represents a homogeneous fluid with a free surface, allows for divergence as well as rotational flow and thus contains many of the properties of the most general model represented by the primitive equations. It is thus a convenient and illustrative model to use for a demonstration of the method applied to solve prediction equations on the spherical geodesic grid. In the notation of (2.4)-(2.7), the SWE may be represented by tendency equations for absolute vorticity, divergence, and the depth of the homogeneous fluid denoted by h.
\[
\frac{\partial \eta}{\partial t} = -J(\psi, \eta) - \nabla \cdot (\eta \nabla \chi)
\]
\[
\frac{\partial h}{\partial t} = -J(\psi, h) - \nabla \cdot (h \nabla \chi)
\]
\[
\frac{\partial \delta}{\partial t} = -J(\chi, \eta) + \nabla \cdot (\eta \nabla \psi) - \nabla^2 (h + K) \qquad (2.40)
\]
where the kinetic energy K = ½ V · V. To predict this system, the three dependent variables must be evaluated at each cell center on the grid and the right-hand sides of (2.40) must be computed. Some procedure for calculating the operators J, ∇·, and ∇² = ∇·∇ on the grid therefore needs to be established. One first notes that by virtue of the definitions of these
operators, integration of the terms involving them over a closed spherical surface vanishes. Thus, the numerical procedure selected to represent these operators should, if possible, preserve this conservative feature. Additional conserved quantities for this predictive system may be defined and their numerical representation should also be conserved. For example, the prediction equations for the potential vorticity, which for the SWE model is simply Q = η/h, and for the kinetic energy are
\[
\frac{\partial Q}{\partial t} = -J(\psi, Q) - \nabla \cdot (Q \nabla \chi)
\]
\[
\frac{\partial E}{\partial t} = -J(\psi, E) - \nabla \cdot (E \nabla \chi) \qquad (2.41)
\]
and the operators are the same as those for the basic dependent variables (2.40). Consider integration over an individual hexagonal (or pentagonal) grid box. The sum of all such integrals will then satisfy the conservation condition over the closed spherical surface. The integration of all terms on the right-hand sides of (2.40) and (2.41) over the surface of the grid box can be converted to line integrals along the perimeter of the box as follows:
\[
\int_A J(\psi, \eta)\, dA = \oint_l \eta\, \frac{\partial \psi}{\partial l}\, dl
\]
\[
\int_A \nabla \cdot (\eta \nabla \chi)\, dA = \oint_l \eta\, \frac{\partial \chi}{\partial n}\, dl \qquad (2.42)
\]
where A represents the area of the grid box, l is the box's bounding curve, ∂l is an elementary distance along this curve, and ∂n is normal to the curve. It is evident that the Laplacian operator can be calculated from the second equation of (2.42) by simply setting η to unity. A variety of computational schemes have been devised to calculate the line integrals numerically over the grid box and, for simplicity of computation, most techniques are based on linear interpolation along appropriate lines. Given a hexagonal box, for example, the line integral is approximated by a sum over the six lines bounding the box. Values needed along the lines are interpolated from neighboring box values and derivatives are approximated by linear finite differences, generally with second-order accuracy. Since all values of the required dependent variables are available at any given time-step, the calculations are straightforward. For details on this process, see [40-44].
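As an illustration of this line-integral evaluation, the sketch below approximates the Laplacian of a field over a single hexagonal cell from cell-center values, in the spirit of the second relation in (2.42); the data structures and names are illustrative and are not taken from any of the cited models.

```python
import math

def cell_laplacian(chi_center, chi_neighbors, edge_lengths, center_distances, cell_area):
    """Finite-volume estimate of the Laplacian of chi over one hexagonal
    (or pentagonal) cell, following the second relation in (2.42): the area
    integral of div(grad chi) equals the line integral of d(chi)/dn around
    the cell boundary.  The normal derivative across each edge is
    approximated by a centered difference between the two cell-center values."""
    boundary_integral = 0.0
    for chi_j, l_ij, d_ij in zip(chi_neighbors, edge_lengths, center_distances):
        boundary_integral += (chi_j - chi_center) / d_ij * l_ij
    return boundary_integral / cell_area

# Toy usage: a regular hexagonal cell with apothem 1 (neighbor centers at
# distance 2) and chi = x**2 + y**2, for which the exact Laplacian is 4.
n = 6
neighbors = [(2 * math.cos(2 * math.pi * k / n), 2 * math.sin(2 * math.pi * k / n))
             for k in range(n)]
chi = lambda x, y: x * x + y * y
print(cell_laplacian(chi(0.0, 0.0),
                     [chi(x, y) for x, y in neighbors],
                     edge_lengths=[2 / math.sqrt(3)] * n,    # hexagon side for apothem 1
                     center_distances=[2.0] * n,
                     cell_area=6 / math.sqrt(3)))            # area = perimeter * apothem / 2
```

For this quadratic test field the centered difference across each edge is exact, so the toy example recovers the analytic value of 4.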
Following the application of the method to the BVE by Williamson and Sadourny et al., subsequent successful experiments involving the SWE were undertaken by Masuda and Ohnishi [44] and Williamson [45]. Thuburn [43] and Heikes and Randall [46] had the advantage of performing integrations on the SWE test suite (a set of different and selectively chosen initial conditions) which was provided to the community by Williamson et al. [47], and allowed direct comparisons with the same model but using the spectral method and the finite-difference method. In summary, both of these experiments suggested that the method yields results comparable to those from the finite difference integrations but somewhat inferior to those from the spectral method. It should be emphasized that the model (SWE) is driven purely by dynamics and does not include external forces that might induce spectral ringing. Finally, Ringler et al. [42] presented a fully three-dimensional model of the primitive equations using a twisted icosahedral grid in the horizontal spherical surfaces and the conventional σ-coordinate in the vertical. The modification to the conventional icosahedral grid described above allows for much more uniformity of area amongst the grid boxes, especially at high resolution, although the variability of box geometry remains. The model without most external forcing--now known as the dynamical core--was tested using simple Newtonian forcing and compared satisfactorily to results from the same model run using the spectral method. Indeed, for high-resolution integrations, the computer time required by the model was also competitive with the spectral method. Given these favorable results from a technique that has languished for many years, it may yet emerge as a viable competitor to the currently preferred methods.
2.4 Time Truncation and the Semi-Lagrange Method
As noted from the foregoing discussion in this section, considerable effort has been expended to find more effective methods to solve the space truncation aspect of the prediction system, but little consideration has been given to the time truncation part of the solution. It is of course necessary to meet any stability conditions that arise, and for explicit time integration schemes that limitation is known as the CFL condition (see Phillips). This distribution of effort has been justified by Robert [41], who demonstrated by linear analysis that selecting time and space increments which satisfied the stability criterion for explicit time integrations yielded solutions in which the space truncation errors were two orders of magnitude larger than the time truncation errors. Since no such condition arises for implicit schemes, it has been suggested that terms inducing high-frequency modes such as gravity waves should be integrated with an implicit time scheme whereas the explicit scheme could be used for the advective terms which give rise to the slower moving Rossby waves. Implicit schemes,
although stable, can yield significant amplitude and phase errors; thus they should be applied only to terms whose amplitude is small relative to other terms in the prediction system. As gravity waves tend to have small amplitude in the atmosphere, this splitting of integration techniques is effective when the implicit method is confined to the adjustment terms, those terms giving rise to gravity wave motions. These terms generally appear linearly in the system of prediction equations. Given the CFL constraint on the advective terms, there is a dramatic difference in the errors imposed by the spatial differencing and this suggests that a method might be found which would allow for a substantial increase in time-step with no significant loss of prediction quality provided that the calculation remains stable. If Robert's estimate is realistic, computation speeds for fixed prediction periods could be enhanced dramatically. The semi-Lagrange method may satisfy this condition, although this feature of the method was not of primary concern to its pioneering investigators. In the Lagrangian integration scheme, particles of fluid identified at a distribution of points at some initial time are advected with their velocity to new locations, and this new position of each particle and its properties are recorded in time. Note the contrast to the Eulerian scheme wherein particles move past fixed points in the fluid and values at those points are recorded at regular intervals (time-steps). Because particles tend to converge, as noted by Welander [8], the pure application of the method has severe limitations for longer time integrations. In applying the method to the BVE, Wiin-Nielsen [48] found it necessary to periodically select a new set of particles to advect during his integration period to avoid the distortion that developed from the particle trajectories. Also working with the BVE and the semi-Lagrange method, Sawyer [49] made an adjustment at every time-step. After selecting a uniform grid over the domain of interest initially, he advected all the particles for one time-step and then interpolated their values to the surrounding grid points, weighting by the distance from those points and the number of particles involved. With these new values at the original grid, he repeated the procedure until the integration was concluded for the required prediction time. He noted that the results were comparable to those of a corresponding Eulerian integration, and that he was also able to use a longer time-step. On the basis of these promising results, further experiments with the method accelerated during the following decades. An additional stimulus was the ever-increasing complexity of the prediction models with its concomitant need for more computational resources. By 1981 when Robert [50], using the SWE, successfully demonstrated the time-saving advantages of splitting the advective from adjustment terms, integrating the advective terms with the semi-Lagrange method and the adjustment terms with the semi-implicit method, the stage was set for substantial advancement and
application of the technique. The research community had by this time converged on a preferred process to be used for semi-Lagrange integration. Once a grid of points is selected to define the positions of the initial particles together with their properties, those points become the arrival points for particles at future time-steps. In this way the grid never changes although the particles at the points at any time-step are different from the original particles at those points. Moreover, it is necessary to find a departure point for each arrival point at each time-step. The particle which arrives at a given grid point at time t is advected with its velocity from a position at the previous time-step t - Δt, a position which generally does not coincide with a grid point. Several approximations are thus needed. The advecting velocity of the particle from the departure point must be determined. It may be considered constant during the interval Δt or it may change during the time interval. Given this velocity and the grid and time increments, any departure point may be found and the values associated with the particle there may be determined by interpolation from neighboring grid points. A simple example will demonstrate this process. Consider the BVE as given in (2.7), which states that the absolute vorticity η is conserved following a particle moving in time on a trajectory through the fluid. If one remains on the particle trajectory, clearly the absolute vorticity remains the same. For simplicity assume that the particle is moving in only one dimension (s) and that the velocity propelling it is a constant U. Let the spatial domain be represented by finite differences with the basic grid interval of Δs. Then it is evident that if a particle arrives at a grid point jΔs at time t + Δt, it must come from the point [j - (r + δ)]Δs at time t, since the advecting velocity is constant, and the arrival value η(jΔs, t + Δt) = η_d([j - (r + δ)]Δs, t), the departure value. This is demonstrated by the trajectory (often called a characteristic) in Fig. 5. The value of r + δ = UΔt/Δs, where r is an integer and 0 ≤ δ ≤ 1. Since the value of η_d is not known if δ ≠ 0, it must be evaluated from neighboring points at which the values of η are known. Various interpolation formulae may be applied with varying degrees of accuracy. For example, if a linear interpolation is used for η_d at the neighbor points (j - r)Δs and (j - r - 1)Δs, the value at the arrival point can be calculated from the formula
\[
\eta(j\Delta s, t + \Delta t) = \eta_d = (1 - \delta)\,\eta((j - r)\Delta s, t) + \delta\,\eta((j - r - 1)\Delta s, t)
\]
Bates and McDonald [51] have demonstrated that the calculation is unconditionally stable for any Δt provided that 0 ≤ δ ≤ 1. This does not preclude serious amplitude and phase errors if the time increment gets too large. Higher-order interpolation reduces computation error and has been investigated.
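A minimal sketch of this one-dimensional, constant-wind step with linear interpolation of the departure value is given below; the periodic boundaries and all variable names are assumptions made purely for illustration.

```python
import numpy as np

def semi_lagrangian_step_1d(eta, U, dt, ds):
    """One semi-Lagrangian time-step for a field eta advected by a constant
    wind U on a periodic 1-D grid of spacing ds, using linear interpolation
    of the departure value: the value arriving at grid point j is taken from
    the departure point j - (r + delta), with r an integer and 0 <= delta <= 1."""
    p = U * dt / ds                      # displacement in grid units, r + delta
    r = int(np.floor(p))
    delta = p - r
    j = np.arange(eta.size)
    left = (j - r - 1) % eta.size        # point (j - r - 1) * ds
    right = (j - r) % eta.size           # point (j - r) * ds
    return (1.0 - delta) * eta[right] + delta * eta[left]

# The step remains stable for any dt, e.g. a Courant number well above 1:
x = np.linspace(0.0, 1.0, 100, endpoint=False)
eta = np.exp(-((x - 0.5) / 0.1) ** 2)
eta_new = semi_lagrangian_step_1d(eta, U=1.0, dt=0.05, ds=x[1] - x[0])
```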
FIG. 5. Particle trajectories used in the semi-Lagrange integration method. One horizontal dimension (s) is described on the abscissa and two time levels are shown on the ordinate. The departure (d), mid- (m), and arrival (a) points are presented.
If one uses quadratic interpolation, three points must be used to determine the departure point location and the associated values there. They are selected such that the departure point falls closest to the mid-point, which is taken as (j - r)Δs, and the two other interpolation points are at (j - r - 1)Δs and (j - r + 1)Δs (see Fig. 5). The value for δ is taken about the central point with the constraint that -1/2 ≤ δ ≤ 1/2, and the formula for the value at the departure point is
\[
\eta(j\Delta s, t + \Delta t) = \tfrac{1}{2}\,\delta(1 + \delta)\,\eta((j - r - 1)\Delta s, t) + (1 + \delta)(1 - \delta)\,\eta((j - r)\Delta s, t) - \tfrac{1}{2}\,\delta(1 - \delta)\,\eta((j - r + 1)\Delta s, t) \qquad (2.43)
\]
This calculation is also unconditionally stable if δ is within the defined range, and the solution is more accurate than linear interpolation. Since the models for which this method is designed are three dimensional, the example presented must be expanded to multiple dimensions. Consider next a two-dimensional grid chosen for initial particle positions. The selected grid points remain the arrival points at each new time-step, but the departure points fall within a grid box at the previous time-step. For the simplest interpolation the representation is bilinear in the two surface dimensions and four points surrounding the departure point are used to establish the interpolating polynomial. Note
that the velocity components in both dimensions are required to establish the position of the departure point. For biquadratic interpolation, nine points surrounding the departure point are needed. Details on this process may be found in Bates and McDonald [51]. Current usage of the method favors bicubic interpolation, which has advantageous error properties when compared to other methods and is computationally cost effective. In all cases the multidimensional solutions are stable for any choice of the time-step. Although it is convenient to demonstrate the method with a constant advecting current, it is apparent that even for the BVE the advecting current changes along the trajectory. Thus for a realistic prediction, it is essential to take this variability into account. This is done by assuming some mean value for the advecting wind at the mid-point along the trajectory in both time and space, i.e., at s = [j - (r + δ)/2]Δs and at t + Δt/2. However, the advecting wind is not known at this point, so an iteration must be performed, based on a first guess of the wind and an interpolation of the wind from the neighboring points as was described above for the vorticity. The iteration formula for the departure point location j - (r + δ) is the following, where k is the iteration index:
\[
(r + \delta)^{(k+1)} = \frac{\Delta t}{\Delta s}\, U\!\left(\left[\, j - \frac{(r + \delta)^{(k)}}{2} \right]\Delta s,\; t + \frac{\Delta t}{2}\right) \qquad (2.44)
\]
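A minimal sketch of this fixed-point iteration for a spatially varying wind, again in one dimension and with illustrative names, might look as follows; the wind at the trajectory mid-point is obtained here by simple linear interpolation, standing in for the quadratic interpolation mentioned in the text.

```python
import numpy as np

def departure_displacement(U_mid, j, dt, ds, n_iter=3):
    """Iterate, in the spirit of (2.44), for the displacement p = r + delta
    (in grid units) of the particle arriving at grid point j, given U_mid:
    the advecting wind valid at the mid-time level t + dt/2 on a periodic
    1-D grid of spacing ds."""
    L = U_mid.size
    x = np.arange(L) * ds
    p = dt / ds * U_mid[j]                        # first guess: wind at the arrival point
    for _ in range(n_iter):
        s_mid = ((j - 0.5 * p) * ds) % (L * ds)   # trajectory mid-point in space
        u = np.interp(s_mid, x, U_mid, period=L * ds)
        p = dt / ds * u                           # updated r + delta
    return p

# Usage: a gently varying wind field; a few iterations usually suffice.
L, ds, dt = 100, 1.0, 5.0
U_mid = 1.0 + 0.3 * np.sin(2 * np.pi * np.arange(L) / L)
p = departure_displacement(U_mid, j=10, dt=dt, ds=ds)
r, delta = int(np.floor(p)), p - np.floor(p)
```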
Clearly the formula (2.44) must be generalized for higher dimensions. It has been noted that only a few iterations are necessary and furthermore quadratic interpolation for the wind to the kth iteration point is adequate. Staniforth and Cote [52] have discussed more efficient methods of establishing the departure point without the need to reference the mid-point explicitly, thus reducing the computational burden of the process. For a demonstration of how both the semi-Lagrange and semi-implicit schemes may be combined in one system, consider a simple coupled system in the variables ψ(x, t) and G(x, t), where x is the position vector in the domain, ψ represents Rossby-type motions, and G describes gravitational motions. Assume also that the advecting wind can be determined from ψ. The prediction equations are
\[
\frac{d\psi}{dt} + A\,G = R_\psi(x, t)
\]
\[
\frac{dG}{dt} + B\,\psi = R_G(x, t) \qquad (2.45)
\]
where A and B can be arbitrary functions of space and the R functions are known forcing functions. If the domain is converted to a grid of arrival points as discussed above, let the indices d, a, and m represent the departure, arrival, and mid-points (see Fig. 5). The advection terms will be integrated by the semi-Lagrange method, the linear terms by the semi-implicit method, and the forcing terms will be evaluated at the midpoint of the trajectory. Following the procedure described above for determining the departure points by using the interpolated wind field, and establishing the dependent variables at the departure points by suitable interpolation about the departure points, the prediction formulae are:
\[
\psi_a(t + \Delta t) = \psi_d(t) + 2\Delta t\left[ R_{\psi,m}\!\left(t + \tfrac{\Delta t}{2}\right) - \tfrac{A}{2}\,(G_a + G_d) \right]
\]
\[
G_a = G_d + 2\Delta t\left[ R_{G,m}\!\left(t + \tfrac{\Delta t}{2}\right) - \tfrac{B}{2}\,(\psi_a + \psi_d) \right] \qquad (2.46)
\]
In this simple coupled case, (2.46) may be solved simultaneously to yield the solution for ψ_a:
\[
\psi_a = \left(1 - AB(\Delta t)^2\right)^{-1}\left\{ \left(1 + AB(\Delta t)^2\right)\psi_d - 2A(\Delta t)^2 R_{G,m} + 2\Delta t\left(R_{\psi,m} - A\,G_d\right) \right\} \qquad (2.47)
\]
and G_a can be determined from (2.46). Staniforth and Cote [52] and Bates [53] have discussed this procedure with application to the SWE and since then, many models using the primitive equations have adopted some form of this procedure. A variety of experiments and applications with the semi-Lagrange method were reported at a workshop on semi-Lagrange methods at the ECMWF in 1995 [54] and give insight into more recent developments. The method has numerous advantages. The polar problem is easily avoided simply by shifting the coordinate frame so that arrival points near the pole in the Earth-related system appear near the equator in the rotated frame. Spectral ringing does not occur. The stability advantages over the explicit method allow for larger time-steps with negligible loss of accuracy but significant decreases in required computing resources. Unfortunately, the conservation conditions of the differential system, which have been shown to be maintained by other computational schemes, are difficult to impose in this method because the local interpolation does not allow for cancellation of terms on summation over the total domain. It has, however, been demonstrated that with substantial effort conservation constraints can
be maintained, but at some computing cost. The penalty of this cost for routine forecasts has not yet been definitively established.
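To make the algebra of (2.46)-(2.47) concrete, the fragment below advances the simple coupled system by one step once the departure-point and mid-point values have been interpolated; it is a schematic of the bookkeeping only, with illustrative names and constant coefficients, not a rendering of any operational code.

```python
def semi_implicit_semi_lagrangian_step(psi_d, G_d, R_psi_m, R_G_m, A, B, dt):
    """One step of the coupled update (2.46)-(2.47): psi_d and G_d are the
    fields interpolated to the departure points, R_psi_m and R_G_m the forcing
    terms evaluated at the trajectory mid-points, and A, B the coefficients of
    the linear (gravity-wave) terms, here taken as constants."""
    denom = 1.0 - A * B * dt ** 2
    psi_a = ((1.0 + A * B * dt ** 2) * psi_d
             - 2.0 * A * dt ** 2 * R_G_m
             + 2.0 * dt * (R_psi_m - A * G_d)) / denom           # eq. (2.47)
    G_a = G_d + 2.0 * dt * (R_G_m - 0.5 * B * (psi_a + psi_d))    # back-substituted into (2.46)
    return psi_a, G_a
```

The routine works equally well with scalars or with numpy arrays holding one value per arrival point, since all operations are elementwise.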
3. Data Analysis, Assimilation, and Initialization
We have established a variety of models that can and will produce significant forecast improvements in the future. However, these models cannot arrive at any forecast without initial conditions, since they are all representations of an initial value problem. No matter how sophisticated and polished a model is, if the initial data which it is fed does not conform to reality at the time when the forecast begins, the model will forecast a situation based on its given initial state and will not necessarily describe what nature will accomplish at that specified time. The consequence of this reasonably transparent observation is that the development of a high quality initial state is as important to a forecast as the selected model. As more applicable information on the state of the atmosphere has become available over the years, the solution of this problem, which has become known as data assimilation, has become ever more complex. If, by contrast, one considers climate prediction, wherein a forecast model is integrated over a long period of time (2 weeks or more), the forcing functions that drive the system gradually take on a more important role and over time their influence on the prediction dominates that of the initial conditions. This transition has been succinctly demonstrated in a simple example by Lorenz [9], who showed that details such as the flapping of a butterfly's wings might destroy a forecast after about 2 weeks, independently of the selected model's quality. Thus, to optimize any forecast, it is necessary to have both a high-quality model and the best estimate of the initial state of each variable to be predicted over the entire spatial domain. A wide assortment of data are now available for assimilation. In-situ measurements of standard variables such as temperature, pressure, velocity, and moisture have been and are recorded over the globe (to include some oceanic regions) and yield two-dimensional surface patterns at regular intervals, on some space scales as frequently as four times a day. Remote sounders of the atmosphere containing instruments that can measure the same variables as the in-situ instruments are released regularly into the atmosphere at select locations and provide three-dimensional patterns of the measured variables. Satellites, both orbiting and stationary, provide additional remote measurements to supplement the classical measurements of temperature and moisture profiles and cloud drift, and include information on some variables that are not directly predicted by the model
but may be inferred. Anchored and drifting buoys provide some data over oceanic areas, although this may not be regular either in time or in space. Observations taken by aircraft also supplement the more standard available measurements and are very numerous. Although again highly irregular in both space and time, they do augment the four-dimensional patterns of the atmosphere developed by the assimilation techniques. It should be noted that observations of variables that are not directly model-predicted require an additional model to transform them to predicted variables. Radiation is an example of such a variable; converting it to a predicted variable (temperature) is a challenging task and remains subject to significant errors. Indeed all measurements are subject to instrument error, and these errors must be clearly identified since their inclusion in the prediction model will cause forecast errors that may well be amplified through time by systematic nonlinear transports. In addition to observations given by instrumental measurements, the prediction model is run regularly and produces forecast data, which can be combined with the observations to enhance the data set from which the assimilation for the next forecast is prepared. Although this data is only as good as the forecast from which it ensues, which in turn depends on the quality of the initial state used, it has some distinct virtues. Notably, this data is available everywhere in the three-dimensional domain of the model, at the model grid points or in its coefficients. Considering that observational data is not uniformly distributed, particularly over the oceans where little data is available, the data produced by the forecast model is particularly valuable as a first guess in regions where observational data is sparse. Moreover, this data has been processed by the model and thus cannot contain frequencies that exceed those which the model can predict with stability. This is clearly not the case for observed data. It should again be emphasized that this dataset contains unique errors which must be understood and included in the assimilation process. Finally, the forecast model itself may be used in the assimilation process to blend all the available data into an optimum initial data set. Steady state transform models that convert one variable to another as needed (note the transform of radiation to temperature as discussed above) can also be incorporated into this process. The assimilation process is designed to incorporate all this diverse data and create the best initial data set based on the characteristics of the data and the most advanced theoretical understanding to optimize the procedure. Clearly this process has changed over the last 40 years. Not only was less data available at the time of Phillips' presentation, but the models, as has been noted, were much less sophisticated. The assimilation must have the capacity to incorporate intermittent as well as continuous data (for example,
satellite data). It must present the initial fields it produces on a regular three-dimensional grid to match the needs of the model for which it is designed, and it must generate this data at regular time intervals so the model can create periodic forecasts with the most recent data. The technique must ensure that the data it develops is physically consistent, a condition which may not be met by the raw data. An example of this is the strong geostrophic balance between the wind and height fields noted in large-scale atmospheric flow. The assimilation must merge the observed data with the model forecast data in a systematic manner. It must include models that transform non-predicted variables into predicted ones. Finally, it must incorporate all these features into a coherent algorithm that yields a best-fit result to the available input. At present some 10⁵ independent pieces of information are available at each assimilation time and can be used to prepare the initial data for a model. For the most comprehensive models, such as the one at ECMWF, the set of grid points for which initial data must be provided is approaching 10⁷ elements. If one takes the extreme position that all the data should be used to find the best estimate of the initial state at each point, then at least 10⁵ calculations must be performed at each grid point. If indeed one were to include all the forecast grid information in this process, the calculation grows by two orders of magnitude. Although the concept is ideal, given the current availability of computing resources, the practice suggested is not feasible and the time to prepare the initial state would exceed the real time of the forecast. Serious compromises are needed, such as using only regional data for the analysis at each grid point. Moreover, initialization of the assimilated data should be performed if it has not been included in the assimilation algorithm, and this adds additional computations to the product preparation. Initialization is effectively a smoothing procedure that eliminates high-frequency noise from the initial data, noise which is inherent in observational data and could lead to computational instability. From a historical perspective, the progress made to date in the area of assimilation has been nothing short of phenomenal. The earliest analyses of observations were produced by hand: the forecaster plotted the available observations on a map and analyzed the patterns by drawing contours, a procedure known as subjective analysis. As early as 1949, Panofsky [55] attempted to make the process more objective by fitting a two-dimensional polynomial to the observations in the surface over which a forecast was to be made. The largest polynomial he attempted to use was third order, and the method did not become popular. An objective method that took root slightly later, using local interpolating corrections based on observations to a first-guess field, was developed by Bergthorsson and Döös [56] and Cressman [57]. The use of objective analysis as applied in this method
indicates that the process can be carried out without human intervention; i.e., it can be done by computer. In this technique, predictions at model grid points are interpolated to observation points and the differences are denoted as corrections. The estimated model grid point values are then adjusted by appropriate application of these corrections. This process achieved substantial popularity and remained in use as new methodology evolved. The newer methods which attempt to incorporate all available data including the prediction model in a single process have now become standard at most prediction centers and produce impressive improvements in forecasts. The general process applied to assimilation is called statistical interpolation and is most often used in a linear framework; it can be described as follows. Let some variable which must be represented by an appropriate initial value be denoted by Z(r). This could be, for example, the wind velocity as a dynamical variable or the temperature as a physical variable, and its value is for some location (r) in the three-dimensional space under consideration, either on the model grid points or the observation points, points which are usually not co-located. If the variable describes the final analysis point, it is denoted by Z_a and is assumed to represent a grid point. If the variable represents an observation point, it is denoted by Z_o and the location is an observation point. Most frequently a separation is made between the vertical and horizontal coordinates such that the analyses are made in horizontal surfaces at a selected number of vertical levels. This procedure was discussed in section 2. As noted above, there is always a significant quantity of data available that can be used as a first guess for the initial state, data taken from a forecast or from climatological archives. This data is called the background field and denoted by Z_b. Background values are available at the analysis points (a) as well as the observation points (o). If the primary background field is given by a model forecast, the background values at the observation points (Z_bo) must be interpolated from the analysis points (Z_ba) at which the forecast was made. A variety of interpolation schemes have been used for this purpose, and this process was discussed in section 2.4. The linear statistical interpolation formula then represents the analysis value (to be used in a model as the initial state) as the background value corrected by the weighted sum of observation differences from their background values, denoted as observation increments or innovations. Including all analysis points and variables in the vector Z_a and choosing similar vectors for the background and observation variables, the initial state analysis vector can be written as
\[
Z_a = Z_{ba} + W (Z_o - Z_{bo}) \qquad (3.1)
\]
The matrix W represents a set of weights for the observation points (columns) and each row describes a variable at an analysis point. If each observation has an impact at each analysis point, (3.1) highlights the enormous size of the computational problem based on the size of these fields. Should some observations Z_o include variables that are not predicted, a transform formula must be included with the weights for them. Indeed if an additional weight matrix were applied to Z_ba to include the effect of all the background values on each analysis point, the computational problem could be increased by two orders of magnitude! Most of the effort in the assimilation problem to date has been in optimizing the weights so that the best estimate of the analysis fields can be found. As an example of the application of (3.1) in the early modeling days, Cressman [57] used only observations a maximum of five gridpoints away from the analysis point and employed a quadratic formula based on the distance between the observation and analysis points for determining the weights. In Gandin's development of optimal interpolation [58], he used climatological values for the background points and also severely limited the range of influence of observations in the vicinity of any analysis point. He did, however, include physical relationships such as geostrophy in determining the innovations. The weights may be optimized systematically if sufficient archived data is available. Suppose that one has true values of the variables everywhere, including the analysis and observation points, and denote those values by Z_t. Of course these values are not known instantaneously, but a statistical measure may be developed from many realizations of the variable from data archives. One can then develop an analysis error variance from the mean square difference between the analysis and true values, {(Z_a - Z_t)²}, where the norm { } is taken over many available realizations. Subtracting the true values from the analysis, background, and observational values in (3.1), squaring and taking the norm over many realizations gives a formula for these analysis error variances. One would like to establish the weights in (3.1) such that these variances are minimized. This can be accomplished by differentiating the variance formula with respect to the weights and minimizing. Several reasonable assumptions are included in this process to derive a workable formula for the weights. It is assumed that there are no biases in either the background or observational data, or if they exist, they have been removed before the data is used. Moreover it is assumed that there is no correlation between the background error and the observational error; i.e., {(Z_b - Z_bt)(Z_o - Z_ot)} = 0. One then defines the background error at observation points as ε_bo = Z_bo - Z_ot, the observation error as ε_o = Z_o - Z_ot, and the background error at analysis points as ε_ba = Z_ba - Z_at, and the corresponding vectors of these errors
including all available observation and background values as ε_bo, ε_o, and ε_ba. Using these vectors, error covariance matrices can be developed as:
E_bo = {ε_bo ε_bo^T}, the background error covariance matrix at observation points;
E_o = {ε_o ε_o^T}, the observation error covariance matrix;
E_ba = {ε_ba ε_ba^T}, the background error covariance matrix at analysis points.
The weights are then determined by the expression
\[
W = (E_{bo} + E_o)^{-1} E_{ba} \qquad (3.2)
\]
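A small numerical sketch of the analysis step (3.1)-(3.2) on a toy one-dimensional problem is given below; the synthetic Gaussian covariances, the array names, and the matrix layout (the covariance between analysis and observation points is placed to the left of the inverse so that the shapes work out when the two sets of points differ in number) are illustrative assumptions rather than the formulation of any operational scheme.

```python
import numpy as np

def gaussian_covariance(x1, x2, variance=1.0, length=500.0):
    """Synthetic background-error covariance between two sets of 1-D point
    locations (in km), used only for illustration."""
    return variance * np.exp(-0.5 * ((x1[:, None] - x2[None, :]) / length) ** 2)

def analysis(Z_ba, Z_bo, Z_o, x_a, x_o, obs_error_var=0.25):
    """Statistical interpolation step in the spirit of (3.1)-(3.2)."""
    E_bo = gaussian_covariance(x_o, x_o)        # background error covariance at obs points
    E_o = obs_error_var * np.eye(x_o.size)      # observation error covariance
    E_ab = gaussian_covariance(x_a, x_o)        # background error covariance linking
                                                # analysis and observation points
    W = E_ab @ np.linalg.inv(E_bo + E_o)        # weights: one row per analysis point,
                                                # one column per observation point
    return Z_ba + W @ (Z_o - Z_bo)              # eq. (3.1): background plus weighted innovations

# Toy usage: analysis points every 100 km, three observations.
x_a = np.arange(0.0, 1000.0, 100.0)
x_o = np.array([120.0, 480.0, 730.0])
Z_ba = np.zeros(x_a.size)                       # background (first guess) at analysis points
Z_bo = np.zeros(x_o.size)                       # background interpolated to observation points
Z_o = np.array([1.0, -0.5, 0.3])                # observed values
Z_a = analysis(Z_ba, Z_bo, Z_o, x_a, x_o)
```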
Equation (3.2) assumes that the covariance matrices are nonsingular; details on this development may be found in [59]. Equation (3.1) can be applied once at each initial forecast time, using all the observations available at that time and an appropriate set of background values. However, it has been found that a time cycle applied to (3.1) yields a more accurate initial state. This cycle is created as follows. The analysis values are calculated for a given start time from (3.1) and then used in the prediction model for a given time increment to yield new background values. These new background values as well as new observations, if available, are then used in (3.1) to provide new analysis values which are used as the initial values in the model to forecast for the next time increment. This process is repeated until the starting forecast time is reached, at which time a forecast is made. This procedure and subsequent modifications are known as 3DVAR and were first applied with fixed weights. However, it was noted that the weights could also be updated during the cycle if data for the error covariance matrices was available or created. Various tests with this modification proved successful in providing improved initial states. Although 3DVAR has provided substantial advances in the assimilation process, it does not allow for the interaction of input data in time since it is a sequential process moving relentlessly forward. This flexibility was incorporated in a development denoted 4DVAR, in which all the information available and produced during the assimilation cycle is interactive to yield a final analysis for prediction. The procedure can be visualized as a best-fit integral of (3.1) over the assimilation cycle period, incorporating all the information available during that cycle. Details of the process have been presented by Daley [60], Courtier [61], and Talagrand [62], and detailed applications to NASA data have been discussed by Cohn et al. [63]. This is clearly the method of the future, but at present its computational costs are overwhelming, and even in approximated
applications the technique requires computing time far in excess of the time needed to make an actual forecast. One issue that is not explicitly addressed by the assimilation process described above concerns the frequencies included in the final analysis state. Since observations are not frequency filtered, using observations in the development of the initial state will include high-frequency components in the model. If the frequencies are so large that a reasonable time-step during integration will not resolve them, computational instability will ensue. This instability can be avoided by suitably balancing the high-frequency components of the analysis field, and the process is called initialization. Initialization is generally applied to the analysis data after assimilation but can be incorporated into it, and has been in use since the late 1970s. It effectively inhibits the amplitudes of the high-frequency components in the model from propagating, a role analogous to that taken by the balance equation of earlier models which inhibited the amplitudes of divergent motions from propagating in time. The initialization process is demonstrated by reverting to (2.9) as the general prediction system for global scale motions and relevant variables. For consistency of notation with this section, let Z = B. Equation (2.9) can be modified by explicitly writing the linear term AZ, which has been incorporated into the right-hand side of (2.9). Moreover, the matrix A can be diagonalized such that A = S⁻¹ΛS and, transforming the vector Z' = SZ, where S is the matrix of eigenvectors of A, (2.9) becomes
\[
\frac{dZ'}{dt} + \Lambda Z' = \varepsilon F'(Z', Z') \qquad (3.3)
\]
where ε is a nondimensional parameter associated with the Rossby number and is of order 10⁻¹ for large-scale motions. The eigenvalue matrix Λ contains all allowed frequencies and they can be sorted by magnitude into fast and slow components. If this is done, (3.3) can be broken up into two equations, one for the fast components and the other for the slow ones:
\[
\frac{dZ_f}{dt} + \Lambda_f Z_f = \varepsilon F_f(Z', Z')
\]
\[
\frac{dZ_s}{dt} + \varepsilon \Lambda_s Z_s = \varepsilon F_s(Z', Z'), \qquad
Z' = \begin{pmatrix} Z_f \\ Z_s \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} \Lambda_f & 0 \\ 0 & \varepsilon \Lambda_s \end{pmatrix} \qquad (3.4)
\]
It is evident from (3.4) that the slow modes can be predicted without instability for reasonable time increments and thus need not be balanced. However, the fast modes, which basically represent gravity waves, will in general cause computational instability unless an extremely small time-step is used. If, however, the initial state of these modes is balanced such that successive time-steps will have vanishing tendencies, the computation will remain stable. A first approximation toward accomplishing this was to set the time derivatives of Z_f to zero in the first of (3.4) and then to calculate the initial Z_f from the remaining terms. When this procedure is applied to Z_a of (3.1), the initial state for the forecast will be both balanced and derived from the assimilation process. A variant of this fairly simple initialization technique was first postulated and successfully tested by Machenhauer [64]; concurrently Baer and Tribbia [65] presented a somewhat higher-order formula. The general procedure was formalized by Leith [66]. Numerous subsequent studies have discussed the inclusion of diabatic effects (forcing), which were not included in the pioneering applications of initialization using (3.4). Inclusion of both initialization and the latest version of assimilation in the preparation of input data for models has significantly advanced the accuracy of current numerical weather prediction models but taxes the capabilities of the fastest computers. A rewarding sidelight to the development of the assimilation technique is its application to archived data sets, known as reanalysis. This procedure provides higher quality historical records which can be used for many purposes including the creation of improved error covariance matrices.
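A schematic of this balancing step, under the strong simplifying assumptions that Λ_f is diagonal and known and that the nonlinear forcing can be evaluated directly, might look as follows; it is meant only to illustrate the fixed-point character of the procedure, not Machenhauer's actual formulation.

```python
import numpy as np

def balance_fast_modes(Z_f, Z_s, Lambda_f, eps, F_f, n_iter=5):
    """Iteratively adjust the fast (gravity-wave) modes so that their initial
    tendencies vanish: setting dZ_f/dt = 0 in the first of (3.4) gives
    Z_f = eps * Lambda_f^{-1} F_f(Z_f, Z_s), which is applied repeatedly."""
    for _ in range(n_iter):
        Z_f = eps * F_f(Z_f, Z_s) / Lambda_f    # Lambda_f assumed diagonal, stored as a vector
    return Z_f

# Toy usage with an invented quadratic forcing; eps is of order 0.1 as in the text.
Lambda_f = np.array([5.0, 8.0, 12.0])           # fast-mode frequencies (illustrative)
F_f = lambda zf, zs: zs ** 2 - 0.5 * zf * zs    # hypothetical nonlinear term
Z_s = np.array([1.0, -0.4, 0.7])                # slow (Rossby) modes from the analysis
Z_f_balanced = balance_fast_modes(np.zeros(3), Z_s, Lambda_f, eps=0.1, F_f=F_f)
```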
4. Regional Prediction Modeling
Regional prediction has been practiced almost since the beginning of numerical weather prediction, although initially it was the consequence of limitations in computer resources and limited knowledge concerning modeling. As modeling developed during the 1960s, most efforts focused on the global scale, particularly when the primitive equations became popular. By 1970 interest in regional scale modeling had resurfaced as an adjunct to global scale modeling, but with the definition that such models required forecast boundary conditions from a global model, run concurrently [67]. The properties that such boundary conditions must satisfy had already been discussed theoretically by Charney [68] in 1962, and a limited-area numerical model was presented by Shuman and Hovermale [69] in 1968. The National Meteorological Center was perhaps the first to implement an operational model in 1973. As these models evolved, they were often nested (embedded) into lower-resolution global models, and
multi-nesting also was employed. In this environment, the higher-resolution models nested into lower-resolution models could get their boundary conditions from the parent (low-resolution) model. This transfer was usually one way and there was no feedback from the high- to low-resolution model. A variant of this process was developed by the United Kingdom Meteorological Office (UKMO) based on a concept called the unified model. That model has the capability of running on any scale from global to highly regional by simply setting input controls. Needless to say, the necessary and appropriate initial conditions must be provided for the model to make a prediction. As high-resolution (regional and/or limited-area) models have evolved, they have taken on the characteristics of the global models but with variants that were relevant both to the scale and to the particular region for which they were designed. Examples include models of hurricanes, tornadoes, convective complexes, and even isolated cumulus clouds. Some principal issues which have arisen in conjunction with this modeling effort and which have been investigated include the boundary conditions, both lateral and at the bottom surface, initial conditions, relevant forcing functions, the appropriateness of the hydrostatic approximation, and suitable vertical coordinates. During the last decade, numerous regional stand-alone models have been developed at local sites not connected with a forecast center. These models are based on several prototypes, the most popular of which are the Penn State University/NCAR model currently known as the MM5 [70], and the RAMS model developed at Colorado State University [71]. No fewer than 11 such models are discussed by Mass and Kuo [72], most associated with universities where they are used not only for forecasts but as teaching tools. This evolution is based on several factors. The development and availability at modest cost of powerful workstations capable of performing the necessary calculations with acceptable speeds was perhaps the primary spur. In addition, advances in communications allow these sites to download the required initial and boundary data from a large forecast center where predictions with both global and large regional models make such data available in real time, and the transmission is sufficiently fast that meaningful forecasts can be made with the local models. These models are mostly grid point models with local coverage over a domain spanning various sub-regions of North America and Hawaii. Their resolution ranges from 10 to 80 km grid lengths in the horizontal and from 23 to 50 vertical levels. Many of the models do not use the hydrostatic approximation, thus allowing for the propagation of sound waves. The models almost uniformly do not employ data assimilation, beginning their integrations with the data downloaded from the source and interpolated to the local model grid as needed. The common length of forecasts with these
models is 24-48 h, and they are run in real time every day. Unfortunately they are as a rule not yet validated, so their quality cannot be assessed. Future plans developed by the community of regional modelers have validation as a primary objective. The quality of the forecast product may depend more or less strongly on the input data available. Since the scale of the local model is generally smaller than the scale of the model from which the initial and boundary data are provided, information on the smallest scales of the local model is not initially available. If the domain over which the prediction is taking place has steep topography, the development of the flow may be strongly influenced by orographic effects and the coarse boundary conditions may be adequate. However, if the primary source of flow development in the regional model comes from outside the domain and the topography is relatively flat, then the initial lateral boundary conditions become crucial to a successful forecast. Validation of the forecast products will help to clarify these issues. It is anticipated that development of regional models will accelerate in the future since there are strong markets demanding their products, such as the air quality industry, hydrologic applications, and trajectory models for chemical species tracking. The source data for these models comes primarily from the National Weather Service/NCEP, which runs a regional grid point model known as the eta model as well as a regional spectral model in addition to its routine global model. The eta model is so named to highlight the unique fashion by which it treats the lower boundary in the presence of mountains [73]. In the neighborhood of such topography it employs a step-function that can resolve the mountain slope with considerably more accuracy than most other techniques, particularly the spectral method. The NCEP eta model is run every 6 h with new input data at resolutions of 48 and 29 km and every 24 h with a 10 km grid. Data from these forecasts is available for the regional models as their source data and is distributed via the Internet. At this time subsets of output data appropriate to any regional model are not available, so the local model must both wait for transmission of the entire data set and sort through it to extract what it needs. Serious issues remain regarding the imposition of boundary conditions on a regional model taken from a global model but running independently of it without feedback between the models. Several interesting innovations have been designed to overcome this shortcoming and appear to be harbingers of models that will gain popularity in the future. A finite difference model with variable resolution, also known as a stretched grid model, has been developed which allows for arbitrarily high resolution over a selected region of the globe while predicting over the entire globe in a unified fashion. The concept seems to have been first introduced by Staniforth and Mitchell [74]
and more recent augmentations highlight the merits of the procedure [75]. The method involves a gradual stretching of the model grid away from the selected region where high resolution is desired (effectively the regional model domain) to a coarser resolution elsewhere over the total domain, unless additional high resolution regions are desired. The gradual stretching overcomes the errors normally incurred on the regional model boundaries where the application of noncontinuous data can set up potential instabilities requiring serious smoothing. Such smoothing clearly has a negative impact on the prediction. The recent studies utilizing the NASA/Goddard GEOS model [75] indicate highly successful integrations over a variety of stretching factors ranging from 4 to 32, with a local grid length as short as 0.25° of longitude. Another alternative which includes the regional model as an integral part of a global model is the recent development of a Spectral Element Atmospheric Model (SEAM) [76]. This model offers great flexibility and possibly some advantages over other global/regional models. It utilizes the geometric properties of finite element methods. It allows for very convenient local mesh refinement and regional detail, it is ideally suited to parallel processing and minimizes communication problems amongst the processors, it is very efficient computationally, and it has no pole problems. The method for generating this model is straightforward. The spherical surfaces in the atmosphere are tiled with an arbitrary number and size of rectangular elements. The elements are generated by first inscribing a polyhedron with rectangular faces inside the sphere and mapping the surface of the polyhedron to the surface of the sphere with a gnomonic projection. Each face is then arbitrarily subdivided as needed, independently of uniformity, yielding a set of elements that cover the surface. These elements can be made as uniform as desired (Fig. 6) or can be of high resolution in selected regions. Each element is subdivided into a two-dimensional grid array and, using finite element methodology, a set of basis functions is selected. The variables appropriate to the model are then expanded in these basis functions with time-dependent coefficients. Global test functions are selected identical to the basis functions. Following finite element methodology (see section 2.2), the model equations are multiplied by a test function and integrated over the spherical surface. Careful consideration is given to the boundaries where the elements meet. The resulting equations are unique and define the tendencies of the dependent variables at each point within the element and for all elements. The vertical representation follows the NCAR/CCM3, using sigma coordinates and finite difference discretization. The basis functions are the Legendre cardinal functions and are used in both horizontal dimensions on the grid.
FIG. 6. The cube projected onto the sphere and subdivided [76].
These functions lend themselves to Gauss-Lobatto quadrature if the grid in each element is selected to conform to this quadrature. The integral equations defined above are thus reduced to a set of summations over the quadrature points and the simple Legendre spectral transform method is employed, yielding an extremely simple finite element scheme with a diagonal mass matrix. Experiments to date with a three-dimensional dynamical core model produce highly efficient and accurate integrations which are competitive with other prediction models, but use parallel processors more effectively, since communication between processors is only required on the boundaries of each element with its immediate neighbors. This procedure shows great promise for the future.
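As an illustration of why these basis functions pair naturally with this quadrature, the generic spectral-element sketch below computes Gauss-Lobatto-Legendre nodes and weights and checks the degree of exactness that allows the element integrals to be replaced by quadrature sums; it is not code from SEAM itself.

```python
import numpy as np
from numpy.polynomial import legendre as leg

def gauss_lobatto_legendre(n):
    """Nodes and weights of the (n+1)-point Gauss-Lobatto-Legendre rule on [-1, 1].
    Interior nodes are the roots of P_n'(x); the weights are 2 / (n (n+1) P_n(x_j)^2)."""
    c = np.zeros(n + 1)
    c[n] = 1.0                                   # coefficients of P_n
    interior = leg.legroots(leg.legder(c))       # roots of P_n'
    nodes = np.concatenate(([-1.0], np.sort(interior), [1.0]))
    weights = 2.0 / (n * (n + 1) * leg.legval(nodes, c) ** 2)
    return nodes, weights

n = 6
x, w = gauss_lobatto_legendre(n)
# The rule is exact for polynomials of degree up to 2n - 1, which is what allows
# the element integrals to be replaced by sums over the quadrature points.
for k in range(2 * n):
    exact = (1.0 - (-1.0) ** (k + 1)) / (k + 1)  # integral of x**k over [-1, 1]
    assert np.isclose(np.sum(w * x ** k), exact)
# Because each cardinal basis function is one at a single node and zero at the
# others, the quadrature approximation of the mass matrix is diagonal, with the
# weights w on the diagonal (the mass-lumping property mentioned in the text).
```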
5. Ensemble Prediction Techniques
Despite the dramatic improvements in weather prediction described above, the limits to forecasting skill initially proposed by Lorenz [9] remain. Thus, no matter how good the model is or how accurate the data assimilation, errors will continue to grow during the prediction process. The character and evolution of these errors can be studied, however, and the information gained from this assessment can be used together with any particular forecast to enhance its utility. On the assumption that the model used is perfect, except for its truncation which is essential for computation,
one can run the model with perturbed initial states to get a measure of the model variability in addition to a mean prediction. This information should be more useful than any single forecast and is at the heart of the ensemble prediction method. Epstein [77] addressed this issue by presuming that the model used for prediction was deterministic but the initial conditions were probabilistic, thus setting up uncertainties in the initialization of the forecast. He defined a phase space that included all model variables and also defined a probability distribution for the observations used as initial conditions. This probability distribution could be predicted in time based on a conservation principle that all ensemble members remain in the prediction system. Using this distribution, he was able to convert the prediction equations for the model variables into equations for the moments of these variables. Rather than stopping at second moments, which is customary for the nonlinear prediction equations (note the quadratic nature of (1.1), for example), he suggested that truncation of third moments might give more accurate results when used as closure to the second-moment equations. His expectations were borne out through experiments with very highly truncated systems. However, the process is far too computer intensive to be applied to realistic forecast models, let alone current models, even with the advances in today's computing resources. Leith [78] explored the Monte Carlo approach, recommending the use of a limited number of predictions based on random perturbations of the observed initial state to create an ensemble from which to generate an improved forecast. He used these forecasts to generate a linear regression for his final prediction product, weighting each individual forecast with a forecast error covariance. Since the true solution is unknown, generation of the covariances required the use of climatology. Although the technique showed promise in statistically yielding improved forecasts, he used only simple models for his tests and noted the excessive demand on computer time for generating his ensemble; moreover, no single forecast is best represented by climatology, and many forecasts are far from climatology. To take advantage of knowledge based on the current state of the atmosphere at the time a forecast is proposed, Hoffman and Kalnay [79] introduced the Lagged Average Forecast (LAF) technique which Kalnay identified with "errors of the day." LAF uses forecasts from previous times just before the selected forecast time as its base for an ensemble and generates the weights for the forecasts in the linear regression equation by using observations for calculating the forecast error covariances. This overcomes the need to use climatological data to determine the regression weights in the Monte Carlo process as suggested by Leith, and leads to superior forecasts. However, it is clear that the atmosphere is constantly
changing--sometimes dramatically so, even on short time scales. Thus using information from just before a prediction period may not be adequate as one proceeds into the period. This shortcoming of the LAF led Toth and Kalnay [80] to develop a new concept termed breeding, in which the model modes that grow most rapidly in response to the initial state are used to generate perturbations for the ensemble predictions. The procedure calculates the difference of an initial random perturbation run from a control run for a short integration time, determines the largest differences, and uses these as perturbations to proceed with the integration; this is done periodically until the forecast period is complete. The regions of largest difference at the end are called the bred modes and are used as perturbations for individual forecasts, which are subsequently combined to give the ensemble forecast. It is clear that this procedure isolates the most active regions during the forecast period using the appropriate observational data and will have the most realistic impact on the ensemble predictions. Indeed, tests indicate that improved forecasts result. Buizza and Palmer [81], working at the ECMWF, have presented an alternative approach to this problem that also searches for regions of maximum growth of perturbations during the forecast period. They identify singular vectors that represent the modes of maximum growth from a solution of a linearized form of their prediction equations (in this case the ECMWF prediction model) in response to relevant initial conditions. The calculation is limited in time to a few days to focus the effect on the given initial state and to avoid the introduction of significant model errors. These modes are then used to develop perturbations to the given initial state, thus setting up conditions from which the ensemble is created. Both the singular vector method and the breeding method have been in use for production forecasts since 1992 at the ECMWF and NWS/NCEP respectively. In the latest report available [82], the NCEP model uses a 17-member ensemble and the ECMWF model uses 32 members. The resulting ensemble predictions show improvement over individual forecasts. Perhaps more valuable than small changes in prediction skill is the range of that skill, which can be established by evaluating the ensemble results. If the range of the ensemble is large, less reliability will be associated with the forecast than if the range of the ensemble is small. Moreover, this assessment of the ensemble can be localized to determine which regions over the globe are more predictable than others during the forecast period. Ensemble prediction has advanced dramatically over the last decade and appears to have a promising future in the coming years of the new millennium as computers allow for more computations. The principal thrust has to date been on assessing the impact of the uncertainties in the initial
state. However, the uncertainties in the model itself have not been lost on researchers and efforts are under way to incorporate this variability into the ensemble concept. A recent study by Krishnamurti et al. [83] gives some insight into how this might be achieved. Instead of developing an ensemble based on perturbations in the initial state, the ensemble is created by running the same initial conditions on various models. The logical next step is to combine these individual procedures into an ensemble.
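To make the Monte Carlo flavor of ensemble prediction concrete, the following is a deliberately minimal sketch of our own (not any operational center's configuration), using the Lorenz [9] three-variable system as a stand-in for a forecast model: a single analysis is perturbed within an assumed observational uncertainty, each perturbed state is integrated forward, and the ensemble mean and spread are read off at the end of the forecast.

```python
import numpy as np

def lorenz63_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz (1963) system, a standard
    # toy stand-in for a chaotic forecast model.
    x, y, z = state
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return state + dt * np.array([dx, dy, dz])

def run_forecast(initial_state, n_steps=1000):
    state = initial_state.copy()
    for _ in range(n_steps):
        state = lorenz63_step(state)
    return state

rng = np.random.default_rng(0)
analysis = np.array([1.0, 1.0, 1.05])   # "best" initial state from assimilation (illustrative values)
n_members, obs_error = 20, 0.05

# Perturb the analysis within its assumed uncertainty and run each member.
members = np.array([
    run_forecast(analysis + obs_error * rng.standard_normal(3))
    for _ in range(n_members)
])

ensemble_mean = members.mean(axis=0)
ensemble_spread = members.std(axis=0)   # large spread -> low confidence in the forecast
print("ensemble mean:  ", ensemble_mean)
print("ensemble spread:", ensemble_spread)
```

In a toy system as chaotic as this one, even tiny initial perturbations separate noticeably over the forecast, which is precisely the information the spread is meant to convey.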
6.
Conclusions
Numerical weather prediction has advanced substantially in the last 40 years, having grown through infancy in the 1960s to a mature science in the year 2000. Progress in forecasting the weather has been steady during that period and in retrospect is remarkable, although to the public the evolution may appear mundane, since improvements must be evaluated on average and day-to-day achievements are barely measurable. Nevertheless, the process has grown from using very simple models with innumerable approximations over limited domains to fully comprehensive models spanning the globe. Mathematical and numerical techniques have been comprehensively explored for their applicability to the problem, and the best currently available have been exploited and implemented. Added to this are the phenomenal advances in computer technology that have been brought to bear, making the applications possible and the potential progress feasible. Advances in modeling methodology have significantly reduced computational errors inherent in any forecast procedure. From the very elementary finite difference methods used in the earliest models, the field has moved on: spectral methods have been thoroughly exploited and are extensively used at many large forecast centers for global modeling. For models requiring unusual boundary conditions, the finite element method has been shown to have advantageous properties and has been carefully studied and exploited. To overcome difficulties in modeling associated with unequal grid spacing over the integration domain, geodesic grids have been explored, and experiments with them have demonstrated success in resolving unique error sources. With access to advanced computing capability, model resolution has systematically increased, leading to predictions on smaller scales, but with this development the potential for computational instability has also grown. This has led to the introduction of semi-Lagrangian integration as an alternative to Eulerian schemes, since the former has no stability constraints. Following intensive study, the method has been systematically and successfully introduced at many prediction centers and is now considered a fundamental tool for forecast models.
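To illustrate why the semi-Lagrangian approach escapes the usual Eulerian stability limit, here is a minimal one-dimensional sketch of our own (not a production scheme): each grid point traces its trajectory back over the time step and interpolates the advected field at the departure point, so the time step is limited by accuracy rather than by the CFL condition.

```python
import numpy as np

def semi_lagrangian_step(q, u, dt, dx):
    """Advance q_t + u q_x = 0 by one step on a periodic 1-D grid.

    Each grid point traces its trajectory back over dt and interpolates
    the field at the departure point; the scheme remains stable even when
    u*dt/dx exceeds 1, i.e. there is no CFL restriction.
    """
    n = q.size
    x = np.arange(n) * dx
    departure = (x - u * dt) % (n * dx)        # departure points on the periodic domain
    j = np.floor(departure / dx).astype(int)   # left neighbour index
    w = departure / dx - j                     # linear-interpolation weight
    return (1.0 - w) * q[j % n] + w * q[(j + 1) % n]

# Usage: advect a Gaussian bump with a Courant number of 2.5, which would
# destabilize a typical explicit Eulerian scheme.
n, dx, u = 200, 1.0, 1.0
q = np.exp(-0.5 * ((np.arange(n) * dx - 50.0) / 5.0) ** 2)
dt = 2.5 * dx / u
for _ in range(40):
    q = semi_lagrangian_step(q, u, dt, dx)
print("max after advection:", q.max())   # slightly < 1: linear interpolation damps, but never blows up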
In tandem with these model developments, the processing of observations for inclusion as initial conditions in models has also seen dramatic enhancement. Not only has the available database expanded significantly, but the methodology for incorporating those data in an efficient and productive way has also mushroomed. Whereas the techniques for setting initial fields in the 1960s were rather primitive, albeit computerized, it was not until the advent of the fastest computers that the more sophisticated procedures could be exploited. Today the methodology is highly advanced, using not only the total database of observations available at the initial time but also observations gathered over extended periods, and drawing on a vast reservoir of archived data to establish the climatological and statistical databases applied in the assimilation process. The technique now employed integrates all available data, using the forecast model as a tool to best fit the data for inclusion as initial conditions for its final prediction. The major limitations of this procedure currently lie in the constraints imposed by available computers. The scales for which forecast models can now make useful predictions have expanded dramatically in the past 40 years. In 1960 one could expect a forecast for 24 h of the wind and pressure fields over a somewhat reduced hemispheric domain on at best three levels in the atmosphere. Today models can provide predictions over the entire globe of the principal dynamic and thermodynamic variables, as well as clouds and precipitation, at as many as 30-80 vertical levels, from the surface to well into the stratosphere. These forecasts are reasonably accurate for 3-5 days, and 10-day outlooks, although less accurate, are routine. Even global models can provide forecasts down to a resolution of about 100 km. For finer resolution a class of mesoscale models has been developed which provide accurate predictions to about 10 km resolution over a limited spatial domain and for a period of 1-2 days. These models can be general purpose or tuned to forecast special events such as convective complexes and hurricanes. For yet finer resolution, experimental models exist which have shown some success at forecasting events as small as tornadoes, using horizontal grid resolutions of tens of meters. The advances seen over the last decades would not have been possible without a concurrent increase in research activity on outstanding issues. As one summarizes the impact of this active research community, one is tempted to speculate on its future activities. A topic of much significance but not elaborated on in this chapter concerns the parameterization of physical and chemical processes which affect the atmosphere and which ultimately drive its evolution. Considerable effort has been expended on these various functions, but they are still in a developmental state and need considerably more work. Since parameterization represents an approximation of nonlinear processes that cannot be resolved because of model truncation,
effective parameterization can only be determined by innumerable model experiments to test its efficacy under as many conditions as possible. As computers increase in power, as we expect they will over the coming years, advances in the quality of parameterizations should follow and will play a significant role in forecast improvements. Limitations in the progress of data assimilation due to similar shortcomings in computing power will also share in the benefits of upcoming computational advances, with demonstrable positive impacts on predictions. With regard to methods for modeling, exploration and exploitation of the semi-Lagrangian method will continue to accelerate in the immediate future. However, owing to the cyclical nature of research foci, this process will gradually abate. The intense interest in regional modeling which now exists will stimulate research in models which can provide regional as well as global forecasts in one package. Rather than embedding high-resolution models in a global model in a nested fashion, which limits two-way interactions between the models, comprehensive models will evolve which will admit any or all scales simultaneously and with complete interactions on all scales in a transparent manner. A prototype of such a model already exists and exploits the advantages of parallel processing, the expected future of computing. The spatial cells of such a model will have arbitrary dimensions, and each cell will interact with its neighbors only at its boundaries. If each cell is handled by a single processor, communications will be minimized and computations optimized. Finally, more high-resolution regional models will be developed so that for the shortest time scales they will provide forecasts without any need for global data. Such models could include areas the size of cities or even neighborhoods, and provide guidance on weather on the scale of hours. Imagine knowing when a snowstorm might occur, or a tornado, or even a thunderstorm. The 21st century looks very promising indeed for weather forecasting, but perhaps not yet for its modification.
REFERENCES
[1] Richardson, L. F. (1922). Weather Prediction by Numerical Process, Cambridge University Press, London.
[2] Charney, J., Fjortoft, R., and von Neumann, J. (1950). Numerical integration of the barotropic vorticity equation. Tellus 2, 237-254.
[3] Phillips, N. A. (1960). Numerical weather prediction. Advances in Computers 1, 43-91.
[4] Shuman, F. G. (1989). History of numerical weather prediction at the National Meteorological Center. Weather and Forecasting 4, 286-296.
[5] Bengtsson, L. (1999). From short-range barotropic modelling to extended-range global weather prediction: a 40-year perspective. Tellus 51A-B, 13-32.
[6] Kalnay, E., Lord, S. J., and McPherson, R. D. (1998). Maturity of operational numerical weather prediction: medium range. Monthly Weather Review 79, 2753-2769.
[7] Courant, R., Friedrichs, K. O., and Lewy, H. (1928). Über die partiellen Differenzengleichungen der mathematischen Physik. Mathematische Annalen 100, 32-74.
[8] Welander, P. (1955). Studies on the general development of motion in a two-dimensional ideal fluid. Tellus 7, 141-156.
[9] Lorenz, E. N. (1963). Deterministic nonperiodic flow. Journal of Atmospheric Science 20, 130-141.
[10] Phillips, N. A. (1959). An example of non-linear computational instability, in The Atmosphere and the Sea in Motion, Rockefeller Institute Press, pp. 501-504.
[11] Haltiner, G. J. and Williams, R. T. (1980). Numerical Prediction and Dynamic Meteorology, Wiley, New York.
[12] Phillips, N. A. (1957). A map projection system suitable for large-scale numerical weather prediction. Journal of the Meteorology Society of Japan 75, 262-267.
[13] Arakawa, A. (1966). Computational design for long-term numerical integrations of the equations of atmospheric motion. Journal of Computational Physics 1, 119-143.
[14] Arakawa, A., and Lamb, V. R. (1977). Computational design of the basic dynamical processes of the UCLA general circulation model. Methods in Computational Physics 17, 174-265.
[15] Galerkin, B. (1915). Rods and plates. Series occurring in various questions concerning the elastic equilibrium of rods and plates. Vestnik Inzhenerov 19, 897-908.
[16] Machenhauer, B. (1991). Spectral methods, in Numerical Methods in Atmospheric Models 1, European Center for Medium-Range Weather Forecasts, Reading, UK, pp. 3-86.
[17] Baer, F. (1961). The extended numerical integration of a simple barotropic model. Journal of Meteorology 18, 319-339.
[18] Silberman, J. (1954). Planetary waves in the atmosphere. Journal of Meteorology 11, 27-34.
[19] Abramowitz, M., and Stegun, I. A. (1964). Handbook of Mathematical Functions, Applied Mathematics Series 55, US Department of Standards, Washington DC.
[20] Kasahara, A. (1977). Numerical integration of the global barotropic primitive equations with Hough harmonic expansion. Journal of Atmospheric Science 34, 687-701.
[21] Longuet-Higgins, M. S. (1968). The eigenfunctions of Laplace's tidal equations over the sphere. Philosophical Transactions of the Royal Society of London A 262, 511-607.
[22] Robert, A. J. (1966). The integration of a low order spectral form of the primitive meteorological equations. Journal of the Meteorology Society of Japan 44, 237-244.
[23] Platzman, G. W. (1960). The spectral form of the vorticity equation. Journal of Meteorology 17, 635-644.
[24] Lorenz, E. N. (1960). Maximum simplification of the dynamic equations. Tellus 12, 243-254.
[25] Weigle, W. F. (1972). Energy conservation and the truncated spectral form of the primitive equations for a one-layer fluid. PhD thesis, University of Michigan.
[26] Baer, F. (1972). An alternate scale representation of atmospheric energy spectra. Journal of Atmospheric Science 29, 649-664.
[27] Baer, F. and Platzman, G. W. (1961). A procedure for numerical integration of the spectral vorticity equation. Journal of Meteorology 18, 393-401.
[28] Baer, F. (1964). Integration with the spectral vorticity equation. Journal of Atmospheric Science 21, 260-276.
[29] Gaunt, J. A. (1929). The triplets of helium. Transactions of the Royal Society of London A 228, 151-196.
[30] Orszag, S. A. (1970). Transform method for calculation of vector-coupled sums: Application to the spectral form of the vorticity equation. Journal of Atmospheric Science 27, 890-895.
[31] Eliasen, E., Machenhauer, B., and Rasmussen, E. (1970). On a numerical method for integration of the hydrodynamical equations with a spectral representation of the horizontal fields. Report No. 2, Institut for teoretisk meteorologi, University of Copenhagen.
[32] Krylov, V. I. (1962). Approximate Calculation of Integrals. Macmillan, New York.
[33] Merilees, P. E. (1968). The equations of motion in spectral form. Journal of Atmospheric Science 25, 736-743.
[34] Bourke, W. (1974). A multi-level spectral model. I. Formulation and hemispheric integrations. Monthly Weather Review 102, 687-701.
[35] Strang, G., and Fix, G. J. (1973). An Analysis of the Finite Element Method, Prentice-Hall, New York.
[36] Temperton, C. (1991). Finite element methods, in Numerical Methods in Atmospheric Models 1, European Center for Medium-Range Weather Forecasts, Reading, UK, pp. 103-118.
[37] Cullen, M. J. P. (1979). The finite element method, in Numerical Methods Used in Atmospheric Models II, World Meteorological Organization, Geneva, pp. 302-337.
[38] Staniforth, A. (1987). Review: Formulating efficient finite-element codes for flows in regular domains. International Journal for Numerical Methods in Fluids 7, 1-16.
[39] Williamson, D. L. (1968). Integration of the barotropic vorticity equation on a spherical geodesic grid. Tellus 20, 642-653.
[40] Sadourny, R. A., Arakawa, A., and Mintz, Y. (1968). Integration of the nondivergent barotropic vorticity equation with an icosahedral-hexagonal grid for the sphere. Monthly Weather Review 96, 351-356.
[41] Robert, A. J. (1979). The semi-implicit method, in Numerical Methods Used in Atmospheric Models II, World Meteorological Organization, Geneva, pp. 419-437.
[42] Ringler, T. D., Heikes, R. P., and Randall, D. A. (1999). Modeling the atmospheric general circulation using a spherical geodesic grid: A new class of dynamical cores. Department of Meteorology, Colorado State University, Fort Collins, CO.
[43] Thuburn, J. (1997). A PV-based shallow-water model on a hexagonal-icosahedral grid. Monthly Weather Review 125, 2328-2347.
[44] Masuda, Y., and Ohnishi, H. (1986). An integration scheme of the primitive equation model with an icosahedral-hexagonal grid system and its application to the shallow water equations. Short- and medium-range numerical weather prediction. Journal of the Meteorology Society of Japan, Special volume, 317-326.
[45] Williamson, D. L. (1970). Integration of the primitive barotropic model over a spherical geodesic grid. Monthly Weather Review 98, 512-520.
[46] Heikes, R. P., and Randall, D. A. (1995). Numerical integration of the shallow-water equations on a twisted icosahedral grid. Part I: Basic design and results of tests. Monthly Weather Review 123, 1862-1887.
[47] Williamson, D. et al. (1992). A standard test set for numerical approximations to the shallow water equations in spherical geometry. Journal of Computational Physics 102, 211-224.
[48] Wiin-Nielsen, A. (1959). On the application of trajectory methods in numerical forecasting. Tellus 11, 180-196.
[49] Sawyer, J. S. (1963). A semi-Lagrangian method of solving the vorticity advection equation. Tellus 15, 336-342.
[50] Robert, A. J. (1981). A stable numerical integration scheme for the primitive meteorological equations. Atmosphere-Ocean 19, 35-46.
[51] Bates, J. R. and McDonald, A. (1982). Multiply upstream, semi-Lagrangian advective schemes: analysis and application to a multi-level primitive equation model. Monthly Weather Review 110, 1831-1842.
[52] Staniforth, A. and Cote, J. (1991). Semi-Lagrangian integration schemes for atmospheric models--a review. Monthly Weather Review 119, 2206-2223.
[53] Bates, J. R. (1984). An efficient semi-Lagrangian and alternating direction implicit method for integrating the shallow water equations. Monthly Weather Review 112, 2033-2047.
[54] European Center for Medium-Range Weather Forecasts, Reading, UK (1996). Proceedings of a Workshop on Semi-Lagrangian Methods, 6-8 November 1995.
[55] Panofsky, H. A. (1949). Objective weather-map analysis. Journal of Meteorology 6, 386-392.
[56] Bergthorsson, P., and Döös, B. (1955). Numerical weather map analysis. Tellus 7, 329-340.
[57] Cressman, G. P. (1959). An operational objective analysis system. Monthly Weather Review 87, 367-374.
[58] Gandin, L. (1963). Objective Analysis of Meteorological Fields, Gidrometeoizdat, Leningrad (English translation: Israel Program for Scientific Translation, Jerusalem, 1965).
[59] Daley, R. (1991). Atmospheric Data Analysis, Cambridge University Press, New York.
[60] Daley, R. (1997). Atmospheric data assimilation. Journal of the Meteorology Society of Japan 75, 319-329.
[61] Courtier, P. (1997). Variational methods. Journal of the Meteorology Society of Japan 75, 211-218.
[62] Talagrand, O. (1997). Assimilation of observations, an introduction. Journal of the Meteorology Society of Japan 75, 191-209.
[63] Cohn, S. E. et al. (1998). Assessing the effects of data selection with the DAO physical-space statistical analysis system. Monthly Weather Review 126, 2913-2926.
[64] Machenhauer, B. (1977). On the dynamics of gravity oscillations in a shallow water model with applications to normal mode initialization. Contributions to Atmospheric Physics 50, 253-271.
[65] Baer, F., and Tribbia, J. J. (1977). On complete filtering of gravity modes through nonlinear initialization. Monthly Weather Review 105, 1536-1539.
[66] Leith, C. (1980). Non-linear normal mode initialization and quasi-geostrophic theory. Journal of Atmospheric Science 37, 958-968.
[67] Mesinger, F. (1997). Dynamics of limited-area models: formulation and numerical methods. Meteorology and Atmospheric Physics 63, 3-14.
[68] Charney, J. (1962). Integration of the primitive and balance equations. Proceedings of the International Symposium on Numerical Weather Prediction, Tokyo, Japan Meteorology Agency, pp. 131-152.
[69] Shuman, F. G. and Hovermale, J. H. (1968). An operational six-layer primitive equation model. Journal of Applied Meteorology 7, 525-547.
[70] Warner, T. W. and Seaman, N. L. (1990). A real-time mesoscale numerical weather prediction system used for research, teaching and public service at the Pennsylvania State University. Bulletin of the American Meteorological Society 71, 792-805.
[71] Cotton, W. R., Thompson, G., and Mielke Jr., P. W. (1994). Real-time mesoscale prediction on workstations. Bulletin of the American Meteorological Society 75, 349-362.
[72] Mass, C. F. and Kuo, Y-H. (1998). Regional real-time numerical weather prediction: current status and future potential. Bulletin of the American Meteorological Society 79, 253-263.
[73] Mesinger, F., Janjic, Z. I., Nickovic, S., Gavrilov, D., and Deaven, D. G. (1988). The step-mountain coordinate: model description and performance for cases of alpine lee cyclogenesis and for a case of an Appalachian redevelopment. Monthly Weather Review 116, 1493-1518.
[74] Staniforth, A. and Mitchell, H. (1978). A variable resolution finite element technique for regional forecasting with primitive equations. Monthly Weather Review 106, 439-447.
[75] Fox-Rabinovitz, M. S., Stenchikov, G. L., Suarez, M. J., and Takacs, L. L. (1997). A finite-difference GCM dynamical core with a variable-resolution stretched grid. Monthly Weather Review 125, 2943-2968.
[76] Taylor, M., Tribbia, J. J., and Iskandarani, M. (1997). The spectral element method for the shallow water equations on the sphere. Journal of Computational Physics 130, 92-108.
[77] Epstein, E. S. (1969). Stochastic dynamic predictions. Tellus 21, 739-759.
[78] Leith, C. E. (1974). Theoretical skill of Monte Carlo forecasts. Monthly Weather Review 102, 409-418.
[79] Hoffman, R. N. and Kalnay, E. (1983). Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus 35A, 100-118.
[80] Toth, Z. and Kalnay, E. (1993). Ensemble forecasting at NMC: the generation of perturbations. Bulletin of the American Meteorological Society 74, 2317-2330.
[81] Buizza, R. and Palmer, T. N. (1995). The singular vector structure of the atmospheric general circulation. Journal of Atmospheric Science 52, 1434-1456.
[82] European Center for Medium-Range Weather Forecasts, Reading, UK (1996). Proceedings of a Workshop on Meteorological Operational Systems, 13-17 November 1995.
[83] Krishnamurti, T. N. et al. (1999). Improved weather and seasonal climate forecasts from multimodel superensemble. Science 285, 1548-1550.
Machine Translation
SERGEI NIRENBURG
Computing Research Laboratory
New Mexico State University
Las Cruces, NM 88003, USA
[email protected]
YORICK WILKS
Department of Computer Science
The University of Sheffield
Regent Court, 211 Portobello Street
Sheffield, S1 4DP, UK
[email protected]
Abstract
This paper gives an intellectual overview of the field of machine translation of natural languages (MT). Now 50 years old, this field is one of the oldest nonnumerical applications of computers. Over the years, MT has been a focus of investigations by linguists, psychologists, philosophers, computer scientists, and engineers. It is not an exaggeration to state that early work on MT contributed very significantly to the development of such fields as computational linguistics, artificial intelligence, and application-oriented natural language processing. Advances in Computers has followed the development of MT closely; a seminal article on MT by Yehoshua Bar Hillel was published in its very first issue. This paper surveys the history of MT research and development, briefly describes the crucial issues in MT, highlights some of the latest applications of MT, and assesses its current status.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2. Is Machine Translation Impossible? . . . . . . . . . . . . . . . . . . . . . . . . . . 161
3. What Sort of Computation is MT? . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4. Main Paradigms for MT--Diverse Strategies for Solving or Neutralizing the Complexity of Language Use . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
   4.1 Knowledge Sources for MT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5. The Evolution of MT Over its 50-year History . . . . . . . . . . . . . . . . . . . . 169
   5.1 MT "Generations" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
6. Choices and Arguments For and Against MT Paradigms . . . . . . . . . . . . . . . . . 173
   6.1 Putting a Natural Language in the Center . . . . . . . . . . . . . . . . . . . . 173
   6.2 Can One Avoid Treatment of Meaning? . . . . . . . . . . . . . . . . . . . . . . . 174
   6.3 More "Disproofs" for KBMT? . . . . . . . . . . . . . . . . . . . . . . . . . . . 175
   6.4 Ideal Interlingua and its Practical Realizations . . . . . . . . . . . . . . . . 176
   6.5 Is an Interlingua a Natural Language in Disguise? . . . . . . . . . . . . . . . . 177
   6.6 The Ideal and the Reality of Statistical MT . . . . . . . . . . . . . . . . . . . 178
   6.7 Statistical MT Does Not Solve All Hard Problems . . . . . . . . . . . . . . . . . 179
7. MT in the Real World . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
   7.1 Varying the Acceptability Threshold . . . . . . . . . . . . . . . . . . . . . . . 180
   7.2 Partial Automation in MT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
   7.3 Restricting the Ambiguity of Source Text . . . . . . . . . . . . . . . . . . . . 181
   7.4 Statistical MT and the Economics of Corpora . . . . . . . . . . . . . . . . . . . 182
   7.5 Novel Applications I: Translation of Spoken Language . . . . . . . . . . . . . . 182
   7.6 Novel Applications II: Multi-Engine MT . . . . . . . . . . . . . . . . . . . . . 183
   7.7 Novel Applications III: MT and Other Kinds of Text Processing . . . . . . . . . . 184
8. The Current Situation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
9. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

1.
Introduction
Machine translation of natural languages, commonly known as MT, has multiple personalities. First of all, it is a venerable scientific enterprise, a component of the larger area of studies concerned with the study of the human language understanding capacity. Indeed, computer modeling of thought processes, memory, and knowledge is an important component of certain areas of linguistics, philosophy, psychology, neuroscience, and the field of artificial intelligence (AI) within computer science. MT promises the practitioners of these sciences empirical results that could be used for corroboration or refutation of a variety of hypotheses and theories. But MT is also a technological challenge of the first order. It offers an opportunity for software designers and engineers to dabble in constructing very complex and large-scale non-numerical systems and an opportunity for field computational linguists to test their understanding of the syntax and semantics of a variety of languages by encoding this vast, though rarely comprehensive, knowledge into a form suitable for processing by computer programs. To complicate things even further, MT has a strong connection with the needs of modern societies. It has come to be understood as an economic necessity, considering that the growth of international communication keeps intensifying both at government level (European Union, NAFTA, GATT) and in business and commerce (exporters need product documentation in the languages of the countries where their products are marketed). The demand for faster and cheaper translations is strong--indeed governments,
international agencies, and industry keep supporting MT research and development, even though the return on their investment, while not negligible, is still rather modest. Finally, MT is a gold mine for an amateur sociologist or a science reporter. This is one of the liveliest areas of computer science; its broad objectives can be readily understood by intelligent lay people. The practitioners of the field are rich sources of entertaining copy, as--knowingly or unknowingly--they keep staging a soap opera of inflated claims leading to dramatic disappointments, eminently quotable public debates, and internecine fighting of almost religious proportions. In this paper we will attempt to give the reader a coherent, if necessarily brief and incomplete, picture of the many "faces" of MT.
2.
Is Machine Translation Impossible?
Many firmly established technologies and scientific theories, such as flying machines that are heavier than air or general relativity, were at some stage in their development targets for purported disproofs. MT has been at this stage during its whole existence. Its established status as a technology is not new: the US Federal Translation Division at Dayton, Ohio has been providing translations of Russian scientific publications to American scientists for decades at a rate of hundreds of thousands of words a day. The quality has not been high, but it is sufficient for the scientists' purposes, given that they know their discipline intimately. More recently, the European Commission has been using machine translation for rough but adequate translations of an increasing proportion of its internal memoranda between English and French, its two major languages. (Both these systems are based on SYSTRAN, the world's oldest major MT system.) So much for the reality of MT, and we shall produce empirical evidence in support of these claims later. The proofs of the impossibility of MT ranged from philosophical considerations, often believed to derive from the anti-logical views of Wittgenstein--who came to believe that there could be no abstract representation of the content of language--to more linguistic demonstrations. Among developed philosophical views, critics of MT often refer to the "indeterminacy of translation" claims of Quine, the greatest living American philosopher. In the skeptical tradition of Hume, who did not question that there were causes and effects but asserted the connection between them could never be proved, Quine argued that one could never prove the equivalence of two sentences in different languages. He has been
widely misunderstood on this point, since he did not intend to argue that there were no translations. Quine wrote ... manuals for translating one language into another can be set up in divergent ways, all compatible with the totality of speech dispositions, yet incompatible with one another. In countless places they will diverge in giving, as their respective translations of a sentence of the one language, sentences of the other language which stand to each other in no plausible sort of equivalence however loose. [1] His main point is not that it is impossible to translate--humans do it all the time--but that what is translated cannot reliably be shown to be the "same meaning." Indeed, he was not attacking the notion of translation at all, but that of meaning in natural language. Since the task of machine translation is to reproduce what people do when they translate, at least as regards the output, Quine's position really poses no problem. As he wrote later: The critique of meaning leveled by my thesis of the indeterminacy of translation is meant to clear away misconceptions, but the result is not nihilism. Translation is and remains indispensable. Indeterminacy means not that there is no acceptable translation but that there are many. A good manual of translation fits all checkpoints of verbal behavior, and what does not surface at any checkpoint can do no harm. [2] Other theoretical positions that would render MT impossible include that of Whorf, the anthropologist, who argued that all meaning is culturally dependent, which implies that there cannot be translation between cultures with very different belief systems, norms, and values. Whorf explicitly extended this argument to scientific cultures, arguing that there could be no full translation of a sentence of physics into one of chemistry that described the "same situation," a proposition that some readers might find perfectly acceptable. His own position was somewhat undermined by his printing, at the end of his best-known book, a sentence-by-sentence translation into English of precisely the kind of American Indian language whose culture he deemed too remote for translation! The most telling and direct denial of MT--or more precisely of "fully automatic high quality MT"--came from the Israeli philosopher Bar Hillel [3,4] (the latter paper was the one published in Volume 1 of Advances in Computers). Bar Hillel's central claim was that fully automatic high-quality machine translation was unattainable with the technology of the time, which could not guarantee correct word-sense choice. His now-famous example was the following: Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.
The word pen in the emphasized sentence has at least two meanings--a writing pen and a playpen. Bar Hillel's conclusion was that "no existing or imaginable program will enable an electronic computer to determine that the word pen in the given sentence within the given context has the second of the above meanings." Later, Bar Hillel changed his position (see below) and conceded that MT had made great advances, but that this was not due to an improved theoretical basis, but a set of heuristics that seemed to aid programs: MT research should restrict itself, in my opinion, to the development of what I called before "bags of tricks" and follow the general linguistic research only to such a degree as is necessary without losing itself in Utopian ideas. [4] It is interesting to note that Bar Hillel's position was not that far from Whorf's: both argued for the need for cultural knowledge in the translation process, and Bar Hillel, too, used scientific examples to make his point. In his claim that the computer must have some internal knowledge to do MT, Bar Hillel was very close to the defining position of the then nascent field of artificial intelligence and its program to model any distinctively human capacity that requires intelligence, which certainly would include translation. A sub-theme in AI is that such modeling requires "coded knowledge" of the world. The difference between Bar Hillel's position and that of AI is that he thought it could not be done. An important characteristic of Bar Hillel's position is that it straddles the often competing views of MT as science and as technology by suggesting that any means for resolving difficulties in language analysis and generation are appropriate, even if their theoretical nature is not yet understood by science. While paying homage to linguistics, most developers of MT systems in fact do not really use any of its findings in their system-building activities.
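As a toy illustration of the kind of "coded knowledge" at stake in Bar Hillel's example (our own sketch, not a description of any actual system; the size figures and names are purely illustrative), a program could prefer the playpen reading simply because a toy box physically fits inside a playpen but not inside a writing pen:

```python
# Hypothetical world-knowledge fragment: rough typical object sizes in centimetres.
TYPICAL_SIZE_CM = {"toy box": 40, "writing pen": 1, "playpen": 100}

def resolve_pen(contained_object):
    # "The box was in the pen": choose the sense of "pen" whose typical size
    # can plausibly contain the object the text says is inside it.
    candidates = ["writing pen", "playpen"]
    plausible = [s for s in candidates
                 if TYPICAL_SIZE_CM[s] > TYPICAL_SIZE_CM[contained_object]]
    return plausible[0] if plausible else None

print(resolve_pen("toy box"))   # -> "playpen"
```

The point of the sketch is not that such a rule is hard to write, but that an MT system would need an enormous number of facts of this kind to apply it reliably across arbitrary text.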
3.
What Sort of Computation is MT?
We used the term "heuristics" above in connection with MT, and that term has often been taken as a tell-tale sign of a problem being a part of artificial intelligence, at least since John McCarthy declared AI to be the "science of heuristics" [5]. The term is classically opposed in computer science to algorithms which, in the strong sense of the term, are computations that can be proved to terminate and produce provably correct results. Computations involving natural language are bad candidates for strong algorithms because there is no general agreement on the data to be "proved" by such algorithms: no agreement on whether given strings of words are or are not sentences of a given language.
Even if Quine's arguments are not taken into account, there are no proofs that a given sentence in one language is a correct translation of a given sentence in another. There is a tradition in computational linguistics that still seeks algorithms, in a strong sense: namely the one descending from Chomsky's linguistic theories [6] and his original claim (since much modified) that the sentences of a language were a decidable set, in that an algorithm could be written to decide whether or not a given sentence string belonged to the set. Linguistics-inspired theories of MT continue to be put forward, but never produce any substantial or realistic coverage of languages in empirical tests. The more interesting issue is the relationship of MT to its other natural parent discipline, AI. The main question is whether, in order to perform successful MT, one needs to have a system which "understands" language to the same degree that is needed for intelligent robots. A tempting view is that it might not be necessary for the purposes of MT to understand the message in the text "deeply." Many inferences necessary, for instance, for robotic response and action, are superfluous for machine translation. Indeed, ambiguities of sense and reference can sometimes be "benign" in MT. Consider as an example the English sentence: The soldiers fired at the women and I saw several of them fall. Do you see an ambiguity in it? Yes, "them" can be construed to refer either to "women" or to "soldiers." Of course, people have enough knowledge to make the connection between the event of shooting and its typical consequences. A simple machine translation program may not have such knowledge. But when we translate the sentence into Russian, we can just compose it directly from the translations of the component words (Fig. 1). The problem is that when the sentence is translated into Spanish, the pronoun that translates "them" must be marked for gender. The correct gender can only be determined if we know whether it was the soldiers or the women who fell. In the absence of evidence about any unusual events, we must conclude, based on our knowledge of the consequences of shooting, that it was some women who fell. (This is an example of so-called default reasoning, where we assume that had it been, in fact, the soldiers who fell, a special mention of this fact would have been present in the text, because it is an unexpected situation and the authors could not rely on the readers' world knowledge.) Worse, the machine translation system must be taught to deal with this kind of problem.
Солдаты | выстрелили | в женщин | и я | увидел | как | несколько | из них | упали
Soldiers | fired | at women | and I | saw | how | several | of them | fell
FIG. 1. Word-for-word translation into Russian.
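The default-reasoning step needed for the Spanish rendering can be made concrete with a hypothetical toy fragment of our own (not drawn from any system described in this chapter; the rule and field names are illustrative only): absent contrary evidence, the entities fired at are assumed to be the ones who fell, and the pronoun's gender follows from that resolved antecedent.

```python
# Hypothetical toy illustration of default reasoning for pronoun gender.
SPANISH_PLURAL_PRONOUN = {"masculine": "ellos", "feminine": "ellas"}
GENDER = {"soldiers": "masculine", "women": "feminine"}   # gender of the Spanish translations

def resolve_them(event):
    # Default rule: the likely consequence of firing at someone is that the
    # target falls, so "them" resolves to the event's patient unless the
    # text explicitly states otherwise.
    if event.get("explicit_faller"):
        return event["explicit_faller"]
    return event["patient"]

event = {"action": "fire_at", "agent": "soldiers", "patient": "women"}
antecedent = resolve_them(event)
print(SPANISH_PLURAL_PRONOUN[GENDER[antecedent]])   # -> "ellas"
```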
The debate about the depth of analysis will no doubt continue and different recommendations will be produced depending on the objectives of particular projects. It is important to realize that there are genuinely different theories of how to do MT, and only practical experiment will show which ones are right.
4.
Main Paradigms for MT--Diverse Strategies for Solving or Neutralizing the Complexity of Language Use
Traditionally, MT systems were classified into three major types--direct, transfer, and interlingua.
• Direct MT systems rely on finding direct correspondences between source and target lexical units, and have been criticized for their ad hoc character--the apparent impossibility of generalizing the translation rules. By the 1970s, they had lost their scientific standing, though, notably, not their technological impact.
• Transfer systems involve a varying measure of target language-independent analysis of the source language. This analysis is usually syntactic, and its result allows substituting source language lexical units by target language lexical units, in context. Bilingual lexicons connect the lexical units of the source and target languages. Transfer systems permit taking into account the syntactic sentence constituents in which lexical units appear.
• In interlingual systems the source language and the target language are never in direct contact. The processing in such systems has traditionally been understood to involve two major stages: (i) representing the meaning of a source language text in an artificial unambiguous formal language, the interlingua, and then (ii) expressing this meaning using the lexical units and syntactic constructions of the target language. Few large-scale interlingual systems have been fully implemented because of the very high complexity (both theoretical and empirical) of extracting a "deep" meaning from a natural language text.
In practice, essential differences between transfer-based and knowledge-based machine translation are still a subject of debate. The major distinction between the interlingua- and transfer-based systems is, in fact, not so much the presence or absence of a bilingual lexicon but rather the attitude toward comprehensive analysis of meaning. MT systems that do not strongly rely on "deep" text understanding tend to prefer the transfer paradigm. Different transfer-based systems perform transfer at different levels--from
simple "bracketing" of sentence constituents to passing complex semantic properties of the input across the transfer link. A recent trend in transfer-based MT is to downplay the need for structural transfer, that is, the stage of transforming standard syntactic structures of the source language into the corresponding target language structures. This trend is interlingual in nature. Transfer-based systems can also deal with lexical semantics; the language in which the meanings of source language lexical units are expressed is often the target language itself. This can be implemented through a bilingual lexicon featuring disambiguation information. The recent movement toward "deep transfer" (see, for example, work on Eurotra [7]) is, in essence, a movement toward an interlingual architecture. Distinctions between the transfer and the interlingua approaches are best drawn at a theoretical level. When practical systems are built, much of the work will be the same for both approaches, as a number of compromises are made in order to contain the amount of work necessary in preparing grammars, lexicons, and processing rules. Some source language lexical units in an interlingua environment can, in fact, be treated in a transfer-like manner. Conversely, in those, very frequent, cases when there is no possibility of direct transfer of a lexical unit or a syntactic structure between two languages, a transfer system would benefit by trying to express the meaning of such units or structures in an interlingua-like fashion. Most AI-oriented MT systems have been interlingual and have come to be known as knowledge-based MT (or KBMT) systems. Examples of interlingua systems that are not knowledge-based are CETA [8], D L T [9], and Rosetta [10]. The main difference between such systems and K B M T ones is the expected depth of source language analysis and the reliance of K B M T systems on explicit representation of world knowledge. The early roots of this movement were in Britain, with the work of Margaret Masterman and the Cambridge Language Research Unit, though similar contemporary developments can be traced in the USSR. The interlingua was assumed to be highly semantic/content-directed and not identical either with any interlingua of the sort that formal logic was traditionally taken to provide, or with the syntax-driven interlingua of CETA. Development of this style of work became significant within AI in the US during the 1970s, particularly with the work of Schank and his school, and the work of Wilks. Schank's [11] M A R G I E system took as input small English sentences, translated them into a semantic-network-based interlingua for verbs called Conceptual Dependency, massaged those structures with inference rules and gave output in German. In Wilks' system, called
Preference Semantics, there was also an interlingua (based on 80 primitives in tree and network structures), between input in English and output in French, but there the emphasis was less on the nature of the representation than on the distinctive coherence algorithms ("preferences") for selecting the appropriate representation from among candidates. The reality and ubiquity of word-sense and structural ambiguity were a driving force behind that system. Both systems shared the assumption that traditional syntactic-based methods would not be able to solve that class of problems, and neither had separate syntactic components; the work of a syntactic component was performed under a semantic description. Again, both used MT only as a test-bed or application of more general claims about AI and natural language processing. Another strand of MT work done in close association with AI has been that of Martin Kay at XEROX-PARC [11]. Kay emphasized the role of morphology, of machine-aided translation, and of the structure of dictionaries in MT, but his most recent theme has been that of functional grammar, a formalism for syntax rules, usable in both analysis and generation, and hence part of the extensions of the linguistically-based movement in MT that began with GETA and TAUM, though now once again a subject of independent interest in AI. The beginning of the revival of MT as a scientific discipline and an application of linguistic and computer technology must, however, be traced to the establishment of the Eurotra project and the MT efforts in Japan. Begun in 1978 and incorporating earlier European influences, such as the CETA project at the University of Grenoble, Eurotra was an ambitious, well-supported project aimed at providing MT capability among all official EEC languages. At its peak, Eurotra employed about 160 researchers in a number of national groups and at the project headquarters in Luxembourg. The latest generation of Japanese MT efforts started around 1980, supported both by the government and industry, most notably with the Mu project at Kyoto University laying the foundation for the extensive Japanese industrial MT projects of the 1980s. Further KBMT experiments were conducted by Jaime Carbonell, Richard Cullingford, and Anatole Gershman at Yale University [13] and Sergei Nirenburg, Victor Raskin, and Allen Tucker at Colgate University [14]. Larger-scale development work followed, and a number of pilot KBMT systems have been implemented. The major efforts included ATLAS-II [15], PIVOT [16], ULTRA [17], KBMT-89 [18], and Pangloss. Some other systems (e.g., HICATS/JE [19]) use some features of the knowledge-based approach (such as semantic primitives for organizing the dictionary structure) while still maintaining the overall transfer architecture.
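To make the contrast among the direct, transfer, and interlingua strategies described earlier in this section more concrete, the following toy fragment is our own illustration (not drawn from any of the systems named in this chapter): it shows a direct word-substitution step next to a minimal transfer step that first imposes a shallow structure and then applies one structural rule (English adjective-noun order becoming French noun-adjective order) before substituting lexical units.

```python
# Toy illustration only: a two-word English -> French fragment.
BILINGUAL_LEXICON = {"red": "rouge", "car": "voiture", "the": "la"}

def direct_translate(words):
    # Direct MT: word-by-word substitution, no analysis of structure.
    return [BILINGUAL_LEXICON.get(w, w) for w in words]

def transfer_translate(words):
    # Transfer MT (minimal): "analyze" into a (det, adj, noun) pattern,
    # apply a structural transfer rule (ADJ N -> N ADJ), then substitute.
    det, adj, noun = words
    target_order = [det, noun, adj]
    return [BILINGUAL_LEXICON.get(w, w) for w in target_order]

print(direct_translate(["the", "red", "car"]))    # ['la', 'rouge', 'voiture'] -- wrong French order
print(transfer_translate(["the", "red", "car"]))  # ['la', 'voiture', 'rouge'] -- correct order
```

An interlingual system would go further still, replacing the ordered word list with a language-neutral meaning representation from which the French would be generated.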
Recently, the traditional classification of MT paradigms has been modified by the resurgence of enthusiasm for the statistics-based approach to the problem. 1 Statistical MT explicitly rejects any kind of representation, whether AI or linguistics-oriented, whether morphological, syntactic, or semantic. It relies on the availability of very large, aligned bilingual (or multilingual) corpora. The process of translation in this paradigm is, roughly, as follows: for each unit of SL input, find a match in the SL side of the bilingual corpus and write out in the output the unit in the TL side of the corpus which corresponds to the SL unit. The main issues in this paradigm are the quality and grain size of the alignment of the corpora and the problems of either the lack of a complete match or of choosing the "best" match out of a set of candidates (for additional detail, see the discussion of the Candide project below). Another statistics-inspired MT method, originating in Japan, has come to be called example-based MT. It differs from the "basic" statistical method in that it relies on inexact matching--for instance, matching a plural form of a phrase in input with a singular form of the same phrase in the corpus or matching a string where one (or more) words are substituted by their synonyms (table for desk) or even hyperonyms (animal for dog). Allowing inexact matching somewhat minimizes the need for huge bilingual corpora, as there is more chance for a "hit" when an input phrase can match several corpus phrases. Example-based MT, thus, relies at least on morphological analysis and a thesaurus to establish the degree of matching between source text and the corpus. The metric for preferring some matches to others becomes very complex as more and more kinds of inexact matches are introduced.
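As a rough sketch of the corpus-matching idea behind statistical and example-based MT (a deliberately simplified illustration of our own, not a description of any system mentioned in this chapter; the tiny "corpus" is invented and accents are omitted), the fragment below looks up an input phrase in an aligned bilingual memory, falling back to the closest inexact match when no exact one exists. Real example-based systems would measure closeness with morphological analysis and a thesaurus rather than raw string similarity.

```python
import difflib

# Tiny aligned "bilingual corpus": source phrases paired with target phrases.
ALIGNED_CORPUS = {
    "the meeting is cancelled": "la reunion est annulee",
    "the meetings are cancelled": "les reunions sont annulees",
    "the report is ready": "le rapport est pret",
}

def translate_phrase(source, min_similarity=0.6):
    # Exact match first: the "basic" aligned-corpus lookup.
    if source in ALIGNED_CORPUS:
        return ALIGNED_CORPUS[source]
    # Otherwise take the most similar stored source phrase (example-based
    # flavour: inexact matching, here by crude string similarity).
    candidates = difflib.get_close_matches(source, ALIGNED_CORPUS, n=1, cutoff=min_similarity)
    return ALIGNED_CORPUS[candidates[0]] if candidates else None

print(translate_phrase("the meeting is cancelled"))    # exact hit
print(translate_phrase("the meeting was cancelled"))   # inexact hit -> closest stored example
```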
4.1
Knowledge Sources for MT
Knowledge sources necessary for an MT system are determined, of course, by the approach used. For example, some statistics-oriented approaches often require the existence of large bilingual corpora. Some others, such as an experimental MT system at Kyoto University, need large lists of sample sentences against which a sentence to be translated is matched [21]. Rule-based MT systems, however, make use of at least some, or
1 The idea of statistical MT is not new: "It is well known that Western Languages are 50% redundant. Experiment shows that if an average person guesses the successive words in a completely unknown sentence he has to be told only half of them. Experiment shows that this also applies to guessing the successive word-ideas in a foreign language. How can this fact be used in machine translation?" [20]. In the 1950s, however, the state of the art in computer hardware and software precluded any serious experiments in statistical MT.
possibly all, of the following kinds of knowledge sources:
• morphology tables
• grammar rules
• lexicons
• representations of world knowledge.
It is possible both to analyze and to represent the English language without the use of morphology tables, since it is inflected only to a small degree; for the analysis of a highly inflected language such as Japanese, on the other hand, they are almost essential. Some analysis systems claim not to use an independent set of identifiable grammar rules, but they must somewhere contain information such as the fact that an article precedes a noun in English. 2 The third form of knowledge (lexical) appears in virtually every MT system, except for the purely statistical ones. And only KBMT systems claim to contain world knowledge representations. The distinction between the last two categories of knowledge can also be tricky: in a German lexicon, for example, das Fräulein is marked as neuter in gender but, in the real world, it must be marked female, as the word means "young woman". We should deduce from this that a lexicon is typically a rag-bag of information, containing more than just semantic information about meanings.
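A hypothetical lexicon entry (our own sketch; the field names and format are illustrative, not the conventions of any system discussed here) makes the rag-bag point concrete: grammatical facts such as gender sit alongside semantic facts that may contradict them, exactly as with das Fräulein.

```python
# Hypothetical lexicon entry: grammatical and semantic information coexist
# and need not agree. Field names are illustrative only.
LEXICON = {
    "Fräulein": {
        "language": "German",
        "part_of_speech": "noun",
        "grammatical_gender": "neuter",   # what agreement rules need
        "semantic_gender": "female",      # what reference resolution needs
        "gloss": "young woman",
    }
}

def gender_for(lemma, purpose):
    # Grammar rules consult the grammatical gender; anaphora resolution and
    # world knowledge consult the semantic gender.
    entry = LEXICON[lemma]
    key = "grammatical_gender" if purpose == "agreement" else "semantic_gender"
    return entry[key]

print(gender_for("Fräulein", "agreement"))   # -> neuter
print(gender_for("Fräulein", "reference"))   # -> female
```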
5.
The Evolution of MT Over its 50-year History
Much of the current skepticism about MT is outdated and comes from sociological factors, notably including the early enthusiasm for the technology, followed by its virtual collapse in the US in the mid-1960s. Was early MT as bad as everyone now says? Was the information-theory approach, with its underlying cryptology-inspired decoding hypothesis, absurd? A brief historical discussion can help answer these questions. It is customary to consider the so-called Weaver memorandum as the starting point of research in MT. 3
2 Although there is room for doubt as to which of the first two categories (morphology or grammar) certain items of linguistic knowledge belong to (in Italian, for example, forms such as pronouns may stand alone but can also act as suffixes to verbs: e.g. daglielo), in Japanese and English this ambiguity of type is very unlikely indeed.
3 Claims about the actual starting point of a field of study are notoriously inexact. Some work on mechanical translation preceded the Weaver memorandum. Not only did Booth and his colleagues start such work in Britain in about 1946, but several patent applications were filed in the 1930s for translation devices, including very ingenious ones by the Soviet engineer Smirnov-Troyansky [22].
In 1949 Warren Weaver, then a vice
president of the Rockefeller Foundation, distributed 200 copies of a letter in which he suggested the concept of MT to some of the people who were likely to have an interest in its development. Even though the memorandum was predominantly a strategic document, several important theoretical and methodological issues were discussed, including the problem of multiple meanings of words and phrases, the logical basis of language, the influence of cryptography, and the need to analyze language universals. The memorandum aroused significant scientific and public interest. In 1948, a University of London team led by Andrew Booth and Richard Richens was the world's only MT research and experimentation center. In the two years after the Weaver memorandum, MT work started in the US at the Massachusetts Institute of Technology, the University of Washington, the University of California at Los Angeles, the RAND Corporation, the National Bureau of Standards, Harvard University, and Georgetown University. The major concepts, topics, and processes of MT--such as morphological and syntactic analysis, pre- and post-editing, homograph resolution, interlingual representation of text meaning, work in restricted vocabularies, automating dictionary look-up, and so on--were first defined and debated at that time. The first scientific conference on MT was held in 1952 at MIT, and the first public demonstration of a translation program took place in 1954. This was the famous Georgetown experiment which involved translating about 50 Russian sentences, preselected from texts on chemistry, into English. This experiment was perceived by the general public and sponsors of scientific research as strong evidence for the feasibility of MT. The wide publicity and resonance of this experiment also led to the establishment of MT projects outside the US, notably in the Soviet Union. Through the 1950s and into the following decade, MT research in the US and the Soviet Union continued and grew. Attempting to scale upwards from the initial Georgetown experiment, however, proved more difficult than expected, as translation quality declined with expanded coverage. The quality of fully automatic translation remained largely below an acceptable level and required extensive human post-editing, as can be seen from the following example, an excerpt from the output of a 1962 demonstration of the Georgetown GAT system, one of the best examples of MT from Russian to English: By by one from the first practical applications of logical capabilities of machines was their utilization for the translation of texts from an one tongue on other. Linguistic differences represent the serious hindrance on a way for the development of cultural, social, political, and scientific connections
between nations. Automation of the process of a translation, the application of machines, with a help which possible to effect a translation without a knowledge of a corresponding foreign tongue, would be by an important step forward in the decision of this problem. [23] Still, researchers in MT remained largely optimistic about the prospects of the field. "The translation machine .... " wrote Emile Delavenay in 1960, "is now on our doorstep. In order to set it to work, it remains to complete the exploration of linguistic data." Two factors brought an end to the first enthusiastic period of MT research and development--the so-called ALPAC report and the persistently intractable problems of treatment of meaning (word sense choice and semantic structure issues that we have already touched on). Since Bar Hillel was one of the early champions of MT and had intimate knowledge of the research in the field, his critique (and "disproof" described above) has had a wide resonance in public attitudes toward MT, as well as among its sponsors in the US government and industry. Coupled with the increased difficulty of problems facing MT research after the initial successes, and notwithstanding the fact that many of the then-current projects (notably, at Georgetown University and IBM) pursued exactly the type of MT research recommended by Bar Hillel-namely, a combination of machine translation with human post-editing-this criticism started the process of reassessment of attitudes toward the field. The reassessment culminated in the publication in 1966 of the National Academy of Sciences Automatic Language Processing Advisory Committee (ALPAC) report which was critical of the state of research in MT and recommended reductions in the level of support for it. Remarkably, the central ALPAC argument was purely economic--the cost of machine translation, with human post-editing, was at the time higher than the cost of purely human translation. Thus, it was a judgment of MT as technology. Notably, no negative judgment was pronounced on MT as a scientific enterprise. The principal mistake of the early MT workers was that of judgment: the complexity of the problem of natural language understanding was underestimated. The variety and the sheer amount of knowledge necessary for any solution to this problem proved to be enormous, so that the success of MT as technology became dependent on the solution to the problem of knowledge acquisition and integration. It would take more than 15 years for MT to start a scientific comeback in the US. Although the ALPAC report reduced American efforts in MT (though far from ending them altogether, which is a popular myth), research and
development continued to grow in several scientific groups in the Soviet Union, Canada, Germany, France, and Italy, as well as in a small number of commercial institutions in the US. Notable MT achievements in the 15 years after the ALPAC report included the development and everyday use of the first unquestionably successful MT system, TAUM-METEO, developed at the University of Montréal and used routinely since 1977 to translate weather reports from English into French. The MT program SYSTRAN was used during the Apollo-Soyuz space mission in 1975, and in the following year was officially adopted as a translation tool of the EEC. MT research in the US gradually revived. MT activities at various scientific meetings have significantly intensified, and several conference series specifically devoted to MT have been founded in the past decade. In 1999 the Seventh International MT Summit Conference and the Eighth International Conference on Theoretical and Methodological Issues in MT were held. New research groups have been set up all over the world. The general mood of the conferences reflects a new optimism based on modern scientific advances and the fact that the need for MT in the 1990s is vastly more pressing than in the world of 40 years ago.
5.1
MT "Generations"
The early division of MT systems into direct, transfer, and interlingua was somewhat parallel to the recognition of several chronological "generations" of MT work. 4 The first generation was prevalent before the ALPAC report, when all systems were direct. In the wake of the report, attempts were made to program systems with rules closer to systems of contemporary syntactic theory. These included GETA [24], a system at Grenoble, France, due to the late Bernard Vauquois, and based on a version of valency theory; the TAUM system was a descendant of GETA. Such systems relied on tree-structure representations that allowed complex structures to be attached to nodes and, in that sense, were, in fact, richer than those then available in syntactic theory in the Chomskyan tradition. This type of direction, begun in the late 1960s, is often referred to as "second generation" MT. 5

4 Though generalizations such as this are notable for an abundance of special cases and exceptions, a brief discussion should give the reader a better understanding of the field of MT. One must be very careful with using the term "generation": as with "Nth generation computing," its role is essentially rhetorical rather than descriptive and is used to claim novelty for the product of the speaker.
The "third generation" MT has come to be associated with knowledge-based approaches. It started in earnest at the same time as the "second generation" and blossomed in the 1980s. While a brief flirtation with connectionist approaches (which were based on processing language using artificial neural networks, electronic models of the human brain) does not constitute a significant enough trend, the flowering of statistical MT, roughly after 1990, should be recognized as MT's "fourth generation." 6
6.
Choices and Arguments For and Against MT Paradigms
6.1
Putting a Natural Language in the Center
Some MT researchers maintain that there is no need to use an artificial language for representing meanings, that a natural language, such as Aymara or Sanskrit, will do. Others maintain that instead of inventing new artificial languages, MT can use some of the available artificial languages, such as Esperanto. Such claims are driven by two considerations: (i) that it is difficult to design and actually acquire the syntax and the semantics for an artificial language; and (ii) that some natural languages exhibit a sufficiently "logical" character to be used directly by computers. The latter consideration is, in the final analysis, romantic or propagandist ("the language I like the most is the most logical") and not scientific, but the former claim is only too true.

5 Such usage is full of historical irony, in that, for example, the broad theoretical basis of "second generation MT"--the adequacy of a family of phrase structure grammars for MT and natural-language processing generally--strongly predates some current developments in AI and computational linguistics, in which a resurrected form of phrase structure grammar has seized the center stage from more semantic and knowledge-based methods in natural language processing. Later work following in this tradition of using more perspicuous and context-free syntax rules for analysis in MT included Melby's work on "junction grammar" at Brigham Young University [25], and Slocum's METAL system at Austin, Texas [26]. A later addition was the EUROTRA system, developed for the European Community in Luxembourg between 1982 and 1992 [27]. This attempted initially to blend a GETA-like syntax with some of the insights from AI-based natural-language understanding, at its semantic levels. However, this was eventually abandoned, and the project turned to a variant of definite clause grammar, which is to say that, in MT-historical rather than AI-historical terms, it firmly entrenched itself within second-generation techniques.

6 A principal and misleading feature of the "generation" analogy, when applied to MT, is that it suggests successive time segments into which the different methods fall. As already seen, that can be highly misleading: the SYSTRAN system, for example, is a surviving form of a "first generation" system, existing alongside second, third, and later generational developments. Evolutionary phyla would be a much better metaphor here, because earlier systems, like sharks, survive perfectly well alongside later developments, like fish and whales.
It is, therefore, natural to continue to look for alternatives to complex artificial interlinguas, although such alternatives are not easy to find. Indeed, the crucial difference between languages used by humans and languages designed for the use of a computer program is the mechanism which is expected to process them. Natural languages are used by humans. Artificial languages have computer programs as understanding agents. For excellent understanding machines such as humans, brevity is at a premium, even at the expense of ambiguity and implicitness of some information. For computer programs, lack of ambiguity and explicitness of representation are at a premium, at the expense of verbosity. Thus, the key to an effective interlingual format is that it be unambiguous and explicit. Furthermore, the characteristics of the communication channel suggest that the texts in languages spoken by humans are single-dimensional strings. With computers, knowledge can have a much more complex topology--of hierarchies or even multidimensional lattices. The only way in which one can say, loosely, that a natural language is used as interlingua is when lexical units of this language are used to tag ontological concepts. However, there is ample additional representational apparatus which is entailed in designing an interlingua. It is not impossible to use lexical units from certain natural languages or human-oriented artificial languages such as Esperanto as markers for ontological concepts. In fact, in our own work we use combinations of English lexemes to tag concepts. However, in order to turn such a language into an efficient text meaning representation for MT, (at least some of) these meanings will have to be explicated in terms of their properties and typical connections with other concepts.
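To make the last point more concrete, the following is a minimal, hypothetical sketch (not the authors' actual formalism) of what explicating a lexically tagged concept might look like: the English lexeme serves only as a mnemonic label, while the content of the concept is carried by explicit properties and links to other concepts. The concept name, property names, and fillers below are invented for illustration.

    #include <stdio.h>

    /* A hypothetical interlingua concept: the English lexeme "TEACH" is only
     * a mnemonic tag; the concept's content lies in its explicit properties
     * and its link to a more general concept.                               */
    struct concept {
        const char *tag;            /* English lexeme used as a label        */
        const char *parent;         /* IS-A link to a more general concept   */
        const char *properties[5];  /* explicit property-value pairs         */
    };

    int main(void)
    {
        struct concept teach = {
            "TEACH",
            "COMMUNICATIVE-EVENT",
            { "AGENT: HUMAN", "THEME: INFORMATION",
              "BENEFICIARY: HUMAN", "INSTRUMENT: LANGUAGE", NULL }
        };

        printf("%s IS-A %s\n", teach.tag, teach.parent);
        for (int i = 0; teach.properties[i] != NULL; i++)
            printf("  %s\n", teach.properties[i]);
        return 0;
    }

The point of the sketch is simply that the label "TEACH" does no semantic work by itself; it is the explicated properties and links that make the representation usable by a program.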
6.2
Can One Avoid Treatment of Meaning?
The search for ways of avoiding the need for the use of knowledge and "deep" analysis of text takes other forms, too. That understanding of meaning is not necessary is also maintained by those who observe that the polysemous Spanish noun centro is translated into German as Zentrum no matter which of the senses of centro was used in the original text. The question is then posed: "Why waste time detecting and representing the meaning of the input string when the target language correlate is always the same?" Similar claims have been made about syntactic ambiguities [28] and ambiguities of prepositional phrase attachment [29]. A typical formulation of this position is given by Ben-Ari et al. [30]: It must be kept in mind that the translation process does not necessarily require full understanding of the text. Many ambiguities may be preserved during translation [Pericliev 84], and thus should not be presented to the user (human translator) for resolution.
Similarly, Isabelle and Bourbeau [31] contend that: sometimes, it is possible to ignore certain ambiguities, in the hope that the same ambiguities will carry over in translation. This is particularly true in systems like TAUM-AVIATION that deal with only one pair of closely related languages within a severely restricted subdomain. The difficult problem of prepositional phrase attachment, for example, is frequently bypassed in this way. Generally speaking, however, analysis is aimed at producing an unambiguous intermediate representation.
6.3
More "Disproofs" for KBMT?
Some MT researchers adopt the position that different languages employ different concepts, or employ concepts differently, and this short-circuits attempts at meaning extraction. Thus Amano [32] writes: "Natural languages have their own articulation of concepts according to their culture. Interlingua must naturally take account of this." To illustrate this point, Amano reports that where the English word moustache is customarily defined in English dictionaries as comprising hair on the upper lip, the Japanese kuchi-hige is defined in one (unspecified) Japanese dictionary as a "beard under the nose." (Actually, the Kanji ideographs for kuchi-hige stand for "lip" or "mouth" plus "whiskers.") From this we are urged to infer that what Japanese speakers mean by kuchi-hige is somehow different from what English speakers mean by moustache. Of course, this opinion is simply a particularly hirsute version of Sapir-Whorfism that depends crucially on the vagaries of dictionary entries. In addition, the claim displays a common misunderstanding of the concept of interlingua. What differs among languages is not the meaning representation but rather the lexical and syntactic means of realizing this meaning. The meaning of kuchi-hige and moustache will be represented in the same way in an interlingua text. The realizations of this meaning in the two languages will be different. It is in the interlingua-TL dictionary that a connection is established between an interlingual meaning representation and the language-particular linguistic expression. This is not the place to argue against linguistic and cognitive relativism. The idea of linguistic relativity is, in fact, neutral with respect to the tasks of computational linguistics. It should be sufficient to point out that however efficient dictionaries might be as explicators of meaning for humans, it is a mistake to appeal to them as formal indices of a culture's conceptual structure. To contend that meaning exists within a language but not across languages means to subscribe to an extreme sort of relativism usually associated with treating language as a mass of individual dialects or even idiolects.
In practice, of course, indigenous realia can be described encyclopedically and then assigned a linguistic sign (possibly, a direct calque from the original language). A slightly different "cultural-imperialist" argument against language-independent meaning representation states that the way an interlingua is built reflects the world view behind one dominant language. Examples of phenomena with respect to which "cultural imperialism" can be established include the cross-linguistic difference in subcategorization behavior of verbs, the grain size of concept description, and the difference in attitude similar to the moustache example above. For instance, a single interlingua concept can be suggested to represent the main sense of the English put (as in put a book/glass on the table). This might be considered a case of English cultural imperialism because in Russian this meaning can be expressed either as polozit or postavit depending on some properties of the object of put. The difference can be glossed as that between put flat and put upright. A book can be put either way; a glass will usually be put upright.
6.4
Ideal Interlingua and its Practical Realizations
The view of the interlingua as a representation capturing all meanings in all languages is too simplistic because it talks about an ideal case. As Makoto Nagao [33] put it: ... when the pivot language method [i.e., interlingua] is used, the results of the analytic stage must be in a form which can be utilized by all of the different languages into which translation is to take place. This level of subtlety is a practical impossibility. On a more technological level, Patel-Schneider [34] justifies the choice of the transfer approach in the METAL project as follows: METAL employs a modified transfer approach rather than an interlingua. If a meta-language [an interlingua] were to be used for translation purposes it would need to incorporate all possible features of many languages. That would not only be an endless task but probably a fruitless one as well. Such a system would soon become unmanageable and perhaps collapse under its own weight. This "maximalist" view of interlingua is so popular probably because it is conceptually the simplest. In operational terms, however, it is not more complex to conceive such an interlingua as a set of bilingual dictionaries among all the language pairs in the world. A practical interlingua should be viewed both as an object and as a process.
Viewed as an object developed in a concrete project, an interlingua should be judged by the quality of the translations that it supports between all the languages for which the corresponding SL-interlingua and interlingua-TL dictionaries have been built. As a process, its success should be judged in terms of the ease with which new concepts can be added to it and existing concepts modified in view of new evidence.
6.5
Is an Interlingua a Natural Language in Disguise?
The persistent problem for those in MT who work with interlinguas in particular and knowledge representations in general is to explain why symbolic internal structures always look so much like a real language when set down. It is widely supposed that knowledge-based machine translation requires grounding in a fully interpreted logical calculus, that a meaning-based approach cannot be presented with such formal rigor, and hence that meaning-based MT cannot succeed. This argument may be understood as demanding formal proofs of the correctness of meaning representations. Without such proofs, it is supposed, there is no guarantee that a translation will be free of contradiction or that the same meanings will be always represented similarly. This formalist approach to machine translation stems from a logic and philosophy of language tradition which tends to believe that there is no distinction in principle between natural and formal languages. But even if this supposition were correct, it would not follow that uniformly formal representations are necessary for the task of machine translation. As Wilks [35] put it: ... we do need representations ... but their form, if interpretable, is largely arbitrary, and we may be confident it has little relation to logic. I shall restate the view that the key contribution of AI in unraveling how such complex tasks as "understanding" might be simulated by a machine lies not in representations at all but in particular kinds of procedures ... It would be the most extraordinary coincidence, cultural, evolutionary, and intellectual, if what was needed for the computational task should turn out to be formal logic, a structure derived for something else entirely. The demand for proof that a target language text will contain no contradiction is of course a demand that cannot be met. But, fortunately, the problem of avoiding contradiction--in machine translation in particular and natural language processing in general--is an empirical issue and not clearly delimited by formalist claims and purported requirements. That is to say, while it might be nice to be able to offer such proofs, it would be a grievous error to abandon any enterprise unable to provide a formal proof of its future success.
Indeed, the formalist gambit has been tried against any number of sciences, including physics, and has come up short. Human translations are not "provably correct." Moreover, very few computer programs can actually be proved correct--and then only with respect to formal specifications and not to real-world implementations.
6.6
The Ideal and the Reality of Statistical MT
The attractiveness of the statistical approach to MT centers on the claim that it can perform MT without a glimmering of understanding of the organization of language or even the actual languages involved! In essence, the ideal, purely statistical method described by the IBM researchers [36] is an adaptation of one that worked well for speech decoding [37]. The method, which was claimed to underlie the French-English translation system Candide, establishes three components: (a) a trigram model of English sequences; (b) the same for French; (c) a model of quantitative correspondence of the parts of aligned sentences between French and English. In the Candide project, the first two are established from very large monolingual corpora in the two languages, of the order of 100 million words, the third from a corpus of aligned sentences in a parallel French-English corpus that are translations of each other. All three were provided by a large machine-readable subset of the French-English parallel corpus of Canadian parliamentary proceedings. Both (a) and (b) are valuable independent of the language pair and could be used in other pairings (which is why Candide claims now to be a transfer MT project). In very rough simplification, the Candide system works as follows: an English sentence yields likeliest equivalences for word strings (substrings of the English input sentence), i.e., French word strings. The trigram model for French rearranges these into the most likely order, which is the output French sentence. One of their most striking demonstrations is that the trigram model for French (or English) reliably produces (as the likeliest order for the components) the correct ordering of items for a sentence of 10 words or less. What should be emphasized is the enormous amount of pre-computation that this method requires to produce alignments and the trigram analysis. But even after all the pre-computation is completed, a 10-word input sentence required an additional hour of computation to produce a translation. This figure will undoubtedly be reduced with time and hardware expansion but it gives some idea of the computational intensity of the Candide method. Because of this computational complexity, Candide has, in fact, taken in whatever linguistics has helped: morphology tables, sense tagging (which is directional and dependent on the properties of French in particular), a transfer architecture with an intermediate representation, and an actual or proposed use of bilingual dictionaries.
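The interaction of the three components just described corresponds to the noisy-channel formulation of Brown et al. [36]. As a generic sketch (not the exact objective function used inside Candide), the decoder seeks

    \hat{T} = \arg\max_{T} \; P(T)\, P(S \mid T)

where S is the source sentence, T is a candidate target sentence, P(T) is the trigram language model of the target language, and P(S | T) is the alignment (translation) model estimated from the aligned parallel corpus.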
The pure, ideal statistical MT has been shown to fail--the purely statistical results topped out at around 40% of sentences acceptably translated. Fortified with rule-based addenda, Candide reached much higher acceptability figures. The Candide story is, indeed, one of the triumphs of the technology-oriented, "bag of tricks" mind set over the "pure" scientific attitudes.
6.7
Statistical MT Does Not Solve All Hard Problems
There remain crucial classes of cases that seem to need symbolic inference. Consider the following example: Priest is charged with pope attack (Lisbon, May 14) A Spanish priest was charged here today with attempting to murder the Pope. Juan Fernandez Krohn, aged 32, was arrested after a man armed with a bayonet approached the Pope while he was saying prayers at Fatima on Wednesday night. According to the police, Fernandez told the investigators today he trained for the past six months for the assault. He was alleged to have claimed the Pope "looked furious" on hearing the priest's criticism of his handling of the church's affairs. If found guilty, the Spaniard faces a prison sentence of 15-20 years. (The Times, 15 May 1982) The emphasized phrases all refer to the same man, a vital fact for a translator to know since some of those phrases could not be used in any literal manner in another language (e.g. "the Spaniard" could not be translated word-for-word into Spanish or Russian). It is hard to imagine multiple identity of reference like that having any determinable statistical basis. Other, more general, reasons to question the promise of pure statistical methods are self evident. Statistical information retrieval (IR) generally works below the 80% threshold, and the precision/recall trade-off seems a barrier to greater success by those methods. Yet it is, by general agreement, an easier task than MT and has been systematically worked on for over 35 years, unlike statistical MT whose career has been intermittent. The relationship of MT to IR is rather like that of sentence parsers to sentence recognizers. A key point to note is how rapid the early successes of information retrieval were, and how slow the optimization of those techniques has been since then! The standard model of a single language in statistical processing is a trigram model because moving up to even one item longer (i.e., a tetragram model) would be computationally prohibitive. This alone must impose a strong constraint on how well the pure statistical approach can do in the end, since it is universally agreed that any language has phenomena that "connect" outside the three-item window.
The issue is how far one can get with the simple trigram model (and, as we have seen, it yields a basic 40%), and how well distance phenomena in syntax can be finessed by various forms of information caching.
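For reference, the trigram model discussed throughout this section approximates the probability of a word string by conditioning each word on its two predecessors only; this is the standard formulation, not a detail specific to any one of the systems mentioned here:

    P(w_1 w_2 \cdots w_n) \;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})

Any dependency between words more than two positions apart is invisible to such a model unless it happens to be captured indirectly, which is precisely the three-item window limitation noted above.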
7.
MT in the Real World
The purpose of machine translation is, on the one hand, to relieve the human translator of the need to work with tedious, repetitive, and aesthetically unsatisfying material and, on the other, to speed up and facilitate worldwide information dissemination. Translating fiction and poetry is not a task for machine translation. The focus is on translating technical texts and other expository texts. It is hardly necessary to argue for the fundamental importance of the ability to disseminate information across linguistic borders. The volume of such information is already very large and is bound to increase with the growth of international trade and international political cooperation. A study by the Japan Electronic Industry Development Association estimates the size of the translation market in Japan in 1988 as being close to a trillion yen (about $8 billion) annually, calculated as about 240 million pages of translation at 4000 yen per page, a page containing 400 characters of kana or 125 English words. The market was expected to increase about twofold in the following 2 years. The pressure is growing for MT researchers and developers to come up with products that will "break through." What, then, should MT researchers do? As fully automatic, high-quality MT in large domains is not feasible, "successful" MT should be redefined. This could be done by any combination of the following methods: (i) accepting lower-quality output, (ii) making MT only partially automated, and (iii) making sure that the texts selected for translation can, in fact, be automatically processed by the system.
7.1
Varying the Acceptability Threshold
In many cases, especially when a machine-translated text is not intended for publication, the quality standards for MT can be effectively lowered without tangible harm. This situation arises, for instance, when an expert in a scientific field would like to read an article in his or her field of expertise that is in a language he or she does not know. In the best case, a medium-quality machine-translated text will suffice for understanding the content. At worst, a decision could be made on the basis of such a translation whether the article is interesting enough for it to be translated by other means, such as a more costly human translation.
Intermediate positions, such as human post-editing of the rough translation, may prove economically superior to a complete re-translation.
7.2
Partial Automation in MT
Until very recently human intervention in the MT cycle came predominantly in the form of post-editing--a human editor improving the results of MT before it is submitted to publication or otherwise disseminated. Among such systems are SYSTRAN and its direct predecessor GAT, SPANAM, HICATS, and some others. A major objective in systems relying on post-editing is to make human involvement as painless as possible in comparing source and target texts and correcting any translation errors. Many professional post-editors report serious difficulties in improving machine-translated output compared to that of human translators. If MT is viewed in purely economic terms, as a technology, then developing interactive computer environments ("Translator's Workstations") which facilitate the work of a post-editor is as important a task as building "mainline" MT systems.
7.3
Restricting the Ambiguity of Source Text
The most straightforward way of restricting the source text ambiguity is by choosing a sufficiently narrow subject domain. The terminological inventory in such a domain will be limited and relatively easy to describe. This approach to MT has come to be known as sublanguage-oriented MT. The best example of the sublanguage approach is the operational MT system TAUM-METEO, developed at the University of Montréal and delivered to the Canadian Weather Service for everyday routine translations of weather reports from English into French. The system operates very successfully, practically without human intervention. Its vocabulary consists of about 1500 items, about half of which are place names. There is very little lexical ambiguity in the system because words are expected to be used in only one of their senses--namely, the one that belongs to the subworld of weather phenomena. For instance, the word front in TAUM-METEO will be understood unequivocally as a weather front. Finding well-delineated, self-sufficient, and useful sublanguages is a very difficult task, and this is one of the reasons why the success of TAUM-METEO has not yet been repeated by any other operational system. Unfortunately, most subworlds (e.g., computer manuals, finance or chemistry) are not as restricted with respect to vocabulary size and syntactic diversity as repetitive weather forecasts.
As an alternative, texts can be deliberately simplified for the use of an MT system. In order to prepare texts for machine translation, a human pre-editor may be employed who reads the input text and modifies it to include only the words and constructions which the MT system is able to process automatically. Difficult and overly ambiguous words and phrases are replaced with those that the editor knows the program will be able to handle. A version of this method has been widely employed in industry (e.g., the TITUS system, the Xerox Corp. Docutran, etc.). Like post-editors, pre-editors must also be supplied with interfaces and tools.
7.4
Statistical MT and the Economics of Corpora
In one sense, what the Candide project has done is partially automate the construction process of traditional MT systems, such as SYSTRAN: replacing laborious error feedback with statistical surveys and lexicon construction. However, Candide is, unlike other projects, totally tied to the bilingual corpus, the Rosetta Stone, one might say, of statistics-oriented MT. We should remember, too, that their notion of word sense is only and exactly that of correspondences between different languages, a wholly unintuitive one for many people. The problem for statistical MT is that few vast bilingual corpora are available in languages for which MT is needed. If, however, they had to be constructed by hand, then the economics of what Candide has done would change radically. By bad luck, the languages for which such corpora are available are also languages in which traditional MT systems (e.g., SYSTRAN) already have done pretty well, so for statistical MT to be taken seriously as a practical force, it will have to overtake, then widen the gap with, SYSTRAN's performance. They may be clever enough to make do with less than the current 100-million-word corpora per language, but one would naturally expect quality to decline as they did so. This resource argument could be very important: it has been shown (by the British computational linguist Geoffrey Leech) that for the case of statistical taggers, any move to improve the performance by adding more higher-level knowledge structures always ended up requiring a much larger corpus than expected. This observation contravenes the claims by MT statisticians, so an important point to watch in the future is monitoring the availability of adequate bilingual corpora for the domain-specialized MT that is most in demand (such as airline reservations or bank billings): Hansard is large but is very general indeed.
7.5
Novel Applications I: Translation of Spoken Language
MT's sister area, speech processing, has progressed significantly over the past 20 years.
Improvement in speech recognition and synthesis systems has whetted the appetite for practical applications involving interpretation--the translation of spoken text. There are a number of uses for such technology, from multilingual telephony to personal communication. A number of projects have been devoted to this problem, which compounds the complexity of MT with the considerable complexity of the speech recognition problem as such. A variety of relatively small-scale efforts have been sponsored by the US government, and much more ambitious projects have been put together in Japan (the ATR Laboratory) and Germany (the Verbmobil project). The goals of such projects are necessarily modest. However, several spectacular demonstrations of speech-to-speech translation have been staged over the past 3 years, for instance, that of the Spanish-to-English system at AT&T Bell Laboratories or the tripartite English-German-Japanese demonstration by ATR, the Siemens Nixdorf Corporation of Germany, and Carnegie Mellon University in Pittsburgh. Connected by a communication satellite, three operators in Munich, Kyoto, and Pittsburgh were able to communicate speaking in their own languages, while the translation systems generated translations of their interlocutors' utterances into the native tongues of the recipients. The conversation had to do with inquiries about registration for a scientific conference. This latter demonstration was, indeed, a spectacular technological achievement, especially with respect to integrating a variety of component systems--language processors, speech recognizers, speech synthesizers, communication interfaces, etc. However, arguably it did not push the envelope of the speech translation technology very hard. That this achievement was reported in national newscasts in Japan, Germany, and the US and on the front page of The New York Times may have been a mixed blessing, as this coverage may have unduly raised the expectations of a variety of parties strongly interested in developing this technology. In fact, the system demonstrated operated on a very small vocabulary of less than 500 lexical units in each language, and the prospects for rapid scaling-up of the size of vocabulary used by this demonstration are at best not immediate. The feeling of déjà vu was experienced strongly by everybody who remembers the situation caused by the Georgetown experiment. Speech-to-speech MT must not "oversell" its achievements and promise, and must thus avoid repeating a crucial mistake of the early MT efforts. It is not wise to risk an ALPAC report on speech translation.
7.6
Novel Applications II: Multi-Engine MT
Current MT projects--both "pure" and hybrid, both predominantly technology-oriented and scientific--are single-engine projects, capable of one particular type of source text analysis, one particular method of finding target language correspondences for source language elements, and one prescribed method of generating the target text.
Although such projects can be quite useful, we believe that it is time to make the next step in the design of MT systems and to move toward adaptive, multiple-engine systems. Practical MT systems are typically developed for a particular text type (e.g., weather reports, financial news articles, scientific abstracts) and for a particular end use--e.g., assimilation or dissemination of information. Special cases, such as translating an updated version of a previously translated text, abound in real-world practice. Gains in output quality and efficiency can be expected if a machine translation environment can be made to adapt to a task profile. Thus, for example, for translating abstracts of scientific articles in order to select just those that are of particular interest to a customer, a statistics-based approach might be most appropriate. Example-based translation seems to be most promising for translating new versions of previously translated documents. This correspondence between technique, input text type and end use (or output text type) provides further motivation for moving toward adaptive, multiple-engine systems. Two approaches to adaptivity in MT have been formulated. Both presuppose an MT environment in which a number of MT engines are present--for instance, one (or more!) each of knowledge-based, statistics-based, transfer-based, or example-based engines can be used. In one of the approaches all available engines are "unleashed" on an input text and the final output is assembled from the best text segments, irrespective of which engine produced them. In another approach a heuristic "dispatcher" decides which of the available engines holds the highest promise for a given input text and then assigns the job to that engine. The former approach involves more processing but allows an a posteriori selection of the best results. The latter approach saves cycles but relies on heuristic a priori selection of the best output. In this latter case, the quality of the heuristics for the dispatcher module is crucial, but additionally, the approach expects each of the component engines to be of rather high quality, since they would not (as is the case in the other approach) be "bailed out" by other engines in case of failure.
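As an illustration of the first of these two approaches, here is a hedged sketch in which every engine translates every segment and the output is assembled from the best-scored candidate for each segment. The engine names, the scores, and the existence of a separate quality-estimation step that produces them are assumptions made for the example, not features of any particular system.

    #include <stdio.h>

    #define NUM_ENGINES  3
    #define NUM_SEGMENTS 2

    /* One candidate translation of a source segment, with a quality score
     * assumed to come from a separate quality-estimation step.            */
    struct candidate {
        const char *text;
        double score;
    };

    int main(void)
    {
        /* candidates[e][s]: output of engine e for source segment s.
         * The engines and scores below are purely illustrative.           */
        struct candidate candidates[NUM_ENGINES][NUM_SEGMENTS] = {
            { { "segment 1 from the knowledge-based engine", 0.71 },
              { "segment 2 from the knowledge-based engine", 0.40 } },
            { { "segment 1 from the example-based engine",   0.55 },
              { "segment 2 from the example-based engine",   0.82 } },
            { { "segment 1 from the statistical engine",     0.63 },
              { "segment 2 from the statistical engine",     0.58 } },
        };

        /* Assemble the target text segment by segment, taking the best
         * candidate irrespective of which engine produced it.             */
        for (int s = 0; s < NUM_SEGMENTS; s++) {
            int best = 0;
            for (int e = 1; e < NUM_ENGINES; e++)
                if (candidates[e][s].score > candidates[best][s].score)
                    best = e;
            printf("%s\n", candidates[best][s].text);
        }
        return 0;
    }

The dispatcher-style alternative would instead pick one engine up front based on a heuristic profile of the input text, trading the a posteriori comparison for saved computation.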
7.7
Novel Applications III: MT and Other Kinds of Text Processing
MT has recently lost its hitherto uncontested hold on multilingual text processing. For several years there has been an effort, supported by the US government, to develop systems for information retrieval from text in multiple languages.
Several such programs (MUC, Tipster, TREC, etc.) have attracted a number of participants in academia and industry. These researchers have based their systems on a variety of basic approaches (including, very prominently, the venerable "bags of tricks" approach). However, and quite naturally, none of the systems built to extract a predefined set of information elements from texts in English or Japanese, to say nothing of systems devoted to filtering message traffic by directing the messages into a variety of topically organized "bins," has opted to use a full-scale machine translation system in its processing. This decision is natural because it has been widely perceived that the tasks of information extraction and message routing can be performed without the need for a "blanket" coverage of the input, as is necessary in MT. MT was perceived as too heavy and too unreliable a weapon for this task. So, in order to build an information retrieval system in two languages, the developers decided to develop two bags of tricks, one for each language involved. The information retrieval task has met with some success in the current formulation. At the same time, it is possible to argue that this task is an attractive application for machine translation. Indeed, why not have just one bag of information retrieval tricks, oriented at, say, English, and use it both on the texts that were supplied in the original English and on the texts that a battery of MT systems have produced from inputs in a variety of languages? Granted, the output of the translation programs should be expected to be less than high quality. But the same argument can be invoked here that was used in deciding not to use MT for information retrieval: the latter can be done without a "blanket" coverage of the text, without succeeding in analyzing (or even attempting to analyze) each and every input word. Good information retrieval may result from less than high-quality translation results.
8.
The Current Situation
MT technology today is still immature in many senses:
• it is too dependent on the task, i.e., the carefully pre-specified kinds of texts
• the effort to acquire sufficient knowledge for broad-domain fully automatic translation is still too high
• text analysis techniques still concentrate predominantly on the syntax and semantics of single sentences, without sufficient coverage of the phenomena of multi-sentence meaning
• technology is brittle in the face of unexpected input due to (i) words missing from the lexicons; (ii) word and phrase meanings outside of the set of senses in the lexicon; (iii) input errors
• systems are still too dependent on a particular pair of source and target languages
• no effective and general reference treatment mechanisms have been proposed.

At the same time, the intensity and variety of MT research is arguably at its all-time high. Work is being carried out in both parts of the following opposed pairs:
• fully-automatic/machine-assisted (a whole range of sophistication of machine assistance, from simple post-editing to the use of an advanced TWS)
• high-quality output/rough translations
• language-pair-tuned/multilingual
• unconstrained input/constrained input (possibly, pre-editing; authoring systems to assist in production of legal constrained text)
• text/speech/combination thereof
• complete translation/abstracting (summarization)/database record creation
• multilingual generation from text/tabular info/database records/graphics
• rule-based/corpus-based MT
• interlingua-oriented/transfer-oriented/specialized "blanket coverage of small domain" glossary-based translation
• full-blown statistical MT/example-based MT.
9.
Conclusion
The new optimism of MT researchers and sponsors is based on spectacular advances in computer technology (drastic improvements in processing speed and memory capacity of computers, advances in computer architecture, emergence of database technology, development of high-level programming languages and interactive programming environments, etc.) and computational linguistics (in particular, techniques for morphological and syntactic analysis and synthesis of natural language texts). Advances in automatic processing of meaning and techniques of human-computer interaction are also an important component of the current MT paradigms. With the knowledge of the past difficulties, and, therefore, with a realistic assessment of the possibilities of MT technology application, current MT projects are well equipped to produce a new wave of scientifically sound and practically useful machine translation systems.
These systems are being designed both to compete and to cooperate with humans in translating a wide variety of scientific, industrial, official, journalistic and other texts. And, most significantly, low-cost, rapid-response MT systems promise to increase the volume of timely translations manyfold, addressing a societal need for timely delivery of currently untranslated quality information whose sources may be in a number of different languages.
REFERENCES
[1] Quine, W. V. (1960). Word and Object, MIT Press, Cambridge, MA.
[2] Quine, W. V. (1987). Quiddities: An Intermittently Philosophical Dictionary, Belknap Press of Harvard University Press, Cambridge, MA.
[3] Bar Hillel, Y. (1959). Report on the state of machine translation in the United States and Great Britain. Technical report, Hebrew University, Jerusalem, 15 February 1959.
[4] Bar Hillel, Y. (1960). The present status of automatic translation of languages. Advances in Computers 1, 91-163.
[5] McCarthy, J. (1979). Ascribing Mental Qualities to Machines, Philosophical Perspectives in Artificial Intelligence, Harvester Press, Brighton.
[6] Chomsky, N. (1955). The Logical Structure of Linguistic Theory, distributed by Indiana University Linguistics Club.
[7] Arnold, D. J., and Sadler, L. (1991). Eurotra: An Assessment of the Current State of the EC's MT Programme, Working Papers in Language Processing 32, Department of Language and Linguistics, University of Essex.
[8] Vauquois, B. and Boitet, C. (1985). Automated translation at Grenoble University. Computational Linguistics 11(1), 28-36.
[9] Witkam, A. P. M. (1983). Distributed Language Translation: Feasibility Study of a Multilingual Facility for Videotex Information Networks. BSO, Utrecht.
[10] Landsbergen, J. (1987). Isomorphic grammars and their use in the ROSETTA translation system, in Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial, ed. Margaret King, Edinburgh University Press, Edinburgh, pp. 351-372.
[11] Schank, R. C., Goldman, N., Rieger, C. and Riesbeck, C. K. (1973). MARGIE: memory, analysis, response generation and inferences in English. Proceedings of IJCAI-73, International Joint Conference on Artificial Intelligence.
[12] Kay, M. (1980). The proper place of men and machines in language translation. Research Report CSL-80-11, Xerox Palo Alto Research Center, Palo Alto, CA.
[13] Carbonell, J. G., Cullingford, R. E. and Gershman, A. G. (1981). Steps towards knowledge-based machine translation. IEEE Transactions on Pattern Analysis and Machine Intelligence 3(4).
[14] Nirenburg, S., Raskin, V. and Tucker, A. (1986). On knowledge-based machine translation. COLING-86: Proceedings of the 11th International Conference on Computational Linguistics, Bonn, pp. 627-632.
[15] Uchida, H. (1988). ATLAS II: a machine translation system using conceptual structure as an interlingua. Proceedings of the 2nd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Centre for Machine Translation, Carnegie Mellon University, Pittsburgh, PA, pp. 165-173.
[16] Okumura, A., Muraki, K. and Akamine, S. (1991). Multi-lingual sentence generation from the PIVOT interlingua. Proceedings of the MT Summit 91, pp. 67-71.
[17] Farwell, D. and Wilks, Y. (1991). Ultra: a multi-lingual machine translator. Proceedings of the MT Summit III, pp. 19-24.
[18] Goodman, K. and Nirenburg, S. (1991). The KBMT Project: A Case Study in Knowledge-Based Machine Translation, Morgan Kaufmann, San Francisco, CA.
[19] Kaji, H. (1988). An efficient execution method for rule-based machine translation. COLING-88, Budapest, 2, pp. 824-829.
[20] King, G. W. (1956). Stochastic methods for mechanical translation. Mechanical Translation 3(2), 38-39.
[21] Nagao, K. (1990). Knowledge-based structural disambiguation. COLING-90: Proceedings of the 13th International Conference on Computational Linguistics, Helsinki, Finland, 2, pp. 282-287.
[22] Hutchins, J. and Lovtsky, E. (to appear). Petr Petrovich Troyanskii (1894-1950): a forgotten pioneer of mechanical translation, in Machine Translation.
[23] Hutchins, J. (1986). Machine Translation: Past, Present, Future, Ellis Horwood/Wiley, Chichester.
[24] Boitet, C. (1987). Research and development on MT and related techniques at Grenoble University (GETA), in Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial, ed. Margaret King, Edinburgh University Press, Edinburgh, pp. 133-153.
[25] Melby, A. K. (1980). A comparative look at junction grammar, in The Sixth LACUS Forum, Hornbeam Press, Columbia, SC, pp. 344-352.
[26] Slocum, J. (1987). METAL: the LRC machine translation system, in Machine Translation Today: The State of the Art, Proceedings of the Third Lugano Tutorial, ed. Margaret King, Edinburgh University Press, Edinburgh, pp. 319-350.
[27] King, M. (1982). Eurotra: an attempt to achieve multilingual MT, in Practical Experience of Machine Translation, ed. Veronica Lawson, North-Holland, Amsterdam, pp. 139-147.
[28] Pericliev, V. (1984). Handling syntactical ambiguity in machine translation. COLING-84, pp. 521-524.
[29] Kay, M. (1989). Head-driven parsing. Proceedings of the International Workshop on Parsing Technologies.
[30] Ben-Ari, D., Berry, D. M. and Rimon, M. (1988). Translational ambiguity rephrased. Proceedings of the 2nd International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages, Centre for Machine Translation, Carnegie Mellon University, Pittsburgh, PA, pp. 175-183.
[31] Isabelle, P. and Bourbeau, L. (1985). TAUM-AVIATION: its technical features and some experimental results. Computational Linguistics 11(1), 18-27.
[32] Amano, S., Hirakawa, H., Nogami, H. and Kumano, A. (1989). The Toshiba machine translation system. Future Computing Systems 2(3), 227-246.
[33] Nagao, M. (1989). Machine Translation, Oxford University Press, Oxford.
[34] Patel-Schneider, P. F. (1989). A four-valued semantics for terminological reasoning. Artificial Intelligence 38.
[35] Wilks, Y. A. (1990). Where am I coming from: the reversibility of analysis and generation in natural language processing, in Thirty Years of Linguistic Evolution, ed. Martin Pütz, John Benjamins, Amsterdam.
[36] Brown, P. et al. (1990). A statistical approach to machine translation. Computational Linguistics 16(2), 79-85.
[37] Jelinek, F. and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. Proceedings of the Workshop on Pattern Recognition in Practice.
The Games Computers (and People) Play

JONATHAN SCHAEFFER
Department of Computing Science
University of Alberta
Edmonton, Alberta
Canada T6G 2H1
jonathan@cs.ualberta.ca

Abstract
In the 40 years since Arthur Samuel's 1960 Advances in Computers chapter, enormous progress has been made in developing programs to play games of skill at a level comparable to, and in some cases beyond, what the best humans can achieve. In Samuel's time, it would have seemed unlikely that within 40 years world-class backgammon, checkers, chess, Othello, and Scrabble programs would be built. These remarkable achievements are the result of a better understanding of the problems being solved, major algorithmic insights, and tremendous advances in hardware technology. Computer games research is one of the major success stories of artificial intelligence. This chapter can be viewed as a successor to Samuel's work. A review of the scientific advances made in developing computer games is given. These ideas are the ingredients required for a successful program. Case studies for the games of backgammon, bridge, checkers, chess, Othello, poker, and Scrabble are presented. They are the recipes for building high-performance game-playing programs.
1. Introduction
2. Advances
2.1 Advances in Search
2.2 Advances in Knowledge
2.3 Simulation-Based Approaches
2.4 Perspectives
3. Advances in Computer Games
3.1 Backgammon
3.2 Bridge
3.3 Checkers
3.4 Chess
3.5 Othello
3.6 Poker
3.7 Scrabble
3.8 Other Games
4. Conclusions
Acknowledgments
References

1.
Introduction
Arthur Samuel is one of the pioneers of artificial intelligence research. Together with Claude Shannon [1] and Alan Turing [2], he laid the foundation for building high-performance game-playing programs. Samuel is best known for developing his checkers program. Throughout his career, he consistently sold his work as research in machine learning. His papers describing the program and its learning capabilities are classics in the literature [3, 4]. These papers are still frequently cited today, almost four decades since the original research was completed. Few computing papers have a lifespan of 10 years, let alone 40. In the years since Samuel's 1960 chapter for the first volume of Advances in Computers, enormous progress has been made in constructing high-performance game-playing programs. In Samuel's time, it would have seemed unlikely that within a scant 40 years checkers (8 × 8 draughts), Othello 1 and Scrabble 2 programs would exist that exceed the abilities of the best human players, while backgammon and chess programs could play at a level comparable to the human world champion. These remarkable accomplishments are the result of a better understanding of the problems being solved, major algorithmic insights, and tremendous advances in hardware technology. The work on computer games has been one of the most successful and visible results of artificial intelligence research. For some games, one could argue that the Turing test has been passed [5]. When talking about computer games, it is important to draw the distinction between using games as a research tool for exploring new ideas in computing, and using computing to do research into games. The former is the subject of this chapter; the latter is not. Nevertheless, it is important to recognize that building high-performance game-playing programs has also been of enormous benefit to the respective game-playing communities.

1 Othello is a registered trademark of Tsukuda Original, licensed by Anjar Co.
2 Scrabble is a registered trademark of the Milton Bradley Company, a division of Hasbro, Inc.
The technology has expanded human understanding of games, allowing us to explore more of the intellectual challenges that games have to offer. Computers offer the key to answering some of the puzzling, unknown questions that have tantalized game aficionados. For example, computers have shown that the chess endgame of king and two bishops versus king and knight is generally a win, contrary to expert opinion [6]. In checkers, the famous 100-year position took a century of human analysis to "prove" a win; the checkers program Chinook takes a few seconds to prove the position is actually a draw (it is now called the 197-year position) [7]. This chapter can be viewed as a successor to Samuel's 1960 chapter, discussing the progress made in developing programs for the classic board and card games over the last four decades. A review of the scientific advances made in developing computer games is presented (Section 2). It concentrates on search and knowledge for two-person perfect-information games, and simulation-based approaches for games of imperfect or non-deterministic information. These ideas are the ingredients needed for a successful program. Section 3 presents seven case studies to highlight progress in the games of checkers, Othello, Scrabble (superior to man), backgammon, chess (comparable to the human world champion), bridge, and poker (human supremacy may be threatened). These are successful recipes for building high-performance game-playing programs. Although this chapter discusses the scientific advances, one should not underestimate the engineering required to build these programs. One need only look at the recent success of the Deep Blue chess machine to appreciate the effort required. That project spanned 8 years, and included several people working full-time, extensive computing resources, computer chip design, and grandmaster consultation. Some of the case studies hint at the amount of work required to build these systems. In all cases, the successes reported in this chapter are the result of consistent progress over many years.
2.
Advances
The biggest advances in computer game-playing over the last 40 years have come as a result of work done on the alpha-beta search algorithm. Although this algorithm is not suitable for some of the games discussed in this chapter, it received the most attention because of the research community's preoccupation with chess. With the Deep Blue victory over world chess champion Garry Kasparov, interest in methods suitable for chess has waned and been replaced by activity in other games.
One could argue that the chess victory removed a ball and chain that was stifling creativity in research on game-playing programs. Because of the historical emphasis on search, the material in this section is heavily biased towards it. In the last decade, new techniques have moved to the forefront of games research. Two in particular are given special emphasis since they are likely to play a more prominent role in the near future:

Monte Carlo simulation has been successfully applied to games with imperfect or non-deterministic information. In these games it is too expensive to search all possible outcomes. Instead, only a representative sample is chosen to give a statistical profile of the outcome. This technique has been successful in bridge, poker, and Scrabble.

Temporal-difference learning is the direct descendant of Samuel's machine learning research. Thus it is fitting that this method be included in this chapter. Here a database of games (possibly generated by computer self-play) can be used to bootstrap a program to find a good combination of knowledge features. The algorithm has been successfully applied to backgammon, and has recently shown promise in chess.

This section gives a representative sample of some of the major results and research thrusts over the past 40 years. Section 2.1 discusses the advances in search technology for two-player perfect-information games. Advances in knowledge engineering have not kept pace, as discussed in Section 2.2. Section 2.3 discusses the emerging simulation framework for games of non-deterministic or imperfect information. The material is intended to give a flavor of the progress made in these areas, and it is not intended to be exhaustive.
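As a brief illustration of the temporal-difference idea mentioned above, one common form of the weight update (a generic sketch, not necessarily the exact variant used in the programs discussed later in this chapter) adjusts the parameters w of the evaluation function V after each move so that the evaluation of a position moves toward the evaluation of its successor:

    \Delta w_t = \alpha \,\bigl( V(s_{t+1}) - V(s_t) \bigr) \sum_{k=1}^{t} \lambda^{\,t-k}\, \nabla_{w} V(s_k)

Here s_t is the position after move t, \alpha is a learning rate, and \lambda controls how far back along the game the credit for the prediction error is propagated.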
2.1
Advances in Search
The minimax algorithm was at the heart of the checkers program described in Samuel's 1960 chapter. Minimax assumes that one player tries to maximize their result (often called Max), while the other tries to minimize what Max can achieve (the Min player). The program builds a search tree of alternating moves by Max and Min. A leaf node is assigned either the game-theoretic value if known (win, loss, draw) or a heuristic estimate of the likelihood of winning (using a so-called evaluation function). These values are maximized (by Max) and minimized (by Min) from the leaves back to the root of the search. Within given resources (typically time), it is usually not possible to search deep enough to reach leaf nodes for which the game-theoretic result is known.
game-theoretic result is known. The evaluation function uses application-dependent knowledge and heuristics to come up with an estimate of the winning chances for the side to move.

Consider the example in Fig. 1, where maximizing nodes are indicated by squares and minimizing nodes by ovals. The root, Max (node A), has to choose a move that leads to positions B, C, or D. It is Min to play at these three positions and, similarly, Min's choice of move will lead to a position with Max to move. At the leaves of the tree are the heuristic values. These values are maximized at the Max nodes (the nodes left-to-right beginning with E), minimized at the Min nodes (B, C, and D), and maximized at the root (A). In this example, the minimax value of the tree is 5. The bold lines indicate the best line of play: Max will play from A to C to maximize his score, while Min will play from C to H to minimize Max's score. Max then chooses the branch leading to a score of 5, the maximum of the possible moves. The best line of play is often called the principal variation.

The minimax algorithm is a depth-first, left-to-right traversal of the tree, evaluating all successors of Max and Min nodes. If one assumes that the tree has a uniform branching factor of w and a fixed depth of d moves (or ply), then the algorithm must examine O(w^d) leaf nodes. Clearly, the exponential growth of the search tree limits the effectiveness of the algorithm.

Sometime in the late 1950s or early 1960s, the alpha-beta algorithm for searching minimax trees was invented (the algorithm may have been independently discovered, but the first publication appears in [8]). Alpha-beta simply and elegantly proves that many branches in the search tree need not be considered since they are irrelevant to the final search result. Consider Fig. 1 again. The search of nodes E and F shows that the value of B is ≤ 4. Now, consider searching G. The first child of G has a value of 6. Thus, Max will guarantee that G's value is ≥ 6. Searching G's other children
FIG. 1. Searching a min-max tree.
is pointless since they can only increase G's value, which cannot affect B's value. Hence further search at G has been proven to be unnecessary. The other children are said to be cut off or pruned. Shaded nodes in the figure have been eliminated from the search by alpha-beta. In this example, the number of leaf nodes considered has been reduced from 27 using minimax to 16 with alpha-beta.

The alpha-beta algorithm searches a tree using two bounds: α and β. α is the minimum value that player Max has achieved; β is the maximum value to which player Min can limit Max (conversely, the best that Max can achieve given Min's best play). Any node where a score results in the condition α ≥ β causes a cut-off. The alpha-beta algorithm is given in Fig. 2.³ For a d-ply search, it is called by:

    AlphaBeta( rootnode, -∞, +∞, d, MAXNODE ).
    int AlphaBeta( position p, int alpha, int beta, int depth, int type )
    {
        /* Check for a leaf node */
        if( depth == 0 )
            return( Evaluate( p ) );

        /* Identify legal moves */
        numbmoves = GenerateMoves( p, movelist );
        if( numbmoves == 0 )
            return( Evaluate( p ) );

        if( type == MAXNODE )
            nexttype = MINNODE;
        else
            nexttype = MAXNODE;

        /* Call AlphaBeta recursively for each move */
        for( move = 1; move <= numbmoves; move++ )
        {
            p = MakeMove( p, movelist[ move ] );
            value = AlphaBeta( p, alpha, beta, depth-1, nexttype );
            p = UndoMove( p, movelist[ move ] );

            /* Update best value found so far */
            if( type == MAXNODE )
                alpha = MAX( value, alpha );
            else
                beta = MIN( value, beta );

            /* Check for a cut-off. Minimax without this line of code */
            if( alpha >= beta )
                break;
        }

        if( type == MAXNODE )
            return( alpha );
        else
            return( beta );
    }

FIG. 2. The alpha-beta algorithm.

³ The Negamax formulation is more concise [9].
Again assuming a tree of fixed branching factor w and search depth d, alpha-beta improves the best case of the search-tree size to O(w^(d/2)) (or, to be more precise, w^⌈d/2⌉ + w^⌊d/2⌋ − 1) [9]. This best case occurs when the move leading to the best minimax score is searched first at every interior node.⁴ If the worst move is searched first at every node, then alpha-beta will build an O(w^d) minimax tree. For example, with a branching factor of 35 (typical of chess) and an 8-ply search, the worst case is roughly 2.3 × 10^12 leaf nodes, while the best case is only about 3 × 10^6.

Although the 20 lines of code in Fig. 2 look simple, this is misleading. These are possibly the most deceptive lines of code in the artificial intelligence literature! Alpha-beta has the insidious property of hiding errors. For example, an evaluation function error may only be detected when the error happens to propagate to the root of the search tree. The deeper the search, the harder it is for the error to be minimized and maximized all the way back to the root. Consequently, many game-playing programs have bugs that survive for years before the right sequence of events occurs that allows the problem to manifest itself.⁵

In practice, a high-performance alpha-beta implementation is often 20 or more pages of code. The reason for this is the exponential difference in the search effort between the best and worst alpha-beta cases. Considerable effort has to be invested to ensure a nearly best-case result. The consequence is a myriad of enhancements to alpha-beta, significantly increasing the complexity of the search process. The main alpha-beta search enhancements can be characterized into four groups:

• caching information: avoiding repeated work,
• move ordering: increasing the likelihood of the best move being searched first at a node,
• search window: changing the [α, β] window to speculatively reduce search effort, and
• search depth: dynamically adjusting the depth to redistribute search effort, attempting to maximize the value of the information gathered from the search.

Each of these enhancements is discussed in turn.

⁴ At nodes where a cut-off occurs, one only needs to search a move that is sufficient to immediately cause the cut-off.
⁵ Empirical evidence suggests that this only happens in important tournament games!
2.1.1 Caching Information
For most games, the search tree is really a misnomer; it is a search graph. Two different sequences of moves can transpose into each other. For example, in chess, the move sequence 1. d4 d5 2. Nf3 gives rise to the same position as the sequence 1. Nf3 d5 2. d4. Detecting these transpositions and eliminating redundant search effort can significantly reduce the search-tree size.

The transposition table is a cache of recently searched positions. When a sub-tree has been searched, the result is saved in the transposition table. Before searching a node in the tree, the table is consulted to see if it has been previously searched. If so, the table information may be sufficient to stop further search at this node. The table is usually implemented as a large hash table [10, 11].

The effectiveness of the transposition table is application dependent [12, 13]. For games such as chess and checkers, where a single move changes a few squares on the board, the benefits can be massive (roughly a 75% reduction for chess and 89% for checkers for a typical search). For games such as Othello, where a move can change many squares on the board, the likelihood of two move sequences transposing into each other is small (roughly a 33% reduction for a typical search).
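To make the caching mechanism concrete, the following is a minimal sketch of a transposition table, assuming 64-bit (Zobrist-style) hash keys for positions; the entry layout, the fixed table size, and the names tt_store and tt_probe are illustrative, not taken from any particular program.

    #include <stdint.h>

    #define TT_SIZE (1 << 20)            /* number of entries; a power of two  */
    #define BOUND_EXACT 0                /* score is the exact minimax value   */
    #define BOUND_LOWER 1                /* score is a lower bound (fail high) */
    #define BOUND_UPPER 2                /* score is an upper bound (fail low) */

    typedef struct {
        uint64_t key;                    /* full hash key, to detect collisions */
        int      depth;                  /* search depth of the stored result   */
        int      score;
        int      bound;                  /* EXACT, LOWER, or UPPER              */
        int      best_move;              /* move to try first on a revisit      */
    } TTEntry;

    static TTEntry table[TT_SIZE];

    /* Save the result of searching a position. */
    void tt_store(uint64_t key, int depth, int score, int bound, int best_move)
    {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        e->key = key; e->depth = depth; e->score = score;
        e->bound = bound; e->best_move = best_move;
    }

    /* Before searching a position, see whether a stored result suffices.
       Returns 1 (and fills *score) if the search at this node can stop here. */
    int tt_probe(uint64_t key, int depth, int alpha, int beta,
                 int *score, int *best_move)
    {
        TTEntry *e = &table[key & (TT_SIZE - 1)];
        if (e->key != key)
            return 0;                    /* a different position hashed here    */
        *best_move = e->best_move;       /* still useful for move ordering      */
        if (e->depth < depth)
            return 0;                    /* the stored search was too shallow   */
        if (e->bound == BOUND_EXACT ||
            (e->bound == BOUND_LOWER && e->score >= beta) ||
            (e->bound == BOUND_UPPER && e->score <= alpha)) {
            *score = e->score;
            return 1;
        }
        return 0;
    }

In the framework of Fig. 2, tt_probe would be consulted at the top of AlphaBeta() and tt_store called just before returning, with the bound field recording whether the returned score was exact, a fail-high lower bound, or a fail-low upper bound.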
2.1.2 Move Ordering

The exponential difference in the search-tree size between the best and worst case of alpha-beta hinges on the order in which moves are considered. At a node where a cut-off is to occur, it should be achieved with the first move searched. Hence, effort is applied at interior nodes to order the moves from the most to least likely to cause a cut-off.

The most important place for move ordering is at the root of the search. For example, if a 10-ply search is initiated without any prior preparation, the resulting search tree is likely to be large. The first move searched may provide a poor bound (α value), increasing the size of the search window used for the subsequent moves. Considering the best move first narrows the search window, increasing the chances for cut-offs in the tree.

Most alpha-beta-based programs use a technique called iterative deepening to maximize the chances of the best move being searched first [11]. The program starts by searching all moves 1-ply deep. The moves are then ordered based on the returned scores. The tree is then re-searched, this time 2-ply deep, and so on. The idea is that the best move for a (d − 1)-ply search is likely to also be best for a d-ply search. By investing the overhead of repeating portions of the search, the chances are increased that the best move is considered first in the last (most expensive) iteration. Experience shows that the cost of the early iterations is a small price to pay for the large gains achieved by improved move ordering at the root of the tree. This is an important result that has been applied to many other search domains (for example, single-agent search [14]).
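For illustration, an iterative-deepening driver written in the same pseudocode conventions as Fig. 2 might look like the sketch below. It assumes the AlphaBeta() routine of Fig. 2 together with hypothetical helpers (GenerateMoves, MakeMove, UndoMove, SortRootMoves, TimeLeft); this is a sketch under those assumptions, not the code of any specific program.

    /* Search 1 ply, then 2 ply, and so on, re-sorting the root moves by
       their returned scores so that the likely best move is searched first
       in the deepest (most expensive) iteration.  INFINITY stands for a
       value larger than any real score.  For clarity the full window is
       used for every root move; a real program would narrow it as scores
       come in. */
    int IterativeDeepening( position root, int maxdepth )
    {
        numbmoves = GenerateMoves( root, movelist );
        for( depth = 1; depth <= maxdepth && TimeLeft(); depth++ )
        {
            for( move = 1; move <= numbmoves; move++ )
            {
                root = MakeMove( root, movelist[ move ] );
                score[ move ] = AlphaBeta( root, -INFINITY, +INFINITY, depth-1, MINNODE );
                root = UndoMove( root, movelist[ move ] );
            }
            /* Order the root moves from best to worst for the next iteration */
            SortRootMoves( movelist, score, numbmoves );
        }
        return( movelist[ 1 ] );   /* highest-scoring move after the final sort */
    }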
The idea of considering the best move first should be applied at all nodes in the tree. At interior nodes, cheaper methods are used to improve the quality of the move ordering. Three popular choices are:
• Transposition table: When recording a search result in the table, save the score and the move that leads to the best score. When the node is revisited (within the same or a subsequent iteration), if the score information is insufficient to cause a cut-off, then the best move from the previous search can be considered first. Since the move was previously best, there is a good chance that it is still the best move.
• Application-dependent knowledge: Many games have application-dependent properties that can be exploited by a move ordering scheme. For example, in chess capture moves are more likely to cause a cut-off than non-capture moves. Hence, many programs consider all capture moves first at each node.
• History heuristic: There are numerous application-dependent move-ordering algorithms in the literature. One application-independent technique that has proved to be simple and effective is the history heuristic [15, 16]. A move that is best in one position is likely also best in similar positions. The heuristic maintains a global history score for each move that indicates how often that move has been best. Moves can then be ordered by their history heuristic score (a minimal sketch of this bookkeeping appears after this list). A subset of this idea, the killer heuristic, is also popular [17].

The contrast between the transposition table and history heuristic is interesting. The transposition table stores the exact context under which a move is considered best (i.e., it saves the position with the move). The history heuristic records which moves are most often best, but has no knowledge about the context that makes the move strong. Other move ordering schemes fall somewhere in between these two extremes by, for example, adding more context to the history-heuristic moves.

Move ordering in game-playing programs is highly effective. For example, a recent study showed that the best move was searched first over 90% of the time in chess and checkers programs, and over 80% of the time in an Othello program [12]. The chess result is quite impressive when one considers that a typical position has 35 legal moves to choose from.
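The bookkeeping behind the history heuristic can be sketched as follows. The Move structure, the 64 x 64 from/to-square indexing, and the depth-squared credit are illustrative, chess-flavoured assumptions rather than the scheme of any particular program.

    #define NUM_SQUARES 64

    typedef struct { int from; int to; } Move;

    /* history[from][to] accumulates credit every time the move (from,to)
       turns out to be best (or causes a cut-off) at an interior node. */
    static long history[NUM_SQUARES][NUM_SQUARES];

    /* Call when move m was best at a node searched to the given depth.
       Deeper sub-trees get more credit; depth*depth is one common choice. */
    void history_update(Move m, int depth)
    {
        history[m.from][m.to] += (long)depth * depth;
    }

    /* Order a move list so that moves with high history scores come first
       (a simple selection sort is enough for a sketch). */
    void history_order(Move *moves, int numbmoves)
    {
        for (int i = 0; i < numbmoves; i++) {
            int best = i;
            for (int j = i + 1; j < numbmoves; j++)
                if (history[moves[j].from][moves[j].to] >
                    history[moves[best].from][moves[best].to])
                    best = j;
            Move tmp = moves[i]; moves[i] = moves[best]; moves[best] = tmp;
        }
    }

history_update() would be called from the search whenever a move causes a cut-off or proves best, and history_order() before the move loop at each interior node.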
2.1.3 Search Window
The alpha-beta algorithm searches the tree with an initial search window of [−∞, +∞]. Usually, the extreme values do not occur. Hence, the search
can be made more efficient by narrowing the range of values to consider, increasing the likelihood of cut-offs. Aspiration search centers the search window around the value expected from the search, plus or minus a reasonable range (δ) of uncertainty. If one expects the search to produce a value near, say, 40, then the search can be called with a search window of [40 − δ, 40 + δ]. The search will result in one of three cases:

• 40 − δ < result < 40 + δ: The actual value is within the search window. This value has been determined with less effort than would have been required had the full search window been used.

• result ≤ 40 − δ: The actual value is below the aspiration window (the search is said to fail low). To find the actual value, a second search is needed with the window [−∞, result].

• result ≥ 40 + δ: The actual value is above the aspiration window (the search is said to fail high). To find the actual value, a second search is needed with the window [result, +∞].

Aspiration search is a gamble. If the result is within the search window, then the enhancement wins. Otherwise an additional search is needed. Aspiration search is usually combined with iterative deepening. The result of the (d − 1)-ply iteration can be used as the center of the aspiration window used for the d-ply search. δ is application dependent and determined by empirical evidence.

The idea of speculatively changing the search-window size can be applied throughout the search. Consider an interior node with a search window of [α, β]. The first move is searched and returns a score v in the search window. The next move will be searched with the window [v, β]. If the move ordering is effective, then there is a high probability that the best move at this node was searched first. Hence, the remaining moves are expected to return a score that is ≤ v, and they can be dismissed cheaply by searching them with a minimal window, re-searching only if this expectation turns out to be wrong. More recently, the MTD(f) algorithm uses only minimal windows to determine the value of the root. This algorithm has been shown to be superior to alpha-beta in terms of number of nodes expanded in the search tree [12, 13].
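In the pseudocode conventions of Fig. 2, an aspiration-search driver might be sketched as below; the window half-width delta and the simple re-search policy on failure are illustrative assumptions.

    /* Aspiration search around a guess (typically the previous iteration's
       score).  If the result falls outside the window, re-search with the
       window opened up on the failing side.  AlphaBeta() is the routine of
       Fig. 2; INFINITY stands for a value larger than any real score. */
    int AspirationSearch( position p, int guess, int delta, int depth )
    {
        int alpha = guess - delta;
        int beta  = guess + delta;
        int value = AlphaBeta( p, alpha, beta, depth, MAXNODE );

        if( value <= alpha )         /* fail low: true value is below the window  */
            value = AlphaBeta( p, -INFINITY, value, depth, MAXNODE );
        else if( value >= beta )     /* fail high: true value is above the window */
            value = AlphaBeta( p, value, +INFINITY, depth, MAXNODE );

        return( value );
    }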
2.1.4 Search Depth

Although alpha-beta is usually described as a fixed-depth search, better performance can be achieved using a variable search depth. The search can be compared to a stock portfolio: don't treat all stocks as being equal. You should invest in those that have the most promise, and reduce or eliminate your holdings in those that look like losers. The same philosophy holds true in search trees. If there is some hint in the search that a sequence of moves looks promising, then it may be a good idea to extend the search along that line to get more information. Similarly, moves that appear to be bad should have their search effort reduced. There are a number of ways that one can dynamically adjust the depth to maximize the amount of information gathered by the search.

Most alpha-beta-based programs have a number of application-dependent techniques for altering the search depth. For example, chess programs usually extend checking moves an additional ply since these moves indicate that something interesting is happening. Most programs have a "hopeless" metric for reducing the search depth. For example, in chess if one side has lost too much (e.g., a queen and a rook), it is very unlikely this sub-tree will eventually end up as part of the principal variation. Hence, the search depth may be reduced.

There are a number of techniques that may be useful for a variety of domains. In chess, null-move searches have been very effective at curtailing analysis of poor lines of play. The idea is that if one side is given two moves in a row and still can't achieve anything, then this line of play is likely bad. Hence, the search depth is reduced. This idea can be applied recursively throughout the search [22, 23]. Another important idea is ProbCut [24]. Here the result of a shallow search is used as a predictor of whether the deeper search would produce a value that is relevant to the search window. Statistical analysis of the program's searches is used to find a correlation between the values of a shallow and deep search. If the shallow search result indicates that the deeper search will not produce a value that is large enough to affect the node's value, then further effort is stopped.

Although both the null-move and ProbCut heuristics purport to be application independent, in fact they both rely on game-specific properties. Null-move cut-offs are only effective if the consequences of giving a side two moves in a row are serious. This causes problems, for example, in checkers where giving a player an extra move may allow them to escape from a position where having only one move loses (these are known as zugzwang positions). ProbCut depends on there being a strong correlation between the values of shallow and deep searches. For games with low variance in the leaf
node values, this works well. If there is high variance, then the evaluation function must be improved to reduce the variance. In chess programs, for example, the variance is generally too high for ProbCut to be effective.

The most common form of search extension is the quiescence search. It is easier to get a reliable evaluation of a leaf position if that position is quiet or stable (quiescent). Hence, a small search is done to resolve immediate capture moves or threats [11]. Since these position features are discovered by search, this reduces the amount of explicit application-dependent knowledge required in the evaluation function.

A search-extension idea that has attracted a lot of attention is singular extensions [25]. The search attempts to identify forced (or singular) moves. This can be achieved by manipulating the search window to see if the best move is significantly better than the second-best move. When a singular move is found, then the search along that line of play is extended an additional ply (or more). The idea is that forcing moves indicate an interesting property of the position that needs to be explored further. In addition, various other extensions are commonly used, most based on extending the search to resolve the consequences of a threat [26, 27].
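A minimal quiescence search, written here in the more concise negamax form mentioned in footnote 3, might look like the following sketch; the "stand pat" bound and the capture-only move generator (GenerateCaptures) are the usual ingredients, and the helper names are assumptions in the style of Fig. 2.

    int Quiesce( position p, int alpha, int beta )
    {
        /* Score if the side to move "stands pat" and makes no capture */
        value = Evaluate( p );
        if( value >= beta )
            return( beta );
        if( value > alpha )
            alpha = value;

        /* Only consider capture (or other forcing) moves; if there are none,
           the position is quiet and the static evaluation is trusted */
        numbmoves = GenerateCaptures( p, movelist );
        for( move = 1; move <= numbmoves; move++ )
        {
            p = MakeMove( p, movelist[ move ] );
            value = -Quiesce( p, -beta, -alpha );   /* negamax formulation */
            p = UndoMove( p, movelist[ move ] );

            if( value >= beta )
                return( beta );
            if( value > alpha )
                alpha = value;
        }
        return( alpha );
    }

A real program would typically bound the quiescence depth and filter out clearly losing captures, but the structure above captures the idea: the leaf evaluation is only trusted once the forcing moves have been resolved.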
2.1.5 Close to Perfection?
Numerous studies have attempted to quantify the benefits of alpha-beta enhancements in fixed-depth searches (for example, [10, 16]). Move ordering and the transposition table usually make the biggest performance difference, with other enhancements generally being much smaller in their impact. The size of trees built by game-playing programs appears to be close to that of the minimal alpha-beta tree. For example, in chess, Belle is reported to be within a factor of 2.2 [28], Phoenix within 1.4 [28], Hitech within 1.5 [28], and Zugzwang within 1.2 [29]. These results suggest that there is little room for improvement in fixed-depth alpha-beta searching.

The above comparisons have been done against the approximate minimal search tree. However, finding the real minimal tree is difficult, since the search tree is really a search graph. The real minimal search should exploit this property by:

• selecting the move that builds the smallest tree to produce a cut-off, and
• preferring moves that maximize the benefits of the transposition table (i.e., re-using results as much as possible).

Naturally, these objectives can conflict. In contrast to the above impressive numbers, results suggested that chess programs are off by a factor of 3 or
more from the real minimal search graph [12, 13]. Thus, there is still room for improvements in alpha-beta search efficiency. Nevertheless, given the exponential nature of alpha-beta, that programs can search within a small constant of optimal is truly impressive. Forty years of research into alpha-beta have resulted in a recipe for a finely tuned, highly efficient search algorithm. Program designers have a rich set of search enhancements at their disposal. The right combination is application dependent and a matter of taste. Although building an efficient searcher is well understood, deciding where to concentrate the search effort is not. It remains a challenge to identify ways to selectively extend or reduce the depth in such a way as to maximize the quality of the search result.
2.1.6 Alternative Approaches

Since its discovery, alpha-beta has been the mainstay of computer games development. Over the years, a number of interesting alternatives to alpha-beta-based searching have been proposed.

Berliner's B* algorithm attempts to prove the best move, without necessarily determining the best move's value [30, 31]. In its simplest form, B* assigns an optimistic (upper bound) and a pessimistic (lower bound) value to each leaf node. These values are recursively backed up the tree. The search continues until there is one move at the root whose pessimistic value is as good as all the alternative moves' optimistic values. In effect, this is a proof that the best move (but not necessarily its value) has been found. There are several drawbacks with B*, most notably the non-standard method for evaluating positions. It is difficult to devise reliable optimistic and pessimistic evaluation functions. B* has been refined so that the evaluations are now probability distributions. However, the resulting algorithm is complex and needs considerable application tuning. It has been used in the Hitech chess program, but even there the performance of alpha-beta is superior [31].

McAllester's conspiracy numbers algorithm tries to exploit properties of the search tree [32]. The algorithm records the minimal number of leaf nodes in a search tree that must change their value (or conspire) to change the value of the root of the tree. Consider a Max node having a value of 10. To raise this value to, say, 20, only one of the children has to have its value become 20. To lower the value to, say, 0, all children with a value greater than 0 must have their value lowered. Conspiracy numbers works by recursively backing up the tree the minimum numbers of nodes that must change their value to cause the search tree to become a particular value. The algorithm terminates when the effort required to change the value at the root of the search (i.e., the conspiracy number) exceeds a predefined threshold.
Conspiracy numbers caused quite a stir in the research community because of the innovative aspect of measuring resistance to change in the search. Considerable effort has been devoted to understanding and improving the algorithm. Unfortunately it has a lot of overhead (for example: slow convergence, cost of updating the conspiracy numbers, maintaining the search tree in memory) which has been an impediment to its usage in high-performance programs. A variation on the original conspiracy numbers algorithm has been successfully used in the Ulysses chess program [33].

There are other innovative alternatives to alpha-beta, each of which is worthy of study. These include BPIP [34], min/max approximation [35], and meta-greedy algorithms [36]. Although all these alpha-beta alternatives have many desirable properties, none of them is a serious challenger to alpha-beta's dominance. The conceptual simplicity of the alpha-beta framework makes it relatively easy to code and highly efficient at execution time. The alpha-beta alternatives are much harder to code, the algorithms are not as well understood, and there is generally a large execution overhead. Perhaps if the research community devoted as much effort to understanding these algorithms as they did in understanding alpha-beta, we would see a new algorithm come to the fore. Until that happens, alpha-beta will continue to dominate as the search algorithm of choice for two-player perfect-information games.
2.1.7 Conclusions
Research on understanding the alpha-beta algorithm has dominated games research since its discovery in the early 1960s. This process was accelerated by the discovery of the strong correlation of program performance with alpha-beta search depth [37]. This gave a simple formula for success: build a fast search engine. This led to the building of special-purpose chips for chess [38] and massively parallel alpha-beta searchers [29].

Search alone is not the answer. Additional search eventually leads to diminishing returns in the benefits achievable [39]. Eventually, there comes the point where the most significant performance gains are to be had by identifying and implementing missing pieces of application knowledge. This was evident, for example, in the 1999 world computer chess championship, where the deep-searching, large multiprocessor programs finished behind the shallower-searching, PC-based programs that used more chess knowledge.

For many popular games, such as chess, checkers, and Othello, alpha-beta has been sufficient to achieve world-class play. Hence, there was no need to look for alternatives. For artificial-intelligence purists, this is an
unsatisfactory result. By relying on so-called brute-force searching, these programs can minimize their dependence on knowledge. However, for other games, most notably Go, search-intensive solutions will not be effective. Radically different approaches are needed.
2.2 Advances in Knowledge
Ideally, no knowledge other than the rules of the game should be needed to build a strong game-playing program. Unfortunately, for interesting games the game tree is usually too deep to search to find the game-theoretic value of a position. Hence knowledge for differentiating favorable from unfavorable positions has to be added to the program. Nevertheless, there are some cases where the program can learn position values without using heuristic knowledge.

The first example is the transposition table. This is a form of rote learning. By saving information and reusing it, the program is learning, allowing it to eliminate nodes from the search without searching. Although the table is usually thought of as something local to an individual search, "important" entries can be saved to disk and used for subsequent searches. For example, by saving some transposition table results from a game, they may be used in the next game to avoid repeating the same mistake [40, 41].

A second example is endgame databases. Some games can be solved from the end of the game backwards. One can enumerate all positions with one piece on the board, and record which positions are wins, losses, and draws. These results can be backed up to compute all positions with two pieces on the board, and so on. The result is an endgame database containing perfect information. For chess, most of the five-piece and a few six-piece endgames have been computed [6]. This is of limited value, since most games are over before such a simplified position is reached. In checkers, all eight-piece endgames have been computed [42]. The databases play a role in the search of the first move of a game! Endgame databases have been used to solve the game of Nine Men's Morris [43].

A third form of knowledge comes from the human literature. Most games have an extensive literature on the best opening moves of the game. This information can be collected in an opening book and made available to the program. The book can either be used to select the program's move, or as advice to bias the program's opening move selection process. Many programs modify the opening book to tailor the moves in it to the style of the program.

When pre-computed or human knowledge is not available, then the game-playing program must fall back on its evaluation function. The function assigns scores to positions that are a heuristic assessment of the likelihood of
winning (or losing) from the given position. Application-dependent knowledge and heuristics are usually applied to a position to score features that are indicators of the true value of the position. The program implementor (usually in consultation with a domain expert) will identify a set of features (f) that can be used to assess the position. Each feature is given a weight (w) that reflects how important that feature is in relation to the others in determining the overall assessment. Most programs use a linear combination of this information to arrive at a position value:

    value = Σ_{i=1}^{n} w_i × f_i                    (1)
where n is the number of features. For example, in chess two features that are correlated with success are the material balance and pawn structure (f_1 and f_2). Material balance is usually much more important than pawn structure, and hence has a much higher weighting (w_1 >> w_2).

Identifying which features might be correlated with the final result of the game is still largely done by hand. It is a complex process that is not well understood. Usually the features come from human experience. However, human concepts are often vague and hard to define algorithmically. Even well-defined concepts may be impractical because of the computational overhead. One could apply considerable knowledge in the assessment process, but this increases the cost of performing an evaluation. The more expensive the evaluation function is to compute, the smaller the search tree that can be explored in a fixed amount of time. Thus, each piece of knowledge has to be evaluated on what it contributes to the accuracy of the overall evaluation, and the cost (both programmer time and execution time) of having it.

Most evaluation functions are carefully tuned by hand. The knowledge has been judiciously added, taking into account the expected benefits and the cost of computing the knowledge. Hence, most of the knowledge that is used is of a general-purpose nature. Unfortunately, it is the exceptions to the knowledge that cause the most performance problems. As chess grandmaster Kevin Spraggett said [42]:

    I spent the first half of my career learning the principles for playing strong chess and the second half learning when to violate them.

Most game-playing programs' evaluation functions attempt to capture the first half of Spraggett's experience. Implementing the second half is often too difficult and computationally time consuming, and generally has a small payoff (except perhaps at the highest levels of play).

Important progress has been made in setting the weights automatically. Although this seems as if it should be much easier than building an
evaluation function, in reality it is a laborious process when done by hand. Automating this process would result in a huge reduction in the effort required to build a high-performance game-playing program.

Temporal difference learning has come to the fore as a major advance in weighting evaluation function features. Samuel pioneered the idea [3, 4], but it only became recognized as a valuable learning algorithm after Sutton extended and formalized this work [44]. Temporal difference learning is at the heart of Tesauro's world-championship-caliber backgammon program (see Section 3.1), and has shown promising results in chess (discussed later in this section).

Temporal difference learning (TDL) is a reinforcement learning algorithm. The learner has an input state, produces an output action, and later receives feedback (commonly called the reward) on how well its action performed. For example, a chess game consists of a series of input states (positions) and actions (the move to play). At the end of the game, the reward is known: win, loss, or draw. In between the start and the end of the game, a program will use a function to map the inputs onto the outputs (decide on its next move). This function is a predictor of the future, since it is attempting to maximize its expected outcome (make a move that leads to a win). The goal in reinforcement learning is to propagate the reward information back along the game's move sequence to improve the quality of actions (moves) made. This is accomplished by attributing the credit (or blame) to the outputs that led to the final reward. By doing so, the learner's evaluation function will change, hopefully in such a way as to be a better predictor of the final reward.

To achieve the large-scale goal of matching inputs to the result of the game, TDL focuses on the smaller goal of modifying the learner so that the current prediction is a better approximation of the next prediction [44, 45]. Consider a series of predictions P_1, P_2, ..., P_N on the outcome of a game. These could be the program's assessment of the likelihood of winning from move to move. In chess, the initial position of a game, P_1, has a value that is likely close to 0. For a win P_N = 1, while a loss would have P_N = −1. For the moves in between, the assessments will vary. If the likelihood of winning for position t (P_t) is less (more) than that of position t + 1 (P_{t+1}), then we would like to increase (decrease) the value of position t to be a better predictor of the value of t + 1. The idea behind temporal difference learning is to adjust the evaluation based on the incremental differences in the assessments. Thus,

    Δ = P_{t+1} − P_t

measures the difference between the prediction for move t + 1 and that for move t. This adjustment can be done by modifying the weights of the evaluation function to reduce the Δ from move to move.
Temporal difference learning is usually described with a variable weighting of recency. Rather than considering only the previous move, one can consider all previous moves with non-uniform weights (usually exponential). These moves should not all be given the same importance in the decision-making process, since the evaluation of moves made many moves previously is less likely to be relevant to the current evaluation. Instead, previous moves are weighted by λ^p, where p reflects how far back the move is. The parameter λ controls how much credit is given to previous moves, giving exponentially decaying feedback of the prediction error over time. Hence, this algorithm is called TD(λ). Figure 3 gives the temporal difference relation used by TD(λ).
    Δw_t = α (P_{t+1} − P_t) Σ_{k=1}^{t} λ^{t−k} ∇_w P_k

where:
• w is the set of weights being tuned,
• t is the time step being altered, in a sequence of moves from 1, 2, ..., N − 1,
• Δw_t is the change in the set of weights at step t as a result of applying temporal differences,
• P_t is the prediction at time step t (for the end of the game, P_N, the final outcome is used), and
• α is the rate of learning (a small α causes small incremental changes; a large α makes larger steps).

λ and α are heuristic parameters that need to be tuned for each application domain.

FIG. 3. The TD(λ) algorithm.
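To make the relation of Fig. 3 concrete, the sketch below applies TD(λ) to a linear evaluation function, for which the gradient ∇_w P_k is simply the feature vector of position k; the decaying sum over previous predictions is maintained incrementally in an eligibility trace. The array sizes and function names are illustrative assumptions, and this is a minimal sketch rather than the code of TD-Gammon, Cilkchess, or KnightCap.

    #define NUM_FEATURES 64

    /* Linear evaluation: P = sum_i w[i] * f[i].  For this form the gradient
       of P with respect to w is just the feature vector f itself. */
    static double predict(const double *w, const double *f)
    {
        double p = 0.0;
        for (int i = 0; i < NUM_FEATURES; i++)
            p += w[i] * f[i];
        return p;
    }

    /* One game's worth of TD(lambda) updates.  features[t] holds the feature
       vector of the position after move t (t = 0 .. n_moves-1) and outcome is
       the final reward (+1 win, -1 loss).  trace[] accumulates
       sum_k lambda^(t-k) * grad_w P_k incrementally. */
    void td_lambda_update(double *w, double features[][NUM_FEATURES],
                          int n_moves, double outcome,
                          double alpha, double lambda)
    {
        double trace[NUM_FEATURES] = { 0.0 };

        for (int t = 0; t < n_moves; t++) {
            /* Decay the old trace and add the gradient of the current prediction */
            for (int i = 0; i < NUM_FEATURES; i++)
                trace[i] = lambda * trace[i] + features[t][i];

            /* Temporal difference: next prediction minus current prediction.
               For the last move, the final outcome plays the role of P_{t+1}. */
            double p_now  = predict(w, features[t]);
            double p_next = (t + 1 < n_moves) ? predict(w, features[t + 1]) : outcome;
            double delta  = p_next - p_now;

            /* w := w + alpha * delta * trace   (the relation of Fig. 3) */
            for (int i = 0; i < NUM_FEATURES; i++)
                w[i] += alpha * delta * trace[i];
        }
    }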
A typical application of TDL is for a program with an evaluation function, but unknown weights for the features. By playing a series of games, the program gets feedback on the relative importance of features. TDL propagates this information back along the move sequence played, causing incremental changes to the feature weights. The result is that the evaluation function values get tuned to be better predictors.

In addition to Tesauro's success in backgammon (Section 3.1), there are two recent TDL data points in chess. First, Cilkchess, currently one of the strongest chess programs, was tuned using temporal difference learning and the results are encouraging. Don Dailey, a co-author of Cilkchess, writes that [46]:

    Much to my surprise, TDL seems to be a success. But the weight set that comes out is SCARY; I'm still afraid to run with it even though it beats the hand-tuned weights. They are hard to understand too, because TDL expresses chess concepts any way that is convenient for it. So if you create a heuristic to describe a chess concept, TDL may use it to "fix" something it considers broken in your weight set.

An interesting data point was achieved in the KnightCap chess program [47]. Starting with a program that knew only about material and had all other evaluation function terms weighted with zero, the program was able to quickly tune its weights to achieve an impressive increase in its performance. The authors recognized that the predictions of a chess program were the result of an extensive search, and the score at the root of the tree was really the value of the leaf node on the principal variation. Consequently, the temporal difference learning should use the principal variation leaf positions, not the positions at the root of the search tree [48]. This algorithm has been called TDLeaf(λ) [47]. These successes are exciting, and offer the hope that a major component of building high-performance game-playing programs can be automated.⁶

⁶ An excellent survey of machine learning applied to games can be found in [49].
2.3 Simulation-Based Approaches
In the 1990s, research into non-deterministic and imperfect-information games emerged as an important application for artificial-intelligence investigations. In many ways, these domains are more interesting than two-player perfect-information games, and promise greater long-term research potential. Handling imperfect or probabilistic information significantly complicates the game, but is a better model of the vagaries of real-world problems.

For non-deterministic and imperfect-information games, alpha-beta search does not work. The branches in the search tree represent probabilistic outcomes based on, for example, the roll of the dice or unknown cards. At best one can back up probabilities of expected outcomes. For these games it is usually impractical to build the entire game tree of all possibilities. Simulations can be used to sample the space of possible outcomes, trying to gather statistical evidence to support the superiority of one action.⁷ The program can instantiate the missing information (e.g., assign cards or determine dice rolls), play the game through to completion, and then record the result. This can be repeated with a different assignment of the missing information. By repeating this process many times, a statistical ranking of the move choices can be obtained.

⁷ Some of the material in this section has been taken from [50].
Consider the imperfect-information game of bridge. The declarer does not know in which hand each of the 26 hidden cards is. The simulator can instantiate one possible assignment of cards to each opponent, and then play the hand through to completion (a trial). Thus, a single data point has been obtained on the number of tricks that can be won. This can then be repeated by dealing a different set of cards to each opponent. When these simulated hands have been repeated a sufficient number of times, the statistics gathered from these runs can be used to decide on a course of action. For example, it may be that in 90% of the samples a particular card play led to the best result. Based on this evidence, the program can then decide with high confidence what the best card to play is.

For each trial in the simulation, one instance of the non-deterministic or unknown information is applied. Hence, a representative sample of the search space is looked at to gather statistical evidence on which move is best. Figure 4 shows the pseudo-code for this approach. Some of its characteristics include:

• the program iterates on the number of samples taken,
• the search for each sample usually goes to the end of the game, and
• heuristic evaluation usually occurs at the interior nodes of the search to determine a subset of branches to consider, reducing the cost of a sample (and allowing more samples to be taken).

The simulation benefits from selective samples that use information from the game state (such as the bidding auction in bridge), rather than a uniform distribution or other fixed distribution sampling technique.

    /* From a given state, simulate and return the best move */
    Simulator( known_state state )
    {
        obvious_move = NO;
        trials = 0;
        while( ( trials <= MAX_TRIALS ) and ( obvious_move == NO ) )
        {
            trials = trials + 1;

            /* Generate the missing information */
            missing_info = selective sampling to generate missing information;
            numbmoves = GeneratePlausibleMoves( state, missing_info, movelist );

            /* Consider all moves */
            for( m = 1; m <= numbmoves; m++ )
            {
                state = MakeAction( state, movelist[ m ], missing_info );
                value[ m ] = value[ m ] + Search( state );
                state = UndoAction( state, movelist[ m ], missing_info );
            }

            /* Test to see if one move is statistically better than all others */
            if( ∃ i such that value[ i ] >> value[ j ] ( ∀ j, j ≠ i ) )
            {
                obvious_move = YES;
            }
        }

        /* Return the move with the highest score */
        return( move i | value[ i ] >= value[ j ] ( ∀ j, j ≠ i ) );
    }

FIG. 4. Simulation-based search.
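As an illustration of the "statistically better" test in Fig. 4, the following sketch compares the running mean of the leading move against each alternative using their standard errors; the two-standard-error threshold and the bookkeeping structure are illustrative assumptions rather than the test used by any particular program.

    #include <math.h>

    typedef struct {
        double sum;      /* sum of the sampled values for this move     */
        double sum_sq;   /* sum of squared values, for the variance     */
        int    n;        /* number of trials that have scored this move */
    } MoveStats;

    static double mean(const MoveStats *s) { return s->sum / s->n; }

    static double std_err(const MoveStats *s)
    {
        double m   = mean(s);
        double var = s->sum_sq / s->n - m * m;
        if (var < 0.0) var = 0.0;          /* guard against rounding error */
        return sqrt(var / s->n);
    }

    /* Returns the index of the leading move if its mean exceeds every other
       move's mean by at least two combined standard errors (an "obvious
       move"), or -1 if the simulation should continue.  Assumes each move
       has already been scored in at least a couple of trials. */
    int obvious_move(const MoveStats *stats, int numbmoves)
    {
        int best = 0;
        for (int m = 1; m < numbmoves; m++)
            if (mean(&stats[m]) > mean(&stats[best]))
                best = m;

        for (int m = 0; m < numbmoves; m++) {
            if (m == best) continue;
            double gap = mean(&stats[best]) - mean(&stats[m]);
            double err = std_err(&stats[best]) + std_err(&stats[m]);
            if (gap < 2.0 * err)
                return -1;                 /* not yet statistically separated */
        }
        return best;
    }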
Statistical sampling has noise or variance. The sampling must be done in a way that captures the reality of the situation, ruling out impossible scenarios and properly reflecting the likelihood of improbable scenarios. The more representative the samples, the less the variance is likely to be. Selective sampling refers to carefully choosing the simulation data to be as representative as possible [50].

It is important to distinguish selective sampling from traditional Monte Carlo techniques. Selective sampling uses information about the game state to skew the underlying probability distribution, rather than assuming uniform or other fixed probability distributions. Monte Carlo techniques may eventually converge on the right answer, but selective sampling allows for faster convergence and less variance.

As an example, consider the imperfect-information and non-deterministic game of Scrabble. Brian Sheppard, author of Maven, writes that for his simulations he generates [51]:

    ... a distribution of racks that matches the distribution actually seen in games. In Maven we use a uniform distribution of the rack, and we take steps to ensure that every tile is represented as often as it should be. We do this without introducing statistical bias by always including in the opponent's tiles for the next iteration the one tile that has been most underrepresented among all previous racks.

Selective sampling need not be perfect. In Scrabble, the opponent's tiles do not come from a uniform distribution: opponents tend to play away bad letters and keep good letters. Sheppard is convinced that this small refinement to the model of the opponent's hands would make little difference in the simulation results.

An important feature of the simulation-based framework is the notion of an obvious move. Although many alpha-beta-based programs incorporate an obvious move feature, the technique is usually ad hoc and the heuristic is the result of programmer experience rather than a sound analytic technique. In the simulation-based framework, an obvious move is statistically well defined. As more samples are taken, if one choice exceeds the alternatives by a statistically significant margin, one can stop the simulation early and commit to the action, with full knowledge of the statistical validity of the decision.

It is interesting to compare alpha-beta and simulation-based search methods. Alpha-beta considers all possible moves at a node that cannot be logically eliminated; simulation-based search can only look at a representative sample. Whereas alpha-beta search typically has a depth limitation, most simulation-based programs follow a path from the root of the search to the end of the game. Thus, one can characterize alpha-beta search trees
as having large width, but limited depth. Simulations, on the other hand, typically have limited breadth but large depth. Figure 5 illustrates the differences in these two approaches, where an "x" is used to indicate where evaluations occur in the search.

Simulations are used in many branches of science, but have only recently emerged as a powerful tool for constructing high-performance game-playing programs. They have proven to be effective in backgammon, bridge, poker, and Scrabble. This technique deserves to be recognized as an important framework for building game-playing programs, equal in stature to the alpha-beta model.
2.4 Perspectives
Enormous progress has been made in understanding the algorithms needed to build game-playing programs. Most of this work has been driven by the desire to satisfy one of the early goals of artificial intelligence research, building a program capable of defeating the human world chess champion. Hence, most games-related research has concentrated on alpha-beta search. With the chess success on the horizon, many researchers branched out into other games. As a result, research efforts on two-player perfect-information games have moved to the background. New vistas are being explored, with temporal-difference learning and simulations being samples of the current research thrusts.

The research in developing algorithms for game playing has applicability to other application domains, but the community of researchers involved have done a poor job selling the technology. For example, many of the
FIG. 5. Contrasting search methods: (a) alpha-beta search; (b) simulation search.
search techniques pioneered with alpha-beta have become standard in other search domains (e.g., iterative deepening), with few realizing the lineage of the ideas.
3. Advances in Computer Games
This section summarizes the progress made in a number of popular games. These include games where computers are better than all humans (checkers, Othello, and Scrabble), are as good as the human world champion (backgammon and chess), and some where human supremacy may be challenged in the near future (bridge and poker). Each section contains a brief history of program development for that game, a case study on the best program in the area, and a representative sample of their play. The case study highlights interesting or unique aspects of the program. The histories are necessarily brief. I apologize in advance to the many hard-working researchers and hobbyists whose work is not mentioned here.
3.1 Backgammon
The first concerted effort at building a strong backgammon program was undertaken by Hans Berliner of Carnegie Mellon University. In 1979 his program, BKG9.8, played an exhibition match against the newly crowned world champion Luigi Villa [52, 53]. The stakes were $5000, winner take all. The final score was 7-1 in favor of the computer, with BKG9.8 winning four of the five games played (the rest of the points came from the doubling cube).

Backgammon is a game of both skill and luck. In a short match, the dice can favor one player over another. Berliner writes that [53]:

    In the short run, small percentage differences favoring one player are not too significant. However, in the long run a few percentage points are highly indicative of significant skill differences.

Thus, assessing the results of a five-game match is difficult. Afterwards Berliner analyzed the program's play and concluded that [52]:

    There is no doubt that BKG9.8 played well, but down the line Villa played better. He made the technically correct plays almost all the time, whereas the program did not make the best play in eight out of 73 non-forced situations.

BKG9.8 was an important first step, but major work was still needed to bring the level of play up to that of the world's best players.
In the late 1980s, IBM researcher Gerry Tesauro began work on a neural-net-based backgammon program. The net used encoded backgammon knowledge and, training on data sets of games played by expert players, learned the weights to assign to these pieces of knowledge. The program, Neurogammon, was good enough to win first place in the 1989 Computer Olympiad [54].

Tesauro's next program used a neural network that was trained using temporal difference learning. Instead of training the program with data sets of games played by humans, Tesauro was successful in having the program learn using the temporal differences from self-play games. The evolution in TD-Gammon from version 0.0 to 3.0 saw an increase in the knowledge used, a larger neural net, and the addition of small selective searches. The resulting program is acknowledged to be on par with the best players in the world, and possibly even better.

In 1998, an exhibition match was played between world champion Malcolm Davis and TD-Gammon 3.0. To reduce the luck factor, 100 games were played over 3 days. The final result was a narrow 8-point win for Davis. Both Davis and Tesauro have done extensive analysis of the games, coming up with similar conclusions [55]:

    While this analysis isn't definitive, it suggests that we may have witnessed a superhuman level of performance by TD-Gammon, marred only by one horrible blunder redoubling to 8 in game 16, costing a whopping 0.9 points in equity and probably the match!
3.1.1 TD-Gammon 3.0
Backgammon combines both skill and luck. The luck element comes from the rolls of the dice, making conventional search techniques impractical. A single roll of the dice results in one of 21 distinct combinations, each of which results in an average of 20 legal moves to consider. With a branching factor of over 400, many of which are equally likely and cannot be pruned, brute-force search will not be effective.

TD-Gammon is a neural network that takes as input the current board position and returns as output the score for the position (roughly, the probability of winning) [56]. The neural network acts as the evaluation function. Each of the connections in the neural net is parameterized with a weight. Each node is a function of the weighted sum of each of its inputs, producing an equity value as output.

The neural net has approximately 300 input values (see Fig. 6) [45, 47]. For each of the 24 points on the board, there are 4 inputs for each player giving the number of pieces they have on that point. Additional inputs for each side are the number of pieces on the bar, the number of pieces taken off the board, and whose turn it is. The likelihood of achieving a gammon or a backgammon is also input. The remaining 100 inputs are from functions that compute positional features, taken from the Neurogammon program. The inputs to the net were chosen to simplify the system, and not to minimize the number of inputs.

FIG. 6. TD-Gammon 3.0's neural network: inputs encoding the backgammon position and features (about 300 units), a hidden layer of 160 units, and an output giving the predicted probability of winning.

TD-Gammon 2.0 used no backgammon knowledge and had a neural net containing 80 hidden units. This program was sufficient to play strong backgammon, but not at a world-class level. Tesauro was able to improve the program's performance to be world-class caliber by adding Neurogammon's backgammon knowledge as input to the neural net. This version, TD-Gammon 3.0, contains 160 hidden units in the neural network. Each unit in the net takes a linear sum of the weighted values of its input, and then converts it to a value in the range −3 to 3. A backgammon is worth 3 points, a gammon 2, and a win, 1 point. The conversion is done with a nonlinear sigmoid function, allowing the output to be a nonlinear function of the inputs. The resulting neural net has approximately 50 000 weights that need to be trained.

The weights in the hidden units were trained using temporal difference learning from self-play games. By playing the program against itself, there was an endless supply of training data. In a given game position, the program uses the neural net to evaluate each of the roughly 20 different ways it can play its dice roll, and then chooses the move leading to the maximum evaluation. Each game is played to completion, and then temporal difference learning is applied to the sequence of moves. Roughly 1 500 000 self-play games were used for training TD-Gammon 3.0.
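The following is a minimal sketch of an evaluation network of the general shape described above: roughly 300 inputs feeding 160 hidden units, with the output scaled to the −3 to +3 equity range. The unit counts come from the text; the use of a standard logistic squashing function, the weight layout, and the function names are illustrative assumptions, and the weights themselves would be set by TD(λ) self-play training rather than by hand.

    #include <math.h>

    #define NUM_INPUTS 300   /* encoded board position plus Neurogammon features */
    #define NUM_HIDDEN 160

    /* Standard logistic squashing function, mapping any sum into (0, 1). */
    static double sigmoid(double x)
    {
        return 1.0 / (1.0 + exp(-x));
    }

    /* Forward pass of a single-hidden-layer network.  w_hidden[j][i] is the
       weight from input i to hidden unit j, and w_out[j] the weight from
       hidden unit j to the output.  The (0,1) output is rescaled to an
       equity in [-3, +3], since a backgammon is worth 3 points, a gammon 2,
       and a win 1. */
    double evaluate(const double input[NUM_INPUTS],
                    const double w_hidden[NUM_HIDDEN][NUM_INPUTS],
                    const double w_out[NUM_HIDDEN])
    {
        double hidden[NUM_HIDDEN];

        for (int j = 0; j < NUM_HIDDEN; j++) {
            double sum = 0.0;
            for (int i = 0; i < NUM_INPUTS; i++)
                sum += w_hidden[j][i] * input[i];
            hidden[j] = sigmoid(sum);
        }

        double out = 0.0;
        for (int j = 0; j < NUM_HIDDEN; j++)
            out += w_out[j] * hidden[j];

        return 6.0 * sigmoid(out) - 3.0;   /* map (0,1) onto the [-3, +3] range */
    }

In play, the program would call such an evaluation on the position reached by each legal way of playing the dice roll and pick the move with the maximum value.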
TD-Gammon has been augmented with a selective three-ply search. For each of its moves, TD-Gammon considers the most likely opponent responses, and its replies to those responses. Each state considered in the search has roughly 400 possibilities, so for each of the 21 dice rolls, TD-Gammon considers only a handful of likely best moves for the opponent (selectively paring down the search).

A critical component of strong backgammon is the handling of the doubling cube. The cube strategy was added after the program was trained. It uses a theoretical doubling formula developed by mathematicians in the 1970s [58]. During a game, TD-Gammon's reward estimates are fed into this formula to come up with an approximation of the expected doubling payoff.

Post-mortem analysis of backgammon games uses simulations (or roll-outs as they are called in the backgammon community). A roll-out consists of repeatedly simulating the play from a starting position through to the end of the game. Each trial consists of a different sequence of dice rolls. Each move decision is based on a one-ply search. A simulation is stopped after 10 000 trials or when a move becomes statistically better than all the alternatives.
3.1.2 The Best of Computer Backgammon
The following game was the 18th played in the exhibition match between TD-Gammon 3.0 and world champion Malcolm Davis, held at the 1998 conference of the American Association for Artificial Intelligence. The game comments are by Gerry Tesauro (GT), Malcolm Davis (MD), and TD-Gammon (TD). TD gives the top moves in a position, ordered by their score. These values were determined after the match by roll-outs. Tesauro explains how to interpret the scores [59]:

    Ignoring gammons and backgammons, if a player's move decision is 0.1 worse than the best move, the player has reduced his winning chances by about 5%, and backgammon experts would regard that as a "blunder." On the other hand, if the error is 0.02 or less, it only costs about 1% in winning chances, and such errors are regarded as small.

Each of the 24 points on the board is numbered and given relative to the side to move (White is counter-clockwise from their home; Black is clockwise from their home). A move consists of 1-4 checkers being moved, each specified with their from- and to-points. The bar is labeled as point number 25. An x indicates a capture move. In the following text, for each turn the side to move (Black or White) is given, followed by the dice roll and the moves chosen.
Black: Malcolm Davis; White: TD-Gammon 3.0

B 5,1: 24-23 13-8; W 4,2: 8-4 6-4; B 6,2: 24-18 18-16; W 6,4: 24-18 13 x 9;
MD: A good play. Hitting twice is reasonably close.
GT: Good play by TD. Aggressively blitzing with 13 x 9 8 x 2 or 8 x 2 6-2 is not bad, but committal. TD's play keeps more options open and seems to be a more solid all-around positional play, and in fact it comes out on top in the roll-out results.
TD: 24-18 13 x 9 = 0.252; 13 x 9 8 x 2 = 0.216; 8 x 2 6-2 = 0.207.
B 4,2: 25-23 13-9;
MD: A total toss-up versus 8-4. Going to the 9 point is more my style, ignoring the 3 duplication.
GT: Whoops, the roll-outs say that 8-4 is slightly better. Not only is it fewer shots, but it's also a better point if missed and covered. Not sure what MD was thinking here.
TD: 25-23 8-4 = -0.389; 25-23 13-9 = -0.407.
W 6,6: 24-18 13-7 13-7 13-7;
MD: Not challenging. Going to the 3 point duplicates 1's and is about a 4% error.
GT: I have to confess that this one is beyond me. I would have held on to the midpoint and slotted the 3 point with the last six. TD's play gives up a point, leaves two blots instead of one, in not very good locations, and yet it wins the roll-out. I guess what's going on is that TD's play leaves a bunch of builders to make the 5 point, which is perhaps the key point in this position. The midpoint is not so valuable when White already owns the 18 point, and playing 9-3 makes it very unlikely that White will be able to make the 5 point anytime soon. Well done, TD!
TD: 24-18 13-7 13-7 13-7 = 0.516; 24-18 13-7 13-7 9-3 = 0.444.

FIG. 7. Malcolm Davis (Black) versus TD-Gammon 3.0 (White). White to play a 2-1.
B 4,3: 9-5 8-5;
MD: High marks to the human. Hitting is a slight error, perhaps more so if the cube is about to be turned, as it creates a more volatile position.
TD: 9-5 8-5 = -0.479; 23-20 20 x 16 = -0.482.
White doubles, Black takes; MD." This double is apparently a little too aggressive, although not at all unreasonable. GT." Based on 10 O00 full roll-outs with the cube, the double is barely correct. The no-double equity is 0.73, whereas the equity after double/take is 0.75.
W 3,3:8-5 8-5 6-3 6-3; MD." TD's biggest piece movement error. Making the 15 and 3 points is about 2% better. GT." M a n y options here." one can attempt to disengage with 1 8 - 1 5 18-15, safety the blot, make the 5 point, or make the 3 point. TD makes the best play hanging back on the 18 point and keeping all full-contact options open. However, the position thematically points to disengaging since White is far ahead in the race and has gotten all the back checkers out. The roll-outs reveal that this is better than TD's choice. TD." 1 8 - 1 5 1 8 - 1 5 6 - 3 6 3 = 0.547; 8 - 5 8 - 5 6 - 3 6 - 3 = 0.522.
B 1,1:13 x 12 12-11 11-10 6-5; MD." M y biggest e r r o r - - I was playing quickly and didn't consider the much better 6 - 4 6 - 4 . GT." A tough choice for MD. He can hit the blot while he has the chance, or wait and build up his board with 6 - 4 6 - 4 and hope that TD has trouble clearing the 18 point. I f he's going to hit, perhaps he should hit safely with 13 x 12 1 3 - 1 2 1 3 - 1 2 6 - 5 . The choice is not clear to me, but the roll-outs say that M D ' s play is a big mistake. TD. 6 - 5 6 - 5 5 - 4 5 - 4 = - 0 . 4 6 6 ; 13 x 12 1 3 - 1 2 1 3 - 1 2 6 - 5 = -0.489; (5th ranked move) 13 x 12 1 2 - 1 1 1 1 10 6 - 5 = -0.549.
W 5,2:25 x 23 23-18; B 5,1:10-5 5-4; W 6,3:18-15 15-9; B 4,2:8-4 5-3; W 4,1:9-5 7-6; B 5,1:8-3 6-5; MD." A photo compared to 6 - 5 6 - 1 . GT." Bold play by M D , risking an immediate f a t a l hit in order to keep a nice inner board structure. I would have chickened out and played safe with 6 - 1 4 - 3 . The roll-outs come out about equal. TD." 6 - 1 4 - 3 = -0.535; 8 - 3 6 - 5 = -0.544; 6 - 5 6 - 1 = -0.545.
W 6,5:6-1 9-3; B 5,2:8-3 6-4; MD." Perhaps a very good play. G T." I f M D slots the ace point, it could be a liability in the event o f immediate action after TD rolls a five. Instead, he
THE GAMES COMPUTERS (AND PEOPLE) PLAY
217
chooses 6 - 4 , avoiding the blot, but making his o~l'n jives a~l'k~l'ard. The rollouts give a statistically insign(fi'cant edge to 3 - 1 . TD." 8 - 3 3 - 1 = -0.401; 8 - 3 6 - 4 = -0.414.
W 2,1:3-1 5-4; B 5,1" 6-1 3-2; MD." Perhaps a very good play by the human. Leaving only one blot gives up about 2 % . G T." Talk about fives being a~vk~vard,r M D makes an outstanding play here, leaving three blots in the home board, but keeping a smooth distribution and hoping to straighten everything out next turn. I would have played the more craven 4 - 3 or 5 - 4 , ~'hich turns out badly in the roll-outs. Being able to play the bad rolls ~'ell is the hallmark o f a champion. TD" 6 - 1 3 - 2 = - 0 . 6 5 7 ; 6-1 4-3 = -0.716; 6-1 5 - 4 = -0.728.
W 2,1" 18-16 18-17; (see Fig. 7) M D : A fine play. G T." Brian Sheppard, ~rho ~'as in the audience at the time, applauded TD for this spectacularly un-computer-like play. White could easily leave no blots with 7 - 6 7-5, and keep all the points ~'ith 4 - 1 . However, TD realizes that it's ahead in the race and behind in timing, and that if it waits on the 18 point, it may well have to break it next turn, when M D ' s board will likely be cleaned up and much stronger. So this is an excellent time to run with 1 8 - 1 7 18-16. Black's board is such a mess that he probably won't hit even if he can. TD" 1 8 - 1 7 1 8 - 1 6 = 0.718; 18-16 16-15 = 0.668; 4 - 3 3 - 1 = 0.558; 7 - 6 7 - 5 = 0.550.
B 4,2:5-1 4-2; W 4,1:16-15 15-11; MD" Straightforward. GT." Another fine play by TD. up and leave one blot and fel~'er shots with 17-16 7-3. difficult point to cleat" next turn, plus it's not a good accomplish both hitting and escaping with ,fives next 11 = 0.424; 1 7 - 1 6 7 - 3 = 0.392.
It's tempting to button Ho~vever, this leaves a idea to allo,~' Black to turn. TD. 16-15 1 5 -
B 3,3:13-10 13-10 10-7 10-7; W 3,2:11-9 9-6; B 4 , 3 : 7 - 4 6-2; MD." Close, but the best play. G T: M D saves a six in the outfield so he won't have to break the 23 point next turn. An eminenth' reasonable idea, but curiously 7 - 3 7 - 4 comes out a tin)' bit better in the roll-outs, quite possibly due to sampling noise. TD." 7 - 4 7 - 3 - - 0 . 7 6 3 7 - 4 6 - 2 = -0.777.
W 2,1" 6 - 5 17-15; B 6,4:23-17 7-3; MD." Not clear and very close. G T." A tough choice. M D boldly breaks the anchor and leaves two blots, rather than ~'reck his board with 7-1 5 - 1 . Comparing apples and oranges is often d(ffl'cult for humans and, here, the rollouts say that safety is better. TD: 7 - 1 5 - 1 = -0.822; 2 3 - 1 7 7 - 3 = -0.865.
W 4,1" 7 - 3 3 • 2; MD" TD is fearless. Hitting is right by a huge margin. G T: A scar)' play, but it's often been said that "Computers don't get scared." TD: 7 - 3 3 • 2 = 0.575; 7 - 6 7 - 3 = 0.514.
218
JONATHAN SCHAEFFER
B 2 , 1 : 2 5 - 2 3 17-16; W 2,1: no move; Black redoubles, White passes; MD." Against a human it would be right to double if there's any chance at all that the cube might be taken. Ho~'ever, there's no chance of that against a [ TDGammon], so this is a small cube error. G T." Whoops/This position is actually too good to redouble/Black does slightly better by holding the cube and trying to win a gammon by picking up the second blot. 10 000 roll-outs with the cube indicate an equity advantage of 0.04 to playing on instead of cashing. T e s a u r o ' s p o s t m o r t e m analysis of the m a t c h strongly suggests that TDG a m m o n was the better player [55]: I rolled out every position in the Davis-TD match where the doubling cube was turned (full roll-outs with the cube, no settlements). There were 130 such positions. In 72 positions, TD-Gammon doubled: TD made 63 correct doubles and 9 incorrect doubles; total equity loss 1.25. MD made 56 correct take/pass decisions and 16 incorrect; total equity loss 2.60. In 58 positions, Malcolm Davis doubled: MD made 46 correct doubles and 12 incorrect; total equity loss 1.58. TD made 54 correct take/pass decisions and 4 incorrect; total equity loss 0.19. Of course, to get the whole story, we also have to check all the positions where a player could have doubled but didn't. It's infeasible to roll out all these positions, but I did do roll-outs of each of the 130 "turn-before" positions, to see if a player missed a double the turn before the cube was actually offered. To summarize those results: In 72 positions, TD correctly waited in 67 and missed doubles in 5; total equity loss 0.25. In 58 positions, MD correctly waited in 45 and missed doubles in 13; total equity loss 1.24. However, 4 of MD's "errors" were at the end of the match when he was playing conservatively to protect his match lead. If we ignore these then he only missed doubles in 9 positions, for a total equity loss of 0.61. Malcolm has also done a preliminary analysis with Jellyfish [a commercial program] of the checker plays, which indicated that TD played better. (The fact that TD obtained more opportunities to double than MD also suggests it was moving the pieces better.)
3.2
Bridge
W o r k on c o m p u t e r bridge began in the early 1960s ([60], for example), but it wasn't until the 1980s that m a j o r efforts were made. The advent of the
THE GAMES COMPUTERS (AND PEOPLE) PLAY
219
personal computer spurred on numerous commercial projects that resulted in programs with relatively poor capabilities. Perennial world champion Bob Hamman once remarked that the commercial programs "would have to improve to be hopeless" [61]. A similar opinion was shared by another frequent world champion, Zia Mahmood. In 1990, he offered a prize of s 000 000 to the person who developed a program that could defeat him at bridge. At the time, this seemed like a safe bet for the foreseeable future. In the 1990s, several academic efforts began using bridge for research in artificial intelligence [61-65]. The commercial Bridge Baron program teamed up with Dana Nau and Steve Smith from the University of Maryland. The result was a program that won the 1997 world computer bridge championship. The program used a hierarchical task network for the play of the hand. Rather than building a search tree where each branch was the play of a card, they would define each branch to be a strategy, using human-defined concepts such as finesse and squeeze [64, 65]. The result was an incremental improvement in the program's card play, but it was still far from being world-class caliber. Beginning in 1998, Mathew Ginsberg's program GIB started dominating the computer bridge competition, handily winning the world computer bridge championship. The program started producing strong results in competitions against humans, including an impressive result in an exhibition match against world champions Zia Mahmood and Michael Rosenberg. The match lasted 2 hours, allowing 14 boards to be played. The result was in doubt until the last hand, before the humans prevailed by 6.31 International Match Points (IMPs). This was the first notable man-machine success for computer bridge-playing programs. Zia Mahmood, impressed by the rapid progress made by GIB, withdrew his million pound prize. GIB was invited to compete in the Par Contest at the 1998 world bridge championships. This tournament tests the contestant's skills at playing out bridge hands. In a select field of 35 of the premier players in the world, the program finished strongly in 12th place. Michael Rosenberg won the event with a score of 16 850 out of 24 000; GIB scored 11 210. Of the points lost by GIB, 1000 were due to time (there was a 10 point penalty per minute spent thinking), 6000 were due to GIB not understanding the auction, and 6000 were due to GIB's inability to handle some hands where the correct strategy involves combining different possibilities [61]. The latter two issues are currently being addressed.
3.2.1
GIB
The name GIB originally stood for "Goren In a Box", a tribute to one of the pioneers of bridge. Another interpretation is "Ginsberg's Intelligent Bridge."
220
JONATHAN SCHAEFFER
To play out a hand, a variation of a l p h a - b e t a search can be used. The average branching factor is roughly 4. A l p h a - b e t a pruning and transposition tables reduces it to approximately 1.7. Ordering moves at interior nodes of the search to favor those moves that give the opponent the least number of possible responses (i.e., preferring small sub-trees over large ones), further reduces the branching factor to 1.3. Given the depth of the search (to the end of the hand; possibly a tree of depth 52), the trees are surprisingly small (on the order of 106 nodes). Ginsberg's partition search algorithm is used to augment the search [63]. Partition search is a "smart" transposition table, where different hands that have inconsequential differences are treated as the same hand, significantly increasing the number of table hits. For example, from a transposition table's point of view, the hands " 6 K Q 8 4 2" and " 6 K Q 8 4 3" are different. However, by representing the entry as " 6 K Q 8 X X", where "X" denotes any small card, the analysis of the first hand can be applied to the second hand. The result of adding partition search reduces the average search tree size for a deal to a remarkably small 50 000 nodes. To decide how to play a hand, GIB uses a simulation [61]. For each trial, cards are dealt to each opponent that are consistent with the play thus far. Typically 50 deals are used in the simulation; the card play that results in the highest expected number of tricks won is chosen to be played. Simulations are not without their disadvantages. An important component to the play of the hand are so-called information-gathering plays. A trick is played (and possibly lost) to reveal more information on the makeup of the opponent's hands. Unfortunately, since a simulation involves assigning cards to the opponents, the program has perfect knowledge of where all the cards lie and, within a given trial, information gathering plays are not needed! This demonstrates a limitation of perfect-information variants of imperfectinformation reality. Most previous attempts at bridge bidding have been based on an expertdefined set of rules. This is largely unavoidable, since bidding is an agreedupon convention for communicating card information. GIB takes this one step further, building on the ability to quickly simulate a hand [61]. The program has access to a large database of bidding rules (7400 rules from the commercial program Meadowlark Bridge). At each point in the bidding, GIB queries the database to find the set of plausible bids. For each bid, the rest of the auction is projected using the database, and then the play of the resulting contract is simulated. GIB chooses the bid that leads to the average best result for the program. Although intuitively appealing, this approach does have some problems. Notably, as with opening books in other games, the database of rules may have gaps and errors in it. Consider a rule where the response to the bid 46
THE GAMES COMPUTERS (AND PEOPLE) PLAY
221
is incorrect in the database. GIB will direct its play towards this bid because it assumes the opponents will make the (likely bad) database response. As Ginsberg writes [61], it is difficult to distinguish a good choice that is successful because the opponent has no winning options from a bad choice that appears successful because the heuristic fails to identify such options. GIB uses three partial solutions to the problem of an erroneous or incomplete bidding system. 9 First, the bidding database can be examined by doing extensive off-line computations to identify erroneous or missing bid information. This is effective, but can take a long time to complete. 9 Second, during a game, simulation results can be used to identify when a database response to a bid leads to a poor result. This may be evidence of a database problem, but it could also be the result of effective disruptive bidding by GIB. 9 Finally, GIB can be biased to make bids that are "close" to the suggested database bids, allowing the program the flexibility to deviate from the database. To summarize, GIB is well on the way to becoming a world-class bridge player. The program's card play is already at a world-class level (as evidenced by the Par Contest result), and current efforts will only enhance this. The bidding needs improvement, and this is currently being addressed. Had Zia M a h m o o d not withdrawn his offer, he might have lost his money early on in the 21st century.
3.2.2
The Best of
Computer Bridge
The hand shown in Fig. 8 is board l l of the 1998 exhibition match between GIB and world champions Zia M a h m o o d and Michael Rosenberg, held at the annual conference of the American Association for Artificial Intelligence in 1998. The humans won the match by 6.31 IMPs over 14 deals. The hand was analyzed by Mike Whittaker and reported in Bridge Magazine [66].8 Commenting on GIB's defensive play, Whittaker writes: GIB1, West, led with a small ~, won by the Queen. GIB2 switched to a ~, won by dummy's Jack. Leading a qP to the King ~,'on, and Rosenberg then led a ~ , won by GIB1. A second {7 lead ~l'as ~l'on by the Ace and Rosenberg tried a 8Reproduced with permission. Minor editing changes have been made to conform with the style of this chapter.
222
JONATHAN SCHAEFFER
North: Zia ~AJ9 (373 0 K Q J 10853 West: GIB1 ~I, 1 0 7 6 2 q~J64
&K
{}72 &A1093 South: Rosenberg ~853 ~)KQ9852 (~A4
East: GIB2 ~KQ4 ~A10 <}96 &876542
&OJ South
West
1r 3r 4(} pass
pass pass pass pass
North 1(} 3(} 3~ 4(3
East pass pass pass pass
Opening lead: 2~ FIG. 8. M a h m o o d - R o s e n b e r g versus two GIBs.
to the Jack, losing to the King. GIB2 cashed the Ace ~ before leading a small to dummy's Ace. Rosenbergfound himself locked in dummy, forced to lead a ~. This had the effect of promoting the Jack r for GIB2 and Rosenberg finished two down. Finally, we come to the FAQ (Frequently Asked Question)." will the computers ever triumph against top quality human opposition? The idea has always been laughed at but I would not be too complacent. Before long the sheer computing power of the computer will give it a definite edge over even the best human declarer in contracts that require technical expertise. However, I think that the complexities of the bidding language, the use of
THE GAMES COMPUTERS (AND PEOPLE) PLAY
223
deception in play and defense and some abstract qualities, such as table presence, will keep the humans ahead, at least for a while. Figure 9 is used to illustrate GIB's stellar play of the hand. The analysis was done by O n n o Eskes and reported in I M P magazine [67]. 9
West opens 2 &, showing a weak hand ~'ith both major suits. Unpleasant, but on the other hand, it becomes a lot easier for us to stay out of a Q contract. We confidently reach 7 {). West leads the Jack h. I greet dummy with approval. "Well bid," I remember thinking. I count five trumps, twice AceKing-Queen, Ace h and a ruff in dummy. What can go wrong? Trumps four-zero. I f left-hand opponent has them all, I will go down. I f opponent on the right has them, I can finesse against his Jack. But then I cannot ruff a ~ anymore. Well, then they'd better not be four-zero. I take Ace h and lead a ~ to the Ace. West discards a h. I curse under my breath and start thinking again. Are there any chances left? In &, maybe? I f those are four-four, I can discard a loser on the fifth &. But I have only one entry left. I should have started by ruffing a &. "One down," I concede, "how are the clubs divided? .... Four-four," is the
North
ibA3 q) 74 (}Q874
&AKQ63 West
East
South
6q2 q) A N Q 1 0 8 3 {7 A K 1 0 3 2
A Contract: 7{~ Opening lead: Jib FIG. 9. Illustrating GIB's play of the hand.
9Reproduced with permission. Minor editing changes have been made to conform with the style of this chapter.
224
JONATHAN SCHAEFFER
painful reply. Fortunately for us, exactly the same thing happened at the other table/ The next day I am still disgusted with the hand. It is a nice problem, however. I decide to present the hand to GIB .... I enter the hands and the auction. I also enter the explanation of the bids (West at least 4 - 4 in the majors, less than opening strength) and the opening lead. Then the computer starts to bubble. After 30 seconds it produces Ace ~. I tell it which card East plays to the first trick. Again 30 seconds of thinking. Ace ~ / I let East and West follow small. The computer discards Queen ~. Another 30 seconds. Small &, ruffed in hand/Forlorn, I ~'atch the computer finish the rest of the play in immaculate fashion. Ace ~ (discovering the bad trump split), {) to the Queen, King &, Queen & and a good &. East ruffs, South over-ruffs and he can now ruff his last ~) in dummy. Beaten by the computer/ The humiliation is complete when the machine subtly announces that it just scored plus 2140.
3.3
Checkers
Arthur Samuel began thinking about a checkers (draughts) program in 1948 but did not start coding until a few years later. He was not the first to write a checkers-playing program; Strachey pre-dated him by a few months [68]. Over the span of three decades, Samuel worked steadily on his program with performance taking a back seat to his higher goal of creating a program that learned. Samuel's program is best known for its single win against Robert Nealey in a 1963 exhibition match. From this single game, many people erroneously concluded that checkers was a "solved" game. In the late 1970s, a team of researchers at Duke University built a strong checkers-playing program that defeated Samuel's program in a short match [69]. Early success convinced the authors that their program was possibly one of the 10 best players in the world. World champion Marion Tinsley effectively debunked that, writing that: "The programs may indeed consider a lot of moves and positions, but one thing is certain. They do not see much!" [70]. Efforts to arrange a match between the two went nowhere and the Duke program was quietly retired. Interest in checkers was rekindled in 1989 with the advent of strong commercial programs and a research effort at the University of Alberta: Chinook. Chinook was authored principally by Jonathan Schaeffer, Norman Treloar, Robert Lake, Paul Lu, and Martin Bryant. In 1990, the program earned the right to challenge for the human world championship. The checkers federations refused to sanction the match, leading to the creation of a new title: the world m a n - m a c h i n e championship. This title was contested for the first time in 1992, with Marion Tinsley defeating
THE GAMES COMPUTERS (AND PEOPLE) PLAY
225
Chinook in a 40-game match by a score of 4 wins to 2. Chinook's wins were the first against a reigning world champion in a non-exhibition event for any competitive game. There was a rematch in 1994, but after six games (all draws), Tinsley resigned the match and the title to Chinook, citing health concerns. The following week he was diagnosed with cancer, and he died 8 months later. Chinook has subsequently defended its title twice, and has not lost a game since 1994. The program was retired in 1997 after it became clear that there was no living person whose abilities came close to that of the program [42]. Chinook is the first program to win a human world championship. At the time of this writing, the gap between Chinook and the highest-rated human is 200 rating points (using the chess rating scale) [42], making it unlikely that humans will ever improve to Chinook's level of play.
3.3. 1
Chinook
In his 1960 Advances in Computers chapter, Samuel felt bad about using the article to describe his work instead of his predecessor, Strachey. The same comment that Samuel wrote in 1960 applies to this section, after substituting Schaeffer's name for Samuel's and replacing Strachey with Samuel: While it is grossly unfair to dismiss Samuel's work in a single paragraph and to discuss the present writer's own efforts in some detail, in the interests of conciseness this will have to be done. Perhaps such high-handed behavior can be excused if the writer publicly apologizes for his action, as he does now, and publicly acknowledges the credit which Dr. Samuel is due. The structure of Chinook is similar to that of a typical chess program: search, knowledge, opening book, and endgame databases [42, 71]. Chinook uses alpha-beta search (NegaScout) with a myriad of enhancements including iterative deepening, transposition table, history heuristic, search extensions, and search reductions. Further performance is provided by a parallel search algorithm. With 1994 technology, an 18-processor Silicon Graphics Power Challenge, Chinook was able to average a minimum of 19ply searches against Tinsley with search extensions occasionally reaching 45 ply into the tree. The median position evaluated was typically 25-ply deep into the search. The search depths achieved are usually sufficient to uncover most tactical threats in a position, but they are inadequate to resolve positional subtleties. Hence, considerable computational effort is devoted to identifying promising lines of play to extend the search, and futile lines to reduce the search depth. Experiments show that a program searching 17-ply plus extensions
226
JONATHAN SCHAEFFER
will defeat a program going 23-ply deep without extensions (each program used the same amount of time for each search). By most standards, giving up 6-ply of search for the extensions is extraordinarily high. However, players like Tinsley have consistently demonstrated an ability to analyze 30-ply deep (or more), so Chinook has to be able to match this capability. The evaluation function was manually tuned over a period of 5 years. For each of four game phases, it consists of 25 features combined by a linear function. Interestingly, most of the features in Samuel's program were not used in C h i n o o k - - m a n y of them were there to overcome the limitations of the shallow search depths that Samuel could achieve using 1960s hardware. The evaluation function was carefully tuned by a checkers expert through extensive trial-and-error testing. Attempts at automatically tuning the evaluation function were unsuccessful. There were two major improvements to the evaluation function that are of interest. 9 In 1992 a major change enhanced the program's knowledge but resulted in a two-fold reduction in the number of positions that the program could analyze per second (effectively costing it one ply of search) [42]. Despite the reduced search proficiency, the new knowledge significantly improved the quality of the evaluations, resulting in a stronger program. This was strong evidence that Chinook's search depths were hitting diminishing returns for additional search efforts [39]; more was to be gained by the addition of useful knowledge than additional search. 9 The second refinement was allowing the sum of positional scores to be able to exceed the value of a checker. In principle, this is dangerous since the program may prefer large positional scores over material ones. However, a critical component in human grandmaster play is the ability to recognize the exceptions; when material is a secondary consideration. Adding this capability eliminated a serious source of errors, and was a major reason for Chinook's excellent result in the 1992 world m a n - m a c h i n e championship. Chinook uses an endgame database containing all checkers positions with eight or fewer pieces. This database has 444 billion (4 x 1011) positions, compressed into 6 Gbytes for real-time decompression. Unlike chess programs which are compute-bound, Chinook becomes I/O-bound after a few moves in a game. The deep searches mean that the database is occasionally being hit on the first move of a game. The databases introduce accurate values (win/loss/draw) into the search, reducing the heuristic error. In many games, the program is able to backup a draw score to the root of a
THE GAMES COMPUTERS (AND PEOPLE) PLAY
227
search within 10 moves by each side from the start of a game. This suggests that it may be possible to determine the game-theoretic value of the starting position of the game (one definition of "solving" the game). Chinook has access to a large database of opening moves compiled from the checkers openings literature. This extensive opening book allows the program to play its first moves from the opening database, come out of the book, and then usually be able to search deeply enough to find its way into the endgame database. This implies that the window for program error is very small. In the 1994 Chinook-Tinsley match, five of the six games followed this pattern (in the other game, Tinsley made an error and Chinook had to try for a win). All the positions in the opening book were verified using at least 19-ply searches. This uncovered numerous errors in the published literature. A database of all known games played by Marion Tinsley was compiled. When the program was out of its opening book, this database could be used to bias the search. For example, when playing the weaker side of an opening, the program would include a favorable bias towards any move that Tinsley had previously played in this position. The idea is that, since Tinsley rarely made a mistake, his move is likely to be the right choice. When playing the stronger side of the opening, the database was used for a different purpose. By biasing the search against moves suggested by the database, the program could increase the chances of playing a new move, thereby throwing the human opponent onto their own resources (increasing the chance of human error). Arthur Samuels' program did not come close to reaching the pinnacle of checkers. In part, this was because of the limited hardware resources that he had available to him at the time. But it was also due to his insistence on developing a program that learned everything by itself. Samuel wrote in his 1960 chapter that "suggestions that [I] incorporate standard openings or other forms of mandevised checker lore have been consistently rejected .... [I] refuse to pass judgment on whether the program makes good moves for the right reasons, demanding instead, that it develop its own reasons" [72]. Ironically, a major reason for the success of Chinook was the use of the "man-devised" lore that Samuel consistently rejected.
3.3.2
The Best of Computer Checkers
In the 1992 world m a n - m a c h i n e championship, the match was tied at one win apiece at the start of game 14. The annotations are based on those appearing in [42]. Comments in italics are from Marion Tinsley. The game notation is identical to that used in chess: columns are labeled " a " to "h"
228
JONATHAN SCHAEFFER
from left to right, and the rows are labeled 1 to 8 from bottom to top. An x indicates a capture move.
Black: Marion Tinsley~ White: Chinook 1. h 6 - g 5 a 3 - b 4 2. b 6 - c 5 Since the standard starting position is drawish, tournament checkers uses the so-called "three move ballot." The first three moves of the game (two by Black, one by White) are chosen randomly, resulting in some interesting, lop-sided starting positions. Hence two games are played for each opening, with the opponents switching colors after the first game. At the time of this match, 142 openings had been approved for tournament play. Chinook has to have opening knowledge for both sides of all the openings. 2 .... b 4 - a 5 3. g 5 - f 4 g3 x e5 4. d6 x f4 e3 x g5 5. f6 x h4 d 2 - e 3 6. g 7 - f 6 c 3 - d 4 7. e 7 - d 6 d4 x b6 8. a7 x c5 b 2 - c 3 9. h 8 - g 7 c 3 - d 4 Chinook is now out of its opening book. The program reports having a small advantage. 10. c 5 - b 4 e l - d 2 11. d 6 - e 5 d 4 - c 5 12. g 7 - h 6 I once laughed at [grandmaster Willie] R y a n f o r f o r g e t t i n g his own published play, but no more. r Back in 1948, I gave the better f 6 - g 5 to draw ... Pat McCarthy [a top British player] later asked me why I didn't take this simple route. The answer? I had simply forgotten it.pTinsley seems to think g 7 - h 6 is
a bad move, but Chinook sees nothing wrong with it. 12 .... h2-g3 Chinook expects c 7 - d 6 in reply, with an even game. 13. h 6 - g 5 Tinsley has no comment on this move, but Chinook thinks it is a major mistake. Analysis conducted months after the game indicates that after h 6 g5 the game is probably lost. Tinsley attaches the blame to his previous move which apparently leads to a position that he feels uncomfortable with, even though it leads to a draw (assuming perfect play). 13 .... a l - b 2 14. b 4 - a 3 e 5 - b 6 Suddenly Chinook believes it has a huge advantage (over one-half of a checker). This may not be obvious by looking at the position; it is the result of searching over 20 plies into the future. 15. f 8 - g 7 b 6 - a 7 This illustrates an important lesson about putting knowledge into a program: every piece of knowledge has its exceptions. Normally, putting a man into the dog-hole (a7 for White; h2 for Black) is bad, but here it turns out to be very strong. Unfortunately, Chinook always penalizes the doghole; it does not understand any of the exceptions. After this b6-a7, I never saw a glimpse o f a draw.
THE GAMES COMPUTERS (AND PEOPLE) PLAY
229
16. e7-d6 a 5 - b 6 See Fig. 10a. Incredible! Chinook is sacrificing a checker against Tinsley. Prior to this match, Chinook had a history of misassessing these type of sacrifices, resulting in some bad losses. Tinsley himself identified this as a serious weakness in the program. Fortunately, a few days before the match began, a serious problem in Chinook's evaluation function was uncovered (and fixed) that explained the program's poor,play in these type of positions. 17. b 8 - e 7 a 7 - b 8 = k Now the preceding moves make sense. This sacrifice has been in the works for a few moves now and Tinsley has been avoiding it. Now he has run out of safe moves and is forced to accept it. It is hard to believe that Black can survive. White has a mobile king and strong back rank (making it hard for Black to get a king). What is Black to do? 18. c7 x a5 b 8 - a 7 19. d 8 - e 7 a 7 - b 8
Is Chinook winning? The position looks very strong, but Chinook reports only a moderate advantage. The program intended to play b2-c3, but at the last minute switches to a 7 - b 8 by a narrow margin. The alpha-beta search differentiates between the moves by an insignificant (and likely random) 3/100ths of the value of a checker. After the game, Tinsley revealed that he was praying for b2-c3, as he claimed that it led to a draw. a 7 - b 8 may be the only winning move. 20. g7-h6 g l - h 2 21. d 6 - e 5 b 8 - e 7 22. e 5 - d 4 e7-d6 Winning back the checker with a dominating position. Unfortunately, Chinook's score does not reflect this. It was searching so deep that it must have found a way for Black to extricate himself. 23. f 6 - e 5 d6 • f8 24. g 5 - f 4 e3 • g5 25. h6 • f4 f 8 - e 7 26. c 5 - b 4 d 2 - c 3
At this point, Chinook's analysis revealed why the assessment had been low over the past few moves: it saw that it was winning a checker but thought Black could draw despite the checker deficit. Now the search is able to see far enough ahead to realize that the draw is unlikely.
FIG. 10. Tinsley (Black) versus Chinook (White).
230
JONATHAN SCHAEFFER
27. b4 x d2 el x e3 x g5 28. a3 x el = k e7-d6 29. d4-c3 d6 x f4 30. c3-d2 g5-f6 Chinook announces it has found a forced win in its endgame database. For this match, the program had access to all the seven-piece positions, and a small portion of the eight-piece endgames. 31. d 2 - e l = k f6-e7 32. c l - b 2 f4-e3 33. b2-c3 e7-f8 = k 34. c3-b4 e3-d4 Tinsley resigns. See Fig. 10b. The winning line goes as follows: b 4 - a 3 fS-e7 a 5 - b 4 d4-e3 b4-c3 e7-d6 c 3 - b 2 d6-e5 b 2 - c l = k, and now g3-f4 frees White's checkers. After el x g3, then f4-g5 surprisingly traps the king. The dominant White kings control the center of the board, keeping Black's pieces at bay. After Tinsley extended his hand in resignation, the crowd rushed forward to congratulate Tinsley. Congratulate Tinsley? "That's a fine draw," exclaimed Grandmaster Con McCarrick, the match referee. Once the truth was revealed, the spectators were stunned. The audience thought that Tinsley had found a beautiful drawing line! With Chinook leading 2-1, Tinsley looked like he was in trouble. However, Chinook forfeited game 18 due to technical problems and then was out-played in game 25. In the last game of the match, trailing 3 - 2 and needing a win, Chinook was preprogrammed to treat a draw as a loss. The program saw a draw, rejected it, and went on to lose the game and the match.
3.4
Chess
The progress of computer chess was strongly influenced by an article by Ken Thompson which equated search depth with chess-program performance [37]. Basically, the paper presented a formula for success: build faster chess search engines. The milestones in chess program development become a statement of the state-of-the-art in high-performance computing: 9 1980-82: Thompson's Belle, the first program to earn a US master title, was a machine built to play chess. It consisted of 10 large wirewrapped boards using LSI chips [73]. 9 1983-84: Cray Blitz used a multi-processor Cray supercomputer [74]. 9 1985-86: The Hitech chess machine was based on 64 special-purpose VLSI chips (one per board square) [28, 75]. 9 1985-86: Waycool used a 256-processor hypercube [76]. 9 1987-present: ChipTest (and its successors Deep Thought and Deep Blue) took VLSI technology even further to come up with a chess chip [38, 77, 78].
THE GAMES COMPUTERS (AND PEOPLE) PLAY
231
In 1987, ChipTest shocked the chess world by tying for first place in a strong tournament, finishing ahead of a former world champion and defeating a grandmaster. The unexpected success aroused the interest of world champion Garry Kasparov, who played a two-game exhibition match against the program in 1989. Man easily defeated machine in both games. The Deep Blue team worked for seven years on improving the program, including designing a single-chip chess search engine and making significant strides in the quality of their software. In 1996, the chess machine played a six-game exhibition match against Kasparov. The world champion was stunned by a defeat in the first game, but he recovered to win the match, scoring three wins and two draws to offset the single loss. The following year, another exhibition match was played. Deep Blue scored a brilliant win in game two, handing Kasparov a psychological blow that he never recovered from. In the final, decisive game of the match, Kasparov fell into a trap and the game ended quickly. This gave Deep Blue an unexpected match victory, scoring two wins, three draws and a loss. It is important to keep this result in perspective. First, it was an exhibition match; Deep Blue did not earn the right to play Kasparov. 1~ Second, the match was too short to accurately determine the better player; worldchampionship matches are typically at least 20 games long. Although it is not clear just how good Deep Blue is, there is no doubt that the program is a strong grandmaster. What does the research community think of the Deep Blue result? Many are filled with admiration at this feat of engineering. Some are cautious about the significance. John McCarthy writes that [79]: In 1965, the Russian mathematician Alexander Kronrod said, "Chess is the Drosophila 11 of artificial intelligence." However, computer chess has developed much as genetics might have if the geneticists had concentrated their efforts starting in 1910 on breeding racing Drosophila. We would have some science, but mainly we would have very fast fruit flies. In retrospect, the chess "problem" turned out to be much harder than anyone thought in Samuel's time. The Deep Blue result is a tremendous achievement, and a milestone in the history of both artificial intelligence and computing science. 10To be fair, it is unlikely that the International Chess Federation will ever allow computers to compete for the world championship. 11Drosophila is the fruit fly. The analogy is that the fruit fly is to genetics research as games are to artificial intelligence research.
232
JONATHAN SCHAEFFER
From the scientific point of view, it is to be regretted that Deep Blue has been retired, the hardware unused, and the programming team disbanded. The scientific community has a single data point that suggests machine might be better than man at chess. The data is insufficient and the sample size is not statistically significant. Moreover, given the current lack of interest in Deep Blue from IBM, it is doubtful that this experiment will ever be repeated. Of what value is a single, non-repeatable data point?
3.4. 1 Deep Blue Deep Blue and its predecessors represents a decade-long intensive effort by a team of people. The project was funded by IBM, and the principal scientists who developed the program were Feng-hsiung Hsu, Murray Campbell, and Joe Hoane. Deep Blue's speed comes from a single-chip chess machine. The chip includes a search engine, a move generator, and an evaluation function [38]. The chip's search algorithm is alpha-beta, but it is restricted to always use a minimal window. Transposition tables are not implemented on the chip (it would take too much chip real estate). The search is capable of doing a limited set of search extensions. The evaluation function is implemented as small tables on the chip; the values for these tables can be downloaded to the chip before the search begins. These tables are indexed by board features and the results summed in parallel to provide the positional score. A single chip is capable of analyzing over 2 million chess positions per second. It is important to note that this speed understates the chip's capabilities. Some operations that are too expensive to implement in software can be done with little or no cost in hardware. For example, one capability of the chip is to selectively generate subsets of legal moves, such as all moves that can put the opponent in check. These increased capabilities give rise to new opportunities for the search algorithm and the evaluation function. Hsu estimates that each chess chip position evaluation roughly equates to 40 000 instructions on a general-purpose computer. If so, then each chip translates to a 100 billion instructions per second chess supercomputer [38]. Access to the chip is controlled by an a l p h a - b e t a search algorithm that is implemented on the host computer (an IBM SP-2). Deep Blue uses a l p h a - b e t a with iterative deepening and transposition tables. Considerable effort was devoted to researching search extensions. The Deep Blue team pioneered the idea of singular extensions, using local search to identify forced moves [25]. Other extensions include those for
THE GAMES COMPUTERS (AND PEOPLE) PLAY
233
threats and piece influence [27]. Extensive tuning was done to find the right combination of extensions that maximized the benefits while not causing an explosion in search effort. In Deep Blue, a search extension would increase the search depth by an amount up to 2-ply. The algorithm used fractional extensions (e.g., a threat might increase the search depth by 0.5-ply), allowing several features to combine to cause a search extension. The search has been parallelized. For the 1997 Kasparov match, Deep Blue used a 30-processor IBM SP-2, with each processor connected to 16 chess chips. The parallel search algorithm uses a three-level hierarchy. The first 4-ply are done by a single master process. The leaves of the master's tree are searched an additional four or more ply deeper by the other SP-2 processors. The parallel search running on the SP-2 uses a variant on Hsu's Delayed Branch Tree Expansion algorithm [80]. The leaf nodes of these searches are passed off to the chess chips for additional search. In effect, one could view the chips as performing a sophisticated evaluation using at least a 4-ply search, plus extensions. During the Kasparov match, Deep Blue "only ~' searched 200 million positions per second on average. The maximum hardware speed is roughly 1 billion positions per second (30 processors x 16 chips per processor x 2 million positions per second). The difference reflects both the difficulty of achieving a high degree of parallelism with a l p h a - b e t a , and the team decision that more efficient searching was unlikely to have an impact against Kasparov. The biggest difference in Deep Blue's performance in 1997 compared to 1996 was undoubtedly due to improved chess knowledge. Chess grandmaster Joel Benjamin worked with the team to identify weaknesses in the program's play. The evaluation function consists of over 8000 tunable parameters. Most of the terms are combined linearly to arrive at a position value, but some terms are scaled to create a non-linear relationship. Although several attempts were made to tune the parameters automatically, in the end the tuning was primarily done by hand. The program uses an endgame database of all positions with five or fewer pieces on the board, although this is rarely a factor in a game. The opening book is small, but Deep Blue has access to a large database of grandmaster games. When the program is out of its book, it will query the grandmaster database to find all games where the board position has arisen. All moves for this position retrieved from the database are assessed, based on who played it and how favorably the game ended. These moves receive a positive or negative bias that influences the evaluation of a line of play. Essentially, moves with a history of success are favored, and those with a bad track record are discouraged.
234
JONATHAN SCHAEFFER
3.4.2 The Best o f Computer Chess In the 1997 match against world champion Garry Kasparov, Deep Blue lost the first game, causing many to predict an easy victory for man over machine. The second game astounded the chess world, convincingly demonstrating Deep Blue's championship-caliber skills. In the game notation, columns are labeled a-h from left to right and the rows are labeled 1-8 from bottom to top. An x indicates a capture move.
White: Deep Blue--Black: Gary Kasparov 1. e4 e5 2. ~ f 3 ~)c6 3. )~b5 a6 4. : a 4 5 f6 5. O - O i~e7 6. ~el b5 7 . ~ b 3 d 6 8. c 3 0 - O 9. h3h6 10. d41;e8 ll. Sbd2r 12.5:fl =~d7 13. ~ g 3 ~ a 5 14. ~c2 c5 15. b3 ~ c6 16. d5 ;ie7 17. j~e3 ~%g6 18. ~ d2 Until now, standard opening theory. Kasparov repeatedly claimed that he understood how to play against computers. Indeed this position is consistent with the common perception that computers are weak in closed (blocked) positions. Nevertheless, with his next move Kasparov begins to appear complacent, perhaps underestimating his dangerous foe. 18 .... ~:h7 19. a4 ~: h4 20. ~; x h4 :~:ix h4 21. :;~::e2 ~:d8 Black's last few moves have accomplished nothing except to exchange a pair of knights on the opposite side of the board from all the action. 22. b4 ~ c 7 23. ~ecl With this move, the audience began to sense that something was different with Deep Blue's play, compared to the quality of play seen in the 1996 match. From the human's point of view, this move shows extraordinary insight into the position. At first glance, it looks as if the rook is being moved to a useless square. However, this is a "prophylactic move" that subtly restricts Black's options. The move becomes very strong if Black allows the c-file to become open. 23 .... c4 24. ~a3 See Fig. l la. Another strong move, intending to double the rooks on the a-file. Most computer programs would immediately exchange a-pawns. Joel Benjamin revealed afterward that Deep Blue had a common computer tendency to release tension by exchanging pawns. Special knowledge was added to refrain from these exchanges, thereby maximizing the computer's options in the position. 24 .... ~ec8 25. ~ c a l ~ d 8 26. f4 Another strong positional move that is "obvious" to humans, but usually difficult to find for computers. Having secured the advantage on the queenside, the program now strives to dominate the king-side. Subsequent
THE GAMES COMPUTERS (AND PEOPLE) PLAY
235
FIG. 11. Deep Blue (White) versus Kasparov (Black).
analysis showed that this move may not have been strongest. Although the idea is good, 26. a x b5 a x b5 27. j~a7 would allow White to make inroads on the queen-side. 26 .... ,~ f6 27. f x e5 d x e5 28. -:;,,:fl Another human-like move. 28. '~'f2 may have been even stronger. 28 .... ::;:,e8 29. ~ f 2 :~d6 30. ~ b 6 'i~:e8 31. ~3a2 XLe7 Kasparov has drifted into a horribly passive position. He can only wait for Deep Blue to attack. ~ xd6 36. 32. ~ c 5 ~ f 8 33. 5 f5 + x f5 34. e x f 5 f6 35. + x d 6 axb5 At first glance, '~b6 seems to win material. However, e4 gives Black counter-play. 36 .... a x b 5 37. :~e4 Black's position is miserable, and everyone expected the seemingly crushing 37. ~ b 6 . However, there is a hidden trap: 37 ~ x a 2 38. ~ x a 2 Ra8 38. ~ x a 8 ~ ' x a 8 39. '~'xd6 '~/al+ 40. ::i;h2 '(,~cl with a probable draw. In some lines , Black can play e4 and get (limited) counter-play. 37. ~~e 4 upset Kasparov: the move eliminates all counter-chances. Kasparov couldn't believe that the program would pass up the chance to win material. This position gave rise to considerable controversy after the match. Kasparov's disbelief that a computer was capable of this level of sophistication resulted in his leveling unfounded accusations of cheating against the Deep Blue team. 37 ......~( x a2 38. '~ x a2 Qd7 39. '(~i:a7 L~c7 40. '~4~:b6 ~2b7 41. 7~a8+ @f7 42. '~'a6 ~ c 7 43. ~}yc6 '~'b6+ 44. :;~fl An error, but no one knew it at the time .... 44 .... ~b8 45. ~a6 Kasparov resigns. See Fig. l lb. The audience erupted in applause. History was made! But--incredible as it seems--the final position is a draw! The analysis is long and difficult, but the amazing W ~ e 3 secures a miraculous draw Even the incredible search
236
JONATHAN SCHAEFFER
depths of Deep Blue were incapable of finding this within the time constraints of a game. Much has been made of Kasparov's missed opportunity. However, this distracts the discussion from the real issue: Deep Blue played a magnificent game. Who cares if there is a minor imperfection in a masterpiece? Most classic games of chess contain many flaws. Perfect chess is still an elusive goal, even for Kasparov and Deep Blue. Despite the defeat, even Kasparov had grudging respect for his electronic opponent [81]: In Deep Blue's Game 2 we saw something that went well beyond our wildest expectations of how well a computer would be able to foresee the long-term positional consequences of its decisions. The machine refused to move to a position that had a decisive short-term advantage--showing a very human sense of danger. I think this moment could mark a revolution in computer science ... Kasparov pressed hard for a win in games 3, 4, and 5 of the match. In the end, he seemed to run out of energy. In game 6, he made an uncharacteristic mistake early in the game and Deep Blue quickly capitalized. The dream of a world-class chess-playing program, a 50-year quest of the computing science and artificial intelligence communities, was finally realized.
3.5
Othello
The first major Othello program was Paul Rosenbloom's Iago [82]. It achieved impressive results given its early-1980s hardware. It played only two games against world-class players, losing both. However, it dominated play against other Othello programs of the time. Based on the program's ability to predict 59% of the moves played by human experts, Rosenbloom concluded that the program's playing strength was of world-championship caliber. By the end of the decade, Iago had been eclipsed. Kai-Fu Lee and Sanjoy Mahajan's program Bill represented a major improvement in the quality of computer Othello play [83]. The program combined deep search with extensive knowledge (in the form of precomputed tables) in its evaluation function. Bayesian learning was used to combine the evaluation function features in a weighted quadratic polynomial. Statistical analysis of the program's play indicated that it was a strong Othello player. Bill won a single game against Brian Rose, the highest rated American Othello player at the time. In test games against Iago, Bill won every game. These results led Lee and Mahajan to conclude that "Bill is one of the best, if not the best, Othello player in the world." As usual, there is danger in extrapolating conclusions based on limited evidence.
THE GAMES COMPUTERS (AND PEOPLE) PLAY
237
With the advent of the Internet Othello Server (IOS), computer Othello tournaments became routine. In the 1990s they were dominated by Michael Buro's Logistello. The program participated in 25 tournaments, finished first 18 times, second 6 times, and fourth once. The program combined deep search with an extensive evaluation function that was automatically tuned. This was combined with an extensive opening book and a perfect endgame player. Although it was suspected that in the mid-1990s, computers surpassed humans in their playing abilities at Othello, this was not properly demonstrated until 1997, when Logistello played an exhibition match against world champion Takeshi Murakami. In preparation for the match, Buro writes that [84]: Bill played a series of games against different versions of Logistello. The results showed that Bill, when playing 5-minute games running on a PentiumPro/200 PC, is about as strong as a 3-ply Logistello, even though Bill searches 8 to 9 plies. Obviously, the additional search is compensated for by knowledge. However, the 3-ply Logistello can only be called mediocre by today's human standards. Two explanations for the overestimation of playing strength in the past come to mind: (1) during the last decade human players have improved their playing skills considerably, and (2) the playing strength of the early programs was largely overestimated by using ... non-reliable scientific methods. Logistello won all six games against M u r a k a m i by a total disc count of 264 to 120 [84]. This confirmed what everyone had expected about the relative playing strengths of man and machine. The gap between the best human players and the best computer programs is believed to be large and effectively unsurmountable.
3.5. 1 Logistello Outwardly, Logistello looks like a typical a l p h a - b e t a - b a s e d searcher. The program has a highly-tuned search algorithm, sophisticated evaluation function, and a large opening book. 12 The architecture of the program is illustrated in Fig. 12. The search algorithm is standard a l p h a - b e t a (NegaScout) with iterative deepening, a large transposition table, and the killer heuristic. Corners are a critical region of the board. The program does 12Note that endgame databases are not possible in Othello because, unlike chess and checkers, the number of pieces on the board increases as the end of game approaches.
238
JONATHAN SCHAEFFER
Internet Othello server
Graphical user interface
Public games
Oame-,re searcher 1 I I I
Evaluati~ function
l q
Oame ana,y e co ecto
~ ~
[ M~
Openingbookp l a y ~
a~uat~
I
~ O p ~ / " eningbookconsistingof
/ /
L ~
Self-play games
~ ~
over23 000 tournamentgames+ evaluationsof "best"move alternatives c update
Estimationof feP ethgw ie r i s ~ ~
\
/:
and valuesof patterninstance Trainingset consistingof over 11millionscoredpositions Periodiccorrection FIG. 12. Logistello'sarchitecture [85].
a small quiescense search if there is some ambiguity about who controls a corner. Buro introduced his ProbCut algorithm in Logistello [24]. This search enhancement takes advantage of the Othello property that the results of a shallow search (s-ply) are correlated with the results of a deep search (t-ply). An s-ply search produces a value vs. This value is extrapolated to give the value of a t-ply search yr. The deeper search's estimated value is
THE GAMES COMPUTERS (AND PEOPLE) PLAY
239
then compared to the a l p h a - b e t a search window, and the likelihood that vt will be relevant to this window is computed. If vt is likely to be irrelevant (e.g., is unlikely to reach alpha), then the search is pruned on the basis of the shallow search. The deeper search value can be viewed as being vt=ax
v;+b+e
where a and b are constants and e is the error (normally distributed with a mean of 0 and variance or2). Given s and t, the parameters a, b, and e are determined using linear regression on a large number of samples. For each sample position, searches to depth s and t are performed. In a game, this information can be used to probabilistically eliminate trees using the relationship vr ~
value -
~
'*'p. i x f , . i
(2)
i=1
where p is the phase of the game. 13N o t e that there is no need for a phase for less than 13 discs on the board, since the search from the first move easily reaches 13 or more discs.
-2~~
~'
~ ~ =
=~ ~ o ~
==~.~
~ ~,~ ~ o,~ ,~ ~ ~
~.=
~
-
~
9
.~
~ o~ ~'~-~o~
~=4 ~o ~
~ o o o ~ ~=. ~~~ '~~ ~< ~ . ~ ~ o ~
~
!
~'~
~" ~
~
~ ~~~
oo....~.
~
~
~
o
~-
~-~
~
. ~
-~- ~
~ ~"~~~
~'~.~ ~ ' , ~ ' -
A
. ~~
=
-~ ~
,<
~
"
~
~
~
~
-
~-~
~ x
s
~g.
r.~
o
oO
~- ~ ~
o ~.~.
THE GAMES COMPUTERS (AND PEOPLE) PLAY
241
Deep searches, good evaluation, and a strong opening book are a winning recipe for Othello. Michael Buro comments on the reasons why Logistello easily won the Murakami match [84]: When looking at the games of the match the main reasons for the clear outcome are as follows: 1. Lookahead search is very hard for humans in Othello. The disadvantage becomes very clear in the endgame phase, where the board changes are more substantial than in the opening and middlegame stage. Computers are playing perfectly in the endgame while humans often lose discs. 2. Due to the automated tuning of the evaluation functions and deep selective searches, the best programs estimate their winning chance in the opening and middlegame phase very accurately. This leaves little room for human innovations in the opening, especially because the best Othello programs are extending their opening books automatically to explore new variations.
3.5.2
The Best of Computer Othello
In August 1997, the World Champion Takeshi Murakami played a sixgame exhibition match with Logistello. Having lost the first five games, Murakami fought hard for a win in the last game. Michael Buro, author of Logistello, annotates this game (comments are in italics) [89]. Logistello's analysis is enclosed in [ ]s giving the main lines of play and the final predicted result from Logistello's point of view. Moves are given by specifying the column from left to right, A - H , and the row from top to bottom, 1-8.
White: Logistello; Black: Takeshi Murakami

1. F5 D6 2. C4 D3 3. C3 F4 4. C5 B3 5. C2 E6 6. C6 B4 7. B5 D2 8. E3 A6 9. C1 B6 10. F3 F6 11. F7 E1 12. E2 F1 13. E7 G3 14. C7 G4

Logistello prefers D7 over Mr. Murakami's G4. After D7 the position seems to be quite close. Mr. Murakami's opening and early midgame were flawless in Logistello's view.

15. G5 D1 (see Fig. 13a)

According to Logistello's 26-ply selective search, Mr. Murakami's D1 is probably two discs worse than playing F2.

16. G1 F2 17. H3 H4

One move earlier, Mr. Murakami missed the last opportunity to deprive Logistello of its free move to B1. While H5 flips F5 and thereby denies Logistello access to B1, it also leads to a risky edge configuration after the moves H6 and H7. This may be the reason why Mr. Murakami preferred H4, which, however, loses two discs.
Today's best programs would start selective win-loss-draw searches in this position after conducting a ≥ 24-ply "midgame" Multi-ProbCut search. This leaves human players with very little room for the slightest errors. [H5 H6 H7 = 4; H4 H5 D8 = 6]

FIG. 13. White: Logistello--Black: Takeshi Murakami.

18. H5 C8

Here, Mr. Murakami does not want to break Logistello's wall, which would create additional moves for Logistello. One plan is to move into the south-west region (C8 or D8) or to exploit regional parity in the north-east by playing G2. Although G2 gives up corner H1, it leaves him with the last move in this region. Mr. Murakami chose C8, losing two discs. [G2 B1 D8 = 6; D8 C8 D7 = 6; C8 B1 D7 = 8]

19. B1 D7 20. H2 E8 21. D8 G6 22. F8 G8 23. H6 G7 24. A2 A3 25. A4 B7 26. A8 B8 27. B2 (see Fig. 13b)

In this game Mr. Murakami never thought he was losing until the late endgame, where he was faced with a swindle threat. At first glance, Mr. Murakami seems to have the advantage because the eastern and northern edge configurations look weak for Logistello. However, the only losing move in this position is G2, which allows Mr. Murakami to grab H1 and to secure enough edge and interior discs later on. The optimal move is B2, which Mr. Murakami had not anticipated in his earlier calculations.

27. ... A1

This move creates a so-called swindle in the south-east corner region, meaning that Logistello gets both remaining moves (H8 and H7) there--winning by 10. A little better is A5, which loses by 8. [A5 A7 A1 H8 = 8; A1 H8 H1 H7 = 10]

28. H8 H1 29. H7 G2 30. A5 A7

Logistello wins by 10 discs: 37-27.
3.6 Poker
There are many popular poker variants. Texas Hold'em is generally acknowledged to be the most strategically complex variant of poker that is widely played. It is the premier event at the annual world series of poker.
Until recently, poker has been largely ignored by the computing academic community. However, poker has a number of attributes that make it an interesting domain for mainstream artificial-intelligence research. These include:
• imperfect knowledge (the opponent's hands are hidden),
• multiple competing agents (more than two players),
• risk management (betting strategies and their consequences),
• agent modeling (identifying patterns and weaknesses in the opponent's strategy and exploiting them),
• deception (bluffing and varying your style of play), and
• dealing with unreliable information (taking into account your opponent's deceptive plays).

All of these are challenging dimensions to a difficult problem. There are two main approaches to poker research [90]. One approach is to use simplified variants that are easier to analyze. However, one must be careful that the simplification does not remove challenging components of the problem. For example, Findler worked on and off for 20 years on a poker-playing program for 5-card draw poker [91]. His approach was to model human cognitive processes and build a program that could learn, ignoring many of the interesting complexities of the game. The other approach is to pick a real variant, and investigate it using mathematical analysis, simulation, and/or ad-hoc expert experience. Expert players with a penchant for mathematics are usually involved in this approach. None of this work has led to the development of strong poker-playing programs.

There is one event in the meagre history of computer poker that stands out. In 1984 Mike Caro, a professional poker player, wrote a program that he called Orac (Caro spelled backwards). It played one-on-one, no-limit Texas Hold'em. Few technical details are known about Orac other than it was programmed on an Apple II computer in Pascal. However, Caro arranged a few exhibitions of the program against strong players [92]:

It lost the TV match to casino owner Bob Stupak, but arguably played the superior game. The machine froze on one game of the two-out-of-three set when it had moved all-in and been called with its three of a kind against Stupak's top two pair. Under the rules, the hand had to be replayed. In the [world series of poker] matches, it won one (from twice world champion Doyle Brunson--or at least it had a two-to-one chip lead after an hour and a quarter when the match was cancelled for a press conference) and lost two (one each to Brunson and then-reigning world champion Tom McEvoy), but--again--was
fairly unlucky. In private, preparatory exhibition matches against top players, it won many more times than it lost. It had even beaten me most of the time.

Unfortunately, Orac was never properly documented and the results never reproduced. It is highly unlikely that Orac was as good as this small sample suggests. No scientific analysis was done to see whether the results were due to skill or luck (as was done, for example, in the BKG9.8-Villa match; see Section 3.1). As further evidence, none of the commercial efforts can claim to be anything but intermediate-level players.

In the 1990s, the creation of an Internet Relay Chat poker server gave the opportunity for humans (and computers) to play interactive games over the internet. A number of hobbyists developed programs to play on IRC. Foremost among them is R001bot, developed by Greg Wohletz. The program's strength comes from using expert knowledge at the beginning of the game, and doing simulations for subsequent betting decisions.

The University of Alberta program Loki, authored by Darse Billings, Denis Papp, Lourdes Peña, Jonathan Schaeffer, and Duane Szafron, is the first serious academic effort to build a strong poker-playing program. Loki plays on the IRC poker server and, like R001bot, is a consistent big winner. Unfortunately, since these games are played with fictitious money, it is hard to extrapolate these results to casino poker. At best, Loki and R001bot are strong intermediate-level poker players. A considerable gap remains to be overcome before computers will be as good as the best human players.
3.6.1 Loki
Most readers will be familiar with one or more variants of poker. To avoid confusion, the following gives a brief summary of Texas Hold'em.

• A hand begins with the pre-flop, where each player is dealt two cards face down (the hole cards), followed by the first round of betting.
• Three community cards are then dealt face up on the table, called the flop, and a second round of betting occurs.
• On the turn, a fourth community card is dealt face up and another round of betting ensues.
• Finally, on the river, a fifth community card is dealt face up and there is a final round of betting. All players still in the game reveal their two hole cards for the showdown.
• The best five-card poker hand formed from the two hole cards and the five community cards wins the pot. If a tie occurs, the pot is split.
• Texas Hold'em is typically played with 8-10 players.
Loki is named after the Norse God of mischief [93].14 Figure 14 shows the program's architecture and how the major components interact [94]. In the diagram, rectangles are major components, rounded rectangles are major data structures, and ovals are actions. The data follows the arrows between components. An annotated arrow indicates how many times data moves between the components for each of the program's betting actions.

The architecture revolves around generating and using probability triples [50]. A probability triple is an ordered set of values, PT = [f, c, r], such that f + c + r = 1.0, representing the probability distribution that the next betting action in a given context should be a fold, call, or raise, respectively. The Triple Generator contains the poker knowledge, and is analogous to an evaluation function in two-player games. The Triple Generator calls the Hand Evaluator to evaluate any two-card hand in the current context. It uses the resulting hand value, the current game state, and expert-defined betting rules to compute the triple. To evaluate a hand, the Hand Evaluator enumerates over all possible opponent hands and counts how many of them would win, lose, or tie the given hand.
FIG. 14. Loki's architecture.
14 This section is largely based on previously published descriptions of Loki [50].
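As a rough sketch of what the Hand Evaluator does (not Loki's actual code), the following enumerates every two-card holding the opponent could have, weights each by its Weight Table entry, and counts weighted wins, ties, and losses. The helpers rank_hand and weight_of are hypothetical stand-ins for the real engine.

```python
from itertools import combinations

def weighted_hand_strength(my_hole, board, unseen_cards, weight_of, rank_hand):
    """Estimate how strong a hand is against the opponent's possible holdings.
    weight_of(holding) returns that holding's Weight Table entry; rank_hand
    returns a comparable rank for the best five-card hand.  Both are
    hypothetical helpers standing in for the real engine."""
    my_rank = rank_hand(my_hole, board)
    ahead = tied = behind = 0.0
    # After the flop there are 47 unseen cards, so 47 choose 2 = 1081 holdings.
    for opp_hole in combinations(unseen_cards, 2):
        w = weight_of(opp_hole)
        opp_rank = rank_hand(list(opp_hole), board)
        if my_rank > opp_rank:
            ahead += w
        elif my_rank == opp_rank:
            tied += w
        else:
            behind += w
    total = ahead + tied + behind
    return (ahead + 0.5 * tied) / total if total else 0.0
```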
Each time it is Loki's turn to bet, the Action Selector uses a single probability triple to decide what action to take. For example, if the triple [0.0, 0.8, 0.2] were generated, then the Action Selector would never fold, call 80% of the time, and raise 20% of the time. A random number is generated to select one of these actions so that the program varies its play, even in identical situations.

After the flop, the probability for each possible opponent hand is different. For example, the probability that Ace-Ace hole cards are held is much higher than the cards 7-2, since most players will fold 7-2 before the flop. There is a Weight Table for each opponent. Each Weight Table contains a value for each possible two-card hand that the opponent could hold (47 choose 2 = 1081 possibilities). The value is the probability that the hand would be played exactly as that opponent has played so far. After an opponent action, the Opponent Modeler updates the Weight Table for that opponent in a process called re-weighting. The value for each hand is increased or decreased to be consistent with the opponent's action. The Hand Evaluator uses the Weight Table in assessing the strength of each possible hand, and these values are in turn used to update the Weight Table after each opponent action.

For example, suppose the weight for Ace-Ace is 0.7. That is, if these cards have been dealt to an opponent, there is a 70% chance that they would have played it in exactly the manner observed so far. What happens if the opponent now calls? Loki calculates the probability triple for these cards in the current context (as it does for all possible two-card holdings). Assume that the resulting triple is [0.0, 0.2, 0.8]. The updated weight for this case would be 0.7 × 0.2 = 0.14. The relative likelihood of the opponent holding Ace-Ace has decreased to 14% because they did not raise. The call value of 0.2 reflects the possibility that this particular opponent might deliberately try to mislead us by calling instead of raising. Using a probability distribution allows us to account for uncertainty in our beliefs.

The Triple Generator provides good betting decisions. However, better results can be achieved by augmenting the evaluation with simulation. Loki can play out many likely scenarios to determine how much money each decision will win or lose. Every time it faces a decision, Loki invokes the Simulator to get an estimate of the expected value (EV) of each betting action (see the dashed box in Fig. 14, with the Simulator replacing the Action Selector). A simulation consists of playing out the hand a specified number of times, from the current state of the game through to the end. Folding is considered to have a zero EV, because there is no future profit or loss. Each trial is played out twice: once to consider the consequences of a check/call and once to consider a bet/raise as Loki's first action. In each trial, cards are dealt to each opponent (based on the probabilities
maintained in the Weight Table), the resulting game is simulated to the end, and the amount of money won or lost is determined. Probability triples are used to approximate the actions of the opponents and Loki's subsequent actions based on the two cards assigned to them for that trial. The average amount won or lost over all of the trials is taken as the EV of each action (this loop is sketched below). In the current implementation, the action with the greatest expectation is selected, folding if both expectations are negative. To increase the program's unpredictability, the selection of betting actions whose EVs are close in value can be randomized.

It should be obvious that the simulation approach must be better than the simple evaluation approach, since simulation essentially uses a selective search to augment and refine a static evaluation function. Barring a serious misconception (or bad luck on a limited sample size), playing out relevant scenarios will improve the default values obtained by heuristics, resulting in a more accurate estimate. For example, a simulation contains implicit knowledge such as:

• hand strength (fraction of trials where Loki's hand is better than the one assigned to the opponent),
• hand potential (fraction of trials where Loki's hand improves to the best, or is overtaken), and
• subtle implications not addressed in the simplistic betting strategy (e.g., "implied odds", extra bets won after a successful draw).
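The simulation loop referred to above can be sketched as follows; sample_opponent_hands and play_out are hypothetical stand-ins for the parts of the engine that deal cards according to the Weight Tables and play a trial to completion.

```python
def simulate_ev(state, trials, sample_opponent_hands, play_out):
    """Estimate the expected value of check/call versus bet/raise by playing
    the hand out many times from the current state.  Each trial deals opponent
    hole cards (weighted by the Weight Tables) and is played out twice: once
    for each of Loki's two candidate first actions."""
    totals = {"check/call": 0.0, "bet/raise": 0.0}
    for _ in range(trials):
        opponent_hands = sample_opponent_hands(state)
        for action in totals:
            totals[action] += play_out(state, opponent_hands, first_action=action)
    evs = {action: total / trials for action, total in totals.items()}
    evs["fold"] = 0.0          # folding neither wins nor loses future money
    return evs
```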
It also allows complex strategies to be uncovered without providing additional expert knowledge. For example, simulations can result in the emergence of advanced betting tactics like a check-raise, even if the basic strategy without simulation is incapable of this play.

In strategic games like chess, the performance loss by ignoring opponent modeling is small, and hence it is usually ignored. In contrast, not only does opponent modeling have tremendous value in poker, it can be the distinguishing feature between players at different skill levels. If a set of players all have a comparable knowledge of poker fundamentals, the ability to alter decisions based on an accurate model of the opponent may have a greater impact on success than any other strategic principle.

The Weight Table is the first step toward opponent modeling since the weights for opponent cards are changed based on the dynamics of the games. The simplest approach to determining these weights is to treat all opponents the same, calculating a single set of weights to reflect reasonable behavior, and use them for all opponents. An initial set of weights was determined by ranking the starting hands (as determined by off-line simulations) and assigning a probability commensurate with the average
return on investment of each hand. These weights closely approximate the ranking of hands by strong players.

In Loki, the Opponent Modeler uses probability triples to update the Weight Table after each opponent action (re-weighting). To accomplish this, the Triple Generator is called for each possible two-card hand. It then multiplies each weight in the Weight Table by the entry in the probability triple that corresponds to the opponent's action. The above scheme is called Generic Opponent Modeling (GOM) [95]. Each hand is viewed in isolation and all opponents are treated as the same player. Each player's Weight Table is initially identical, and gets modified based on their betting actions. Although rather simplistic, this model is quite powerful in that it does a good job of skewing the hand evaluations to take into account the most likely opponent holdings.

Obviously, treating all opponents the same is wrong. Each player has a different style. Specific Opponent Modeling (SOM) customizes the probability triple function to represent the playing style of each opponent. For a given game, the re-weighting factor applied to the entries of the Weight Table is adjusted by betting frequency statistics gathered on that opponent from previous hands. This results in a shift of the assumed call and raise thresholds for each player. During each round of a game, the history of previous actions by the opponent is used to influence the probability triple generated for that opponent.

In competitive poker, opponent modeling is much more complex than portrayed here. For example, players can act to mislead their opponents into constructing an erroneous model. Early in a session a strong poker player may try to create the impression of being very conservative, only to exploit that image later in that session when the opponents are using an incorrect opponent model. A strong player has to have a model of each opponent that can quickly adapt to changing playing styles.

An important part of strong poker is bluffing. Although mastering this is difficult for humans, it is not an obstacle for a poker program. The computer can extend the range of hands it will play to include a few that have small negative expectations.
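The GOM re-weighting step described above can be sketched as follows; triple_generator is a hypothetical stand-in for Loki's Triple Generator, and the Weight Table is represented as a plain dictionary.

```python
def reweight(weight_table, observed_action, game_state, triple_generator):
    """After an opponent acts, scale the weight of every possible holding by
    the probability that this holding would have produced the observed action.
    For example, a holding with weight 0.7 and a call probability of 0.2 ends
    up with weight 0.7 * 0.2 = 0.14, as in the Ace-Ace example above."""
    index = {"fold": 0, "call": 1, "raise": 2}[observed_action]
    for holding, weight in weight_table.items():
        fold_call_raise = triple_generator(holding, game_state)
        weight_table[holding] = weight * fold_call_raise[index]
    return weight_table
```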
3.6.2 The Best of Computer Poker
The hand shown in Fig. 15 was played on IRC against six opponents. The following abbreviations are used to show the betting in each round: sb, small blind; bb, big blind; c, call; k, check; b, bet; f, fold; r, raise. Instead of an ante, Texas Hold'em uses blinds to initially seed the pot. The sample game is $10/$20 Hold'em. Here the first player puts in $5 (the small blind), and the second player puts in $10 (the big blind). The first two betting rounds use $10 bets; the last two use $20 bets. There are a maximum of three raises per betting round.
FIG. 15. Loki in action.
The following annotations in italics are by Darse Billings, co-author of Loki and a former professional poker player.

Loki makes a "loose call" with a fairly weak hand before the flop (Q-4 suited), because the conditions are otherwise ideal (last position, no raises, and a suited hand with 5 or 6 opponents). In slightly less favorable circumstances, Loki would fold this hand before the flop.

The flop yields a good flush draw with an overcard to the board. After the bet and two calls, a raise is a viable option, since it would have positive expectation against three opponents (>25% chance of winning), and might also earn a "free card" (no bet on the turn). Loki opts for the quieter alternative, which gains an additional caller in the small blind (which is favorable in this situation). A different random number in the calculations might have resulted in a raise.

The turn card adds a straight possibility to the draw, and after everyone shows weakness by checking, Loki decides to "semi-bluff." Unfortunately, the big blind was playing possum and check-raises with the best possible hand (a straight). In hindsight, this was a very risky play on his part--if Loki had checked, he would have failed to earn anything from the other players with his very strong hand, and would have given away a free chance to make a better hand. After Loki's bet, he is happily able to build a large pot.
Loki is lucky enough to make the flush, and raises on the river. After the reraise, the opponent's betting pattern suggests a full house (at least as likely as a straight) and Loki calls. Loki's flush wins against the opponent's straight. Loki wins $400. Were this only real money ...
3.7 Scrabble
The first documented Scrabble program appears to have been written by Stuart Shapiro and Howard Smith and was published in 1977 [96]. In the 1980s a number of Scrabble programming efforts emerged, and by the end of the decade it was apparent that these programs were strong players. With access to the entire Scrabble dictionary (now over 100 000 words), the programs held an important advantage in any games against humans. At the First Computer Olympiad in 1989 the Scrabble winner was Crab, written by Andrew Appel, Guy Jacobson, and Graeme Thomas [97]. Second was Tyler written by Alan Frank. Subsequent Olympiads saw the emergence of TSP (Jim Homan), which edged out Tyler in the second and third Olympiads. TSP later became the commercial program Crosswise. All of these programs were very good, and quite possibly strong enough to be a serious test for the best players in the world. Part of their success was due to the fast, compact Scrabble move generator developed by Andrew Appel [98]. Steven Gordon subsequently developed a move generator that was twice as fast, but used five times as much storage [99]. Brian Sheppard began working on a Scrabble program in 1983, and started developing Maven in 1986. In a tournament in December 1986, Maven scored eight wins and two losses over an elite field, finishing in second place on tie-break. Sheppard describes the games against humans at this tournament [51]: Maven reels off JOUNCES, JAUNTIER, and OVERTOIL on successive plays, each for exactly 86 points, to come from behind against future national champion Bob Felt. Maven crushed humans repeatedly in offhand games. The human race begins to contemplate the potential of computers. In the following years, Maven continued to demonstrate its dominating play against human opposition. Unfortunately, since it did not compete in the Computer Olympiads, it was difficult to know how strong it was compared to other programs at the time. In the 1990s, Sheppard developed a pre-endgame analyzer (for when there were a few tiles left in the bag) and improved the program's ability to simulate likely sequences of moves. These represented important advances
in the program's ability. It was not until 1997, however, that the opportunity arose to properly assess the program's abilities against world-class players. In 1997, a two-game match between Maven and Adam Logan, one of the best players in North America, ended in two wins for the human. Unfortunately, the match was not long enough to get a sense of who was really the best player. In March 1998, the New York Times sponsored an exhibition match between Maven and a team consisting of world champion Joel Sherman and the runner-up Matt Graham. It is not clear whether the collaboration helped or hindered the human side, but the computer won convincingly by a score of six wins to three. The result was not an anomaly. In July of that year, Maven played another exhibition match against Adam Logan, scoring nine wins to five.

Shortly after the Logan match, Brian Sheppard wrote:

The evidence right now is that Maven is far stronger than human players .... I have outright claimed in communication with the cream of humanity that Maven should be moved from the "championship caliber" class to the "abandon hope" class, and challenged anyone who disagrees with me to come out and play. No takers so far, but maybe one brave human will yet venture forth.

No one has.
3.7.1 Maven
The following description of Maven is based on information provided by Maven's author, Brian Sheppard [100]. Maven divides the game into three phases: early game, pre-endgame, and endgame. The early game starts at move one and continues until there are nine or fewer tiles left in the bag (i.e., with the opponent's seven tiles, this implies that there are 16 or fewer unknown tiles). From there, the pre-endgame continues until there are no tiles in the bag. In the endgame, all the tiles in the opponent's rack are known.

Maven uses the following techniques in regular play, before the pre-endgame is reached. The program uses the simulation framework described in Section 2.3, with some important Scrabble-specific refinements. Whereas for other games, such as bridge and poker, the number of candidate moves is small, for Scrabble there can be many moves to consider. On average there are over 700 legal moves per position, and the presence of two blanks in the rack can increase this figure to over 5000! Thus, Maven needs to pare the list of possible moves (using the move generator algorithm described in [98]) down to a small list of likely moves.
Omitting an important move from this list will have serious consequences; it will never be played. Consequently, Maven employs multiple move generators, each identifying moves that have important features that merit consideration. These move generators are:
• Score and Rack. This generator finds moves that result in a high score and a good rack (tiles remaining in your possession). Strong players evaluate their rack based on the likelihood of the letters being used to aid upcoming words. For example, playing a word that leaves a rack of QXI would be less preferable than leaving QUI; the latter offers more potential for playing the Q effectively.
• Bingo Blocking. Playing all seven letters in a single turn leads to a bonus of 50 points (often called a bingo). This move generator finds moves that reduce the chances of the opponent scoring a bingo on their next turn. Sometimes it is worth sacrificing points to reduce the opponent's chances of scoring big.
• Immediate Scoring. This generates the moves with the maximum number of points (this becomes more important as the end of the game nears).

Each routine provides up to 10 candidate moves. Merging these lists results in typically 20-30 unique candidate moves to consider. In the early game only the Score and Rack generator is used. In the pre-endgame there are four generators: the three listed above plus a pre-endgame evaluator that "took years to tune to the point where it didn't blunder nearly always" [101]. In the endgame, all possible moves are considered.

The move generation routines are highly effective at filtering the hundreds or thousands of possible moves [101]:

It is important to note that simply selecting the one move preferred by the Score and Rack evaluator plays championship caliber Scrabble. My practice of combining 10 moves from multiple generators is evidence of developing paranoia on my part. "Massive overkill" is the centerpiece of Maven's design philosophy.

Sheppard points out that his program is missing a fishing move generator. Sometimes it is better to pass a move or play a small word (one or two letters), so that you can exchange some of your tiles. For example, with the opening rack of AEINQST, you can play QAT for 24 points. Instead, you can fish by not playing a word and exchanging the Q. Of the 93 remaining tiles, 90 will make a bingo.

For the simulations, Maven does a 2-ply search to evaluate each candidate move (in effect, this is a 3-ply search). It could use a 4-ply search for the evaluation, but this results in fewer simulation data points. Sheppard
discusses the trade-offs:

If you compare a four-ply horizon and a two-ply horizon, you find that each iteration of the four-ply horizon takes twice as long, and the variance is twice as large, so you need 2 × √2 times as much time to simulate to equal levels of statistical accuracy. Since Scrabble has only limited long-term issues, it makes sense to do shallow lookaheads.

The limited long-term issues mentioned are a consequence of the rapid turnover in the rack. Maven averages playing 4.5 tiles per turn. After a 2-ply lookahead, there are few (if any) tiles left from the original rack. Consequently, the positions being evaluated at the leaves of a 2-ply search are very different from the root position.

Typically, 1000 2-ply simulations are done when making a move decision. The move leading to the highest average point differential is selected. After a few simulations, it may become statistically obvious that some of the candidate moves have little or no chance of being selected because their expected values are too low. If a move's score is at least 2 standard deviations below that of the best move, and at least 17 simulation iterations have been performed, then the low-scoring move is eliminated from consideration. The assignment of tiles to opponent hands is done in a way that guarantees a uniform distribution. A minimum of 14 iterations is needed to place all tiles in an opponent's rack at least once. The 17 iterations come from 14 being rounded up to a power of 2 (16) and then an inadvertent off-by-1 error giving 17.

Other pruning schemes are used to refine the move list. First, nearly identical plays usually lead to almost identical scores. For example, an opening move of "PLAY" versus "PALY" makes no difference in the simulation results. After 101 simulations, the lower rated of almost-identical moves is pruned. Secondly, if it becomes impossible for a low-scoring move to catch up to the best-scoring move given the number of trials remaining, then that move is pruned without any risk.

In the pre-endgame, the program's emphasis changes from scoring points to scoring wins. With fewer moves to consider, the simulations are extended to reach the end of the game to determine which side wins. The simulations contain additional pruning. If a candidate move is generating significantly fewer points than the best move and its frequency of wins is less, then that move is eliminated. Using the simulations to count the frequency of wins and points can cause a dilemma. It may be ambiguous as to what the best move to play is [101]:

Sometimes one move is the winner both on points and wins, so the choice is clear. But sometimes it is not clear, because wins and points do not agree. In that case Maven "mixes" wins and points on a linear basis. There are two important
practical reasons for this. First, the simulation might not be representative of the actual play of the game, either because the opponent is incapable of playing as well as Maven (the good case), or because Maven's simulations are mishandling the situation (the bad case). In either of these cases extra points may come in handy. Second, in tournaments it is important to have a high point differential, since that is used to break ties. My calculation shows that a 1% higher chance of winning a game is worth roughly a three to four point sacrifice of point spread. We don't want to go overboard on defensive gestures at the end of a game. It is better to lose occasionally to keep a high differential.

A special case occurs when there are only eight unknown tiles. In this case, the opponent can have only one of eight possible tile holdings, so Maven searches each case to the end of the game to determine the final result. Sheppard has recently extended his program to handle up to 12 unknown tiles (924 combinations).

When there are no tiles left to be drawn, Scrabble reverts to a game of perfect information (all missing tiles are in the opponent's rack). Alpha-beta would take too long to exhaustively search this, since the branching factor is large, and the program (move generation) is slow. Instead, Maven uses the B* algorithm (see Section 2.1.6). The success of B* hinges on assigning good upper and lower bounds to the moves. Considerable heuristic code is devoted to determining these bounds. Although Maven is capable of making an error in the search (e.g., poor bounds, or limits on space), in practice this is rarely seen. This may be the only example of a real system where B* is to be preferred to alpha-beta.

The Scrabble community has extensively analysed Maven's play and found a few minor errors. Postmortem analysis of the Logan match showed that Maven made mistakes that averaged 9 points per game. Logan's average was 40 points per game. Maven missed 7 fishing moves (69 points lost), made some programming errors (48 points lost), and made several smaller mistakes (6 points lost). The programming errors have been corrected. If a future version of Maven included fishing, the points-per-game error rate would drop to less than one per game. Maven would be playing nearly perfect Scrabble. Of the points lost due to programming errors, Brian Sheppard writes:

It just drives me crazy that I can think up inventive ways to get computers to act intelligently, but I am not smart enough to implement them correctly.

And that is the soliloquy of every games programmer.
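The statistical pruning rule used during the simulations (drop a candidate after at least 17 iterations if its average is at least two standard deviations below the best move's average) can be sketched as follows; the bookkeeping is illustrative, not Maven's internal representation.

```python
import math

def prune_candidates(scores_by_move, min_iterations=17, sd_margin=2.0):
    """scores_by_move maps each candidate move to the list of point
    differentials it has produced so far in the simulation.  Moves that are
    statistically hopeless are removed from consideration."""
    def mean(xs):
        return sum(xs) / len(xs)

    def stdev(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

    best_mean = max(mean(s) for s in scores_by_move.values())
    survivors = {}
    for move, scores in scores_by_move.items():
        if (len(scores) >= min_iterations
                and mean(scores) <= best_mean - sd_margin * stdev(scores)):
            continue                     # prune: very unlikely to become best
        survivors[move] = scores
    return survivors
```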
3.7.2 The Best of Computer Scrabble
In July 1998, at the annual conference of the American Association for Artificial Intelligence, Maven played an exhibition match against Adam
Logan, one of the top Scrabble players in North America. Logan won three of the first four games of the match, but Maven won six of the next seven games. Going into the critical twelfth game, Maven led by a score of seven wins to four. The following annotations are by Brian Sheppard and originally appeared in the Scrabble Players News.15

The columns of a Scrabble board are specified from left to right by the letters a-o. Rows are specified from top to bottom using the numbers 1 to 15. Moves are specified by giving the square of the first letter of the word. If the coordinate begins with a number, then the word is placed horizontally. If the coordinate begins with a letter, then the word is placed vertically. The blank is denoted by "?".
Maven--Adam Logan
Maven (ACNTVYZ) plays CAVY at 8t, 24 pts, Maven = 24 Logan = 0. The alternative is ZANY for 32, but the CVT rack is poor. Much better is 24 points with an NTZ rack. As to placement, a better choice than Maven's is probably CAVY 8G. This version of Maven was not ideal at first-turn placement, for inexcusable internal reasons. Fortunately this is not a significant skill factor compared to scoring and keeping good tiles. Maven is almost ideal at those skill factors.

Logan (EGLNORY) plays YEARLONG at g6, 66 pts, Maven = 24 Logan = 66. Adam finds the only bingo.

Maven (ADNNOTZ) plays DOZY at 6d, 37 pts, Maven = 61 Logan = 66. It is DOZY (6d, 37, ANNT) versus AZLON (10e, 34, NTD) or ZOON (11e, 26, ADNT). DOZY's extra points and retention of a vowel win despite duplicate Ns.

Logan (ADEFOTV) plays OFT at h13, 21 pts, Maven = 61 Logan = 87. Adam's choice is best. He also has VOTED (5a, 27, AF), OVA (h13, 21, DEFT), FOVEAL (10b, 22, DT), and ADVENT (12c, 22, FO). Adam didn't think long, and since the choices are so close it doesn't pay to think long.

Maven (AENNNOT) plays NEON at 5b, 15 pts, Maven = 76 Logan = 87. NEON (5b, 15, ANT) edges ANON (5b, 15, ENT). I am not sure why, but clearly ENT and ANT are both good rack leaves, and there must be some benefit to avoiding a FANON hook in the "a" column. It may also be that ANON's vowel-consonant-vowel-consonant pattern is easier to overlap than NEON.
15 Reproduced with permission. Minor editing changes have been made to conform with the style of this chapter.
Logan (ACDEEIV) plays DEVIANCE at 12b, 96 pts, Maven = 76 Logan = 183. Adam finds the only bingo.

Maven (AHINRTU) plays HURT at 4a, 34 pts, Maven = 110 Logan = 183. HUNT would usually surpass HURT, because R is better than N, but here there are three Ns already on the board versus one R. It is important to note that Maven did not choose HURT for the reason I gave; Maven chose HURT because in 1000 iterations of simulation it found that HURT scored more points than HUNT. The reason I am giving (that three Ns have been played versus one R) is my interpretation of that same body of data.

Logan (DDEEMMN) plays EMENDED at e7, 26 pts, Maven = 110 Logan = 209. EMENDED is a good play, following sound principles: score points, undouble letters. Simulations give a two-point edge to MEM (13a, 25, EDDN), however. Possibly the 8a-8d spot weighs against EMENDED, plus keeping an E is a valuable benefit for MEM. These advantages outweigh the extra point and duplicated Ds.

Maven (ABEINNP) plays IAMB at 8a, 33 pts, Maven = 143 Logan = 209. IAMB is really the only play, doubled Ns notwithstanding.

Logan (AILMTTU) plays MATH at a1, 27 pts, Maven = 143 Logan = 236. MATH (a1, 27, ILTU) is best, with UTA (3a, 20, ILMT) second. The advantage of MATH over UTA is its seven extra points, but the disadvantage is keeping a U. These almost wash, with an edge to MATH.

Maven (EFGNNPS) plays FEIGN at e10, 18 pts, Maven = 161 Logan = 236. FEIGN is the only good move. FENS (j9, 24, GNP) is higher scoring, but FEIGN keeps better tiles; NPS easily makes up the scoring deficit plus a lot more on top.

Logan (AILORTU) plays TUTORIAL at 15h, 77 pts, Maven = 161 Logan = 313. Adam finds the only bingo. (Actually, TUTORIAL also plays at 15f, but scores only 59 there.)

Maven (?ABNOPS) plays BOS at j10, 26 pts, Maven = 187 Logan = 313. See Fig. 16.
Maven made a great draw from the bag, and then made one of the most difficult plays of the game. Maven has no bingos, and has to choose how to make one. Playing off the B and P is indicated, so plays like BAP or BOP (7i, 20) come to mind. But Maven finds two stronger, and surprising, alternatives: BOS (j10, 26, ?ANP) and BOPS (j9, 25, ?AN). These plays score a few extra points as compensation for playing the S, and they open the "k" column for bingo-making. I would have thought that BOPS would win out, but BOS is better. BOS does show a higher point differential, but that is not why it is better. It is better because the chance of getting a big bingo is higher owing to the creation of a spot where a bingo can hit two double word squares. I believe that the great majority of human masters would have rejected BOS without a second thought, probably choosing BOP. BOS is a fantastic play, and yet, there are two plays still to come in this game that are more difficult still.
FIG. 16. Maven plays BOS (j10), scoring 26 points.
Logan (IILPRSU) plays PILIS at 15a, 34 pts, Maven = 187 Logan = 347. PILLS, PULIS, PILUS, and PURLS are all good. Adam's choice is best because there are only two Us left, and Adam doesn't want to risk getting a bad Q. When you lead the game you have to guard against extreme outcomes.

Maven (?AKNPRS) plays SPANKER at k5, 105 pts, Maven = 292 Logan = 347. This is the only bingo, and a big boost to Maven's chances. I saw SPANKER but I wasn't sure it was legal, so I was sitting on the edge of my seat. Being down 160 points is depressing. Worse than depressing: it is nearly impossible to come back from that far behind. The national championship tournament gives a prize to the greatest comeback, and in this 31-round, 400-player event there is often only one game that features such a comeback.

Logan (EEEORUS) plays OE at b1, 12 pts, Maven = 292 Logan = 359. Adam plays the best move again. This play scores well, as his highest-scoring play is just 13 points (ERE L6). OE dumps vowels while keeping all his
consonants (an edge over ERE). It also keeps the U as "Q-insurance," an edge over MOUE (1a, 7, EERS). And it blocks a bingo line. Not bad value, and a good example of how to make something positive happen on every rack.

Maven (?HJTTWW) plays JAW at 7j, 13 pts, Maven = 305 Logan = 359. Maven's draw is bad overall, but at least there is hope if Maven can clear drek. Any play that dumps two of the big tiles is worth considering, with JAW, WORTH (11i, 16, ?JW), and WAW (b7, 19, ?JHTT) as leading contenders. JAW wins because the WH and TH are bearable combinations, and the TT isn't too bad either. Many players would exchange this rack, but Maven didn't consider doing so. I don't know how exchanging (keeping ?HT, presumably) would fare, but I suspect it wouldn't do well; there are few good tiles remaining, and drawing a Q is a real risk.

Logan (AEEGRSU) plays GREASE at m3, 31 pts, Maven = 305 Logan = 390. Simulations show AGER (L9, 24, ESU) as three points superior to GREASE, but I suspect that GREASE does at least as good a job of winning the game, since it takes away S bingos off of JAW. It also pays to score extra points, which provide a cushion if Maven bingos. And it pays to turn over tiles, which gives Maven fewer turns to come back.

Maven (?HRTTWX) plays AX at 6m, 25 pts, Maven = 330 Logan = 390. Maven's move is brilliant. Who would pick AX over GOX (13G, 36)? Would you sacrifice 11 points, while at the same time creating a huge hook on the "o" column for an AXE play? And do so when there are two E's unseen among only 13 tiles and you don't have an E and you are only turning over one tile to draw one? It seems crazy, but here's the point: among the unseen tiles (AAEEIIIILOQUU) are only two consonants, and one of them is the Q, which severely restricts the moves that can be made on the "o" column. If Adam has EQUAL then Maven is dead, of course, but otherwise it is hard to get a decent score on the "o" column. In effect, Maven is getting a free shot at a big "o" column play. AX is at least 10 points better than any other move, and gives Maven about a 20% chance of winning the game. The best alternative is HAW (b7, 19). GOX is well back.

Logan (EIIILQU) plays LEI at o5, 13 pts, Maven = 330 Logan = 403. Adam sensibly blocks, and this is the best play. The unseen tiles from Adam's perspective are ?AAEHIORTTUW, so Adam's vowelitis stands a good chance of being cured by the draw.

Maven (?AHRTTW) plays WE at 9b, 10 pts, Maven = 340 Logan = 403. Again a problem move, and again Maven finds the best play. In fact, it is the only play that offers real winning chances. Maven calculates that it will win if it draws a U, with the unseen tiles AEIIIOQUU. There may also be occasional wins when Adam is stuck with the Q. This move requires fantastic depth of calculation. What will Maven do if it draws a U?
Logan (AIIIOQU) plays QUAI at j2, 35 pts, Maven = 340 Logan = 438. Adam's natural play wins unless there is an E in the bag. AQUA (n12, 26), QUAIL (o11, 15), QUAI (m12, 26), and QUA (n13, 24) also win unless there is an E in the bag, but with much, much lower point differential than QUAI because these plays do not block bingos through the G in GREASE. There is no better play. If an E is in the bag then Adam is lost.

Maven (?AHRTTU) plays MOUTHPART at 1a, 92 + 8 pts, Maven = 440 Logan = 438. See Fig. 17.

Maven scores 92 points for MOUTHPART, and eight points for the tiles remaining in Logan's rack. Maven was fishing for this bingo when it played WE last turn. With this play Maven steals the game on the last move. Adam, of course, was stunned, as it seemed that there were no places for bingos left on this board. If I hadn't felt so bad for Adam, who played magnificently, I would have jumped and cheered. This game put Maven up by eight games to four, so winning the match was no longer in doubt. How often do you score 438 points in a game of Scrabble ... and lose?
FIG. 17. Maven--Logan, final position.
3.8 Other Games
Conspicuously absent from this chapter is the Oriental game of Go. It has been resistant to the techniques that have been successfully applied to the games discussed in this chapter. For example, because of the 19 × 19 board and the resulting large branching factor, alpha-beta search alone has no hope of producing strong play. Instead, the programs perform small, local searches that use extensive application-dependent knowledge. David Fotland, the author of the Many Faces of Go program, identifies over 50 major components needed by a strong Go-playing program. The components are substantially different from each other, few are easy to implement, and all are critical to achieving strong play. In effect, you have a linked chain, where the weakest link determines the overall strength. Martin Müller (author of Explorer) gives a stark assessment of the reality of the current situation in developing Go programs [102]:

Given the complexity of the task, the supporting infrastructure for writing Go programs should offer more than is offered for other games such as chess. However, the available material (publications and source code) is far inferior. The playing level of publicly available source code ..., though improved recently, lags behind that of the state-of-the-art programs. Quality publications are scarce and hard to track down. Few of the top programmers have an interest in publishing their methods. Whereas articles on computer chess or general game-tree search methods regularly appear in mainstream AI journals, technical publications on computer Go remain confined to hard to find proceedings of specialized conferences. The most interesting developments can be learned only by direct communication with the programmers and never get published.

Although progress has been steady, it will take many decades of research and development before world-championship-caliber Go programs exist.

At the other end of the spectrum to Go are solved games. For some games, computers have been able to determine the result of perfect play and a sequence of moves to achieve this play.16 In these games the computer can play perfectly, in the sense that the program will never make a move that fails to achieve the best-possible result. Solved games include Nine Men's Morris [43], Connect-4 [103], Qubic [104], and Go Moku [104].

16 This is in contrast to the game of Hex, where it is easy to prove the game to be a first-player win, but computers are not yet able to demonstrate that win.

This chapter has not addressed one-player games (or puzzles). Single-agent search has been successfully used to optimally solve the 15-puzzle [14] and Rubik's Cube [105], and progress is being made on solving Sokoban
THE GAMES COMPUTERS (AND PEOPLE) PLAY
261
problems [106]. Recently, major advances have occurred in building programs that can solve crossword puzzles [107]. The last few years have seen research on team games become popular. The annual RoboCup competition encourages hardware builders and software designers to test their skills on the soccer field (www.robocup.com). Finally, other areas of games-related interest include commercial computer games, such as sports and role-playing games. The artificial intelligence work on these games is still in its infancy.
4. Conclusions
Samuel was writing as a pioneer, one of the first to realize that computer games could be a rich domain for exploring the boundaries of computer science and artificial intelligence. Since his 1960 paper, software and hardware advances have led to significant success and milestones in the history of computing. With it has come a change in people's attitudes. Whereas in Samuel's time understanding how to build strong game-playing programs was at the forefront of artificial intelligence research, today, 40 years later, it has been demoted to lesser status. In part this is an acknowledgment of the success achieved in this field--no other area of artificial intelligence research can claim such an impressive track record of producing high-quality working systems. But it is also a reflection on the nature of artificial intelligence itself. It seems that as the solutions to problems become understood, the techniques become less "AIish."

The work on computer games has resulted in advances in numerous areas of computing. One could argue that the series of computer chess tournaments that began in 1970 and continue to this day represents the longest running experiment in computing science. The games research has demonstrated the benefits of brute-force search, something that has become a widely accepted tool for a number of search-based applications. Many of the ideas that saw the light of day in game-tree search have been applied to other algorithms. Building world-championship-caliber games programs has demonstrated the cost of constructing high-performance AI systems. Games have been used as experimental test beds for many areas of AI. And so on.

Samuel's concluding remarks from his 1960 chapter are as relevant today as they were when he wrote the paper [72]:

Just as it was impossible to begin the discussion of game-playing machines without referring to the hoaxes of the past, it is equally unthinkable to close the discussion without a prognosis. Programming computers to play games is but one stage in the development of an understanding of the methods which
must be employed for the machine simulation of intellectual behavior. As we progress in this understanding it seems reasonable to assume that these newer techniques will be applied to real-life situations with increasing frequency, and the effort devoted to games ... will decrease. Perhaps we have not yet reached this turning point, and we may still have much to learn from the study of games.
ACKNOWLEDGMENTS

I would like to extend my deepest admiration to the brave human champions who accepted the challenge of a computer opponent. In most cases, the champion had little to gain, but everything to lose. Malcolm Davis, Garry Kasparov, Adam Logan, Zia Mahmood, Marion Tinsley, Michael Rosenberg, and Takeshi Murakami made it possible to scientifically measure the progress of game-playing programs.

Many people made significant contributions to this chapter. I would like to extend my sincere appreciation to: Backgammon: Gerry Tesauro (author of TD-Gammon) and Malcolm Davis (world backgammon champion); bridge: Matt Ginsberg (author of GIB), Mike Whittaker (Bridge Magazine), and Onno Eskes (IMP magazine); chess: Murray Campbell (co-author of Deep Blue); Othello: Michael Buro (author of Logistello); poker: Darse Billings (co-author of Loki) and Mike Caro (Card Player Magazine); Scrabble: Brian Sheppard (author of Maven) and Joe Edley (Scrabble Players News). I am appreciative of the feedback from Darse Billings, Michael Buro, Andreas Junghanns, Lourdes Peña, Jack van Ryswyck, and Roel van der Goot. Technical help was provided by Mark Brockington and Alice Nodelman. Financial support was provided by the Natural Sciences and Engineering Research Council of Canada (NSERC). Finally, I would like to thank Marvin Zelkowitz for the opportunity to contribute to this volume. It was an honor to follow in Arthur Samuel's footsteps.
REFERENCES

[1] Shannon, C. (1950). Programming a computer for playing chess. Philosophical Magazine 41, 256-275.
[2] Turing, A. (1953). Digital computers applied to games. In Faster than Thought, ed. B. Bowden, Pitman, London, pp. 286-295.
[3] Samuel, A. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3, 210-229.
[4] Samuel, A. (1967). Some studies in machine learning using the game of checkers: Recent progress. IBM Journal of Research and Development 11, 601-617.
[5] Krol, M. (1999). Have we witnessed a real-life Turing test. Computer 32(3), 27-30.
[6] Thompson, K. (1986). Retrograde analysis of certain endgames. Journal of the International Computer Chess Association 9(3), 131-139.
[7] Lafferty, D. (1997). Chinook draws the hundred years problem. American Checker Federation Bulletin 269, 6.
[8] Brudno, A. (1963). Bounds and valuations for abridging the search for estimates. Problems of Cybernetics 10, 225-241 (translation of the original that appeared in Problemy Kibernetiki 10, 141-150, 1963).
[9] Knuth, D. and Moore, R. (1975). An analysis of alpha-beta pruning. Artificial Intelligence 6(4), 293-326.
THE GAMES COMPUTERS (AND PEOPLE) PLAY
263
[10] Marsland, T. (1986). A review of game-tree pruning. Journal of the International Computer Chess Association 9(1), 3-19.
[11] Slate, D. and Atkin, L. (1977). Chess 4.5--The Northwestern University chess program. In Chess Skill in Man and Machine, ed. P. Frey, Springer-Verlag, Berlin, pp. 82-118.
[12] Plaat, A. (1996). Research re: search and re-search. PhD thesis, Erasmus University, Rotterdam, The Netherlands.
[13] Plaat, A., Schaeffer, J., Pijls, W., and de Bruin, A. (1996). Exploiting graph properties of game trees. In AAAI National Conference, pp. 234-239.
[14] Korf, R. (1985). Iterative deepening: An optimal admissible tree search. Artificial Intelligence 27(1), 97-109.
[15] Schaeffer, J. (1986). Experiments in search and knowledge. PhD thesis, University of Waterloo, Canada.
[16] Schaeffer, J. (1989). The history heuristic and alpha-beta search enhancements in practice. IEEE Pattern Analysis and Machine Intelligence 11(11), 1203-1212.
[17] Akl, S. and Newborn, M. (1977). The principle continuation and the killer heuristic. In ACM Annual Conference, pp. 466-473.
[18] Reinefeld, A. (1983). An improvement of the Scout tree-search algorithm. Journal of the International Computer Chess Association 6(4), 4-14.
[19] Reinefeld, A. (1989). Spielbaum-Suchverfahren. Informatik-Fachberichte 200, Springer-Verlag, Berlin.
[20] Marsland, T. and Campbell, M. (1982). Parallel search of strongly ordered game trees. Computing Surveys 14(4), 533-551.
[21] Pearl, J. (1980). Scout: A simple game-searching algorithm with proven optimal properties. In AAAI National Conference, pp. 143-145.
[22] Beal, D. (1990). A generalised quiescence search algorithm. Artificial Intelligence 43(1), 85-98.
[23] Donninger, C. (1993). Null move and deep search: Selective-search heuristics for obtuse chess programs. Journal of the International Computer Chess Association 16(3), 137-143.
[24] Buro, M. (1995). ProbCut: An effective selective extension of the alpha-beta algorithm. Journal of the International Computer Chess Association 18(2), 71-76.
[25] Anantharaman, T., Campbell, M., and Hsu, F. (1990). Singular extensions: Adding selectivity to brute-force searching. Artificial Intelligence 43(1), 99-109.
[26] Anantharaman, T. (1991). A statistical study of selective min-max search. PhD thesis, Carnegie Mellon University, Pittsburgh, PA.
[27] Campbell, M., Hoane, J., and Hsu, F. (1999). Search control methods in Deep Blue. In AAAI Spring Symposium on Search Techniques for Problem Solving Under Uncertainty and Incomplete Information, AAAI Press, pp. 19-23.
[28] Ebeling, C. (1987). All the Right Moves. MIT Press, Cambridge, MA.
[29] Feldmann, R. (1993). Spielbaumsuche mit massiv parallelen Systemen. PhD thesis, Universität-Gesamthochschule Paderborn, Germany.
[30] Berliner, H. (1979). The B* tree search algorithm: the best-first proof procedure. Artificial Intelligence 12, 23-40.
[31] Berliner, H. and McConnell, C. (1996). B* probability based search. Artificial Intelligence 86(1), 97-156.
[32] McAllester, D. (1988). Conspiracy numbers for min-max searching. Artificial Intelligence 35, 287-310.
[33] Lorenz, U., Rottmann, V., Feldmann, R., and Mysliwietz, P. (1995). Controlled conspiracy-number search. Journal of the International Computer Chess Association 18(3), 135-147.
[34] Baum, E. and Smith, W. (1997). A Bayesian approach to relevance in game playing. Artificial Intelligence 97(1-2), 195-242.
264
JONATHAN SCHAEFFER
[35] Rivest, R. (1987). Game tree searching by min/max approximation. Artificial Intelligence 34(1), 77-96.
[36] Russell, S. and Wefald, E. (1991). Do the Right Thing. MIT Press, Cambridge, MA.
[37] Thompson, K. (1982). Computer chess strength. In Advances in Computer Chess 3, ed. M. Clarke, Pergamon Press, Oxford, pp. 55-56.
[38] Hsu, F. (1999). IBM's Deep Blue chess grandmaster chips. IEEE Micro (March-April), 70-81.
[39] Junghanns, A. and Schaeffer, J. (1997). Search versus knowledge in game-playing programs revisited. In International Joint Conference on Artificial Intelligence, pp. 692-697.
[40] Scherzer, T., Scherzer, L. and Tjaden, D. (1990). Learning in Bebe. In Computers, Chess and Cognition, ed. T. Marsland and J. Schaeffer, Springer-Verlag, New York, pp. 197-216.
[41] Slate, D. (1987). A chess program that uses its transposition table to learn from experience. Journal of the International Computer Chess Association 10(2), 59-71.
[42] Schaeffer, J. (1997). One Jump Ahead: Challenging Human Supremacy in Checkers. Springer-Verlag, New York.
[43] Gasser, R. (1995). Efficiently Harnessing Computational Resources for Exhaustive Search. PhD thesis, ETH Zürich, Switzerland.
[44] Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3, 9-44.
[45] Sutton, R. and Barto, A. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
[46] Dailey, D. (1999). Personal communication (email message), 27 May.
[47] Baxter, J., Tridgell, A., and Weaver, L. (1998). Experiments in parameter learning using temporal differences. Journal of the International Computer Chess Association 21(2), 84-99.
[48] Beal, D. (1997). Learning piece values using temporal differences. Journal of the International Computer Chess Association 20(3), 147-151.
[49] Fürnkranz, J. (1996). Machine learning in computer chess: The next generation. Journal of the International Computer Chess Association 19(3), 147-161.
[50] Billings, D., Peña, L., Schaeffer, J., and Szafron, D. (1999). Using probabilistic knowledge and simulation to play poker. In AAAI National Conference, pp. 697-703.
[51] Sheppard, B. (1999). Personal communication (email message), 9 March.
[52] Berliner, H. (1980). Backgammon computer program beats world champion. Artificial Intelligence 14, 205-220.
[53] Berliner, H. (1980). Computer backgammon. Scientific American 242(6), 64-72.
[54] Tesauro, G. (1989). Neurogammon wins computer olympiad. Neural Computation 1, 321-323.
[55] Tesauro, G. (1998). Personal communication (email message), 14 August.
[56] Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM 38(3), 58-68.
[57] Tesauro, G. (1999). Personal communication (email message), 28 May.
[58] Zadeh, N. and Kobliska, G. (1977). On optimal doubling in backgammon. Management Science 23, 853-858.
[59] Tesauro, G. (1999). Personal communication (email message), 6 April.
[60] Berlekamp, E. (1963). A program for playing double-dummy bridge problems. Journal of the ACM 10(4), 357-364.
[61] Ginsberg, M. (1999). GIB: Steps toward an expert-level bridge-playing program. In International Joint Conference on Artificial Intelligence, pp. 584-589.
THE GAMES COMPUTERS (AND PEOPLE) PLAY
265
[62] Frank, I. (1998). Search and Planning under hwomplete hTjbrmation." A Stud)' Using Bridge Card Play. Springer-Verlag, New York. [63] Ginsberg, M. (1996). Partition search. In AAAI National Conference, pp. 228-233. [64] Smith, S., Nau, D., and Throop, T. (1998). Computer bridge: A big win for AI planning. AI Magazine 19(2), 93-105. [65] Smith, S., Nau, D. and Throop, T. (1998). Success in spades: Using AI planning techniques to win the world championship of computer bridge. AAAI National Conference, pp. 1079-1086. [66] Whittaker, M. (1998). The 1998 world computer bridge championships. Bridge Magazine 96(10), 22-24. [67] Eskes, O. (1997). GIB: Sensational breakthrough in bridge software. IMP 8(2) (www.imp-bridge. nl / articles / G I B 1Sens. h tml). [68] Strachey, C. (1952). Logical or non-mathematical programmes. Proceedings of the Association for Computing Machinery Meeting, pp. 46-49. [69] Truscott, T. (1979-1980). The Duke checkers program. Journal of Recreational Mathematics 12(4), 241-247. [70] Tinsley, M. (1980). Letter to the editor, Scientific American, August. [71] Schaeffer, J., Culberson, J., Treloar, N., Knight, B., Lu, P. and Szafron, D. (1992). A world championship caliber checkers program. Artflicial hltelligence 53(2-3), 273-290. [72] Samuel, A. (1960). Programming computers to play games. In Advances in Computers 1, ed. F. Alt, pp. 165-192. [73] Condon, J. and Thompson, K. (1982). Belle chess hardware. In Advances in Computer Chess 3, ed. M. Clarke, Pergamon Press, Oxford, pp. 45-54. [74] Hyatt, R., Gower, A. and Nelson, H. (1990). Cray Blitz. In Computers, Chess and Cognition, ed. T. Marsland and J. Schaeffer, Springer-Verlag, New York, pp. 111-130. [75] Berliner, H. and Ebeling, C. (1989). Pattern knowledge and search: The SUPREME architecture. Artificial Intelligence 38(2), 161 - 198. [76] Felten, E. and Otto, S. (1988). A highly parallel chess program. In Conference on Fifth Generation Computer Systems. [77] Hsu, F., Anantharaman, T., Campbell, M. and Nowatzyk, A. (1990). Deep Thought. In Computers, Chess, and Cognition, ed. T. Marsland and J. Schaeffer, Springer Verlag, New York, pp. 55-78. [78] Hsu, F., Anantharaman, T., Campbell, M. and Nowatzyk, A. (1990). A grandmaster chess machine. Scientific American 263(4), 44-50. [79] McCarthy, J. (1997). AI as sport. Science 276(6 June), 1518-1519. [80] Hsu, F. (1990). Large scale parallelization of alpha-beta search: An algorithmic and architectural study with computer chess. PhD thesis, Carnegie Mellon University, Pittsburgh, PA. [81] Kasparov, G. (1997). IBM owes me a rematch. Time Magazine (American edition), 26 May. [82] Rosenbloom, P. (1982). A world-championship-level Othello program. Artificial Intelligence 19(3), 279-320. [83] Lee, K-F. and Mahajan, S. (1990). The development of a world class Othello program. Artificial Intelligence 43(1), 21 - 36. [84] Buro, M. (1997). The Othello match of the year: Takeshi Murakami vs. Logistello. Journal of the International Computer Chess Association 20(3), 189-193. [85] Buro, M. (1997). Logistello--A strong learning Othello program (www.neci.nj.nec.com/ homepages / mic / ps /log-overview.ps. gz). [86] Buro, M. (2000). Experiments with Multi-ProbCut and a new high-quality evaluation function for Othello. In Games in AI Research, ed. J. van den Herik and H. Iida, University of Maastricht (To appear).
266
JONATHAN SCHAEFFER
[87] Buro, M. (1995). Statistical feature combination for the evaluation of game positions. Journal of Artificial Intelligence Research 3, 373-382. [88] Buro, M. (1999). Toward opening book learning. Journal of the International Computer Chess Association 22(2), 98-103. [89] Buro, M. (1999). Personal communication (email message), 17 May. [90] Billings, D. (1995). Computer poker. MSc thesis, University of Alberta, Canada. [91] Findler, N. (1977). Studies in machine cognition using the game of poker. Communications of the A C M 20(4), 230-245. [92] Caro, M. (1999). Personal communication (email message), 13 March. [93] Papp, D. (1998). Dealing with imperfect information in poker. MSc thesis, University of Alberta, Canada. [94] Schaeffer, J., Billings, D., Pefia, L. and Szafron, D. (1999). Learning to play strong poker. In ICML Workshop on Machine Learning hi Game Pla)'hlg. [95] Billings, D., Papp, D., Schaeffer, J. and Szafron, D. (1998). Opponent modeling in poker. In A A A I National Conference, pp. 493-499. [96] Shapiro, S. and Smith, H. (1977). A Scrabble crossword game-playing program. Technical Report 119, Department of Computer Science, State University of New York at Buffalo. [97] Levy, D. and Beal, D. (eds.) (1987). Heuristic Programming in Artificial Intelligence. Ellis Horwood, Chichester. [98] Appel A. and Jacobson, G. (1988). The world's fastest Scrabble program. Communications of the A C M 31(5), 572-578, 585. [99] Gordon, S. (1994). A faster Scrabble move generation algorithm. Software Practice and Experience 24(2), 219-232. [100] Sheppard, B. (1998). Personal communication (email message), 23 October. [101] Sheppard, B. (1999). Personal communication (email message), 1 June. [102] M~iller, M. (1999). Computer go: A research agenda. Journal of the International Computer Chess Association 22(2), 104-112. [103] Allis, V. (1988). A knowledge-based approach to connect-four. The game is solved: White wins. MSc. thesis, Vrije Universiteit, Amsterdam, The Netherlands. [104] Allis, V. (1994). Searching for solutions in games and artificial intelligence. PhD thesis, University of Limburg, The Netherlands. [105] Korf, R. (1997). Finding optimal solutions to Rubik's Cube using pattern databases. In A A A I National Conference, pp. 700-705. [106] Junghanns, A. and Schaeffer, J. (1999). Domain-dependent single-agent search enhancements. In International Joint Conference on Art(ficial Intelligence, pp. 570-575. [107] Keim, G., Shazeer, N., Littman, M. et al. (1999). Proverb: The probabilistic cruciverbalist. In A A A I National Conference, pp. 710- 717.
From Single Word to Natural Dialogue

N. O. BERNSEN AND L. DYBKJAER
The Natural Interactive Systems Laboratory
University of Southern Denmark
Main Campus: Odense University
Science Park 10
5230 Odense M
Denmark
Abstract

Spoken language dialogue systems represent the peak of achievement in speech technologies in the 20th century and appear set to form the basis for the increasingly natural interactive systems to follow in the coming decades. This chapter first presents a model of the task-oriented spoken dialogue system, its multiple aspects, and some of the remaining research challenges. In the context of this model, a first general model is presented of the complex tasks performed by dialogue managers in state-of-the-art spoken language dialogue systems. The dialogue management model is intended to support best practice in spoken language dialogue systems development and evaluation.
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
2. Task-oriented Spoken Language Dialogue Systems . . . . . . . . . . . . . 271
   2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
   2.2 Dialogue Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
   2.3 Dialogue Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . 273
   2.4 Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
   2.5 Speech . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
   2.6 Performance and Human Factors . . . . . . . . . . . . . . . . . . . . 276
   2.7 Systems Integration . . . . . . . . . . . . . . . . . . . . . . . . . . 277
   2.8 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
3. Managing the Dialogue . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
   3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 278
   3.2 Goal of the Dialogue Manager . . . . . . . . . . . . . . . . . . . . . 281
   3.3 System Varieties . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
   3.4 Input Speech and Language . . . . . . . . . . . . . . . . . . . . . . . 283
   3.5 Getting the User's Meaning . . . . . . . . . . . . . . . . . . . . . . . 285
   3.6 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
   3.7 History, Users, Implementation . . . . . . . . . . . . . . . . . . . . . 317
4. Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
   Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 325
1. Introduction
It is commonplace to say, and think, that everything to do with computers moves very fast. Yet it is also a fact that the general field of computing is renowned for exaggerated claims and expectations about the swiftness of future progress. In the world of speech systems we have seen both. Thus, during the 1970s and 1980s repeated claims were made that near-perfect speech recognition had been achieved or was just around the corner. In 1960, promising recognition rates were reported for very small vocabulary (10 words), speaker-dependent, real-time recognition of isolated words [1]. Today, 40 years later, academic research in speech recognition is about to reach the end of the road, being replaced by steady progress through competitive industrial development [2]. Medium-sized vocabulary (5000 words or more), speaker-independent, real-time recognition of continuous (or spontaneous) speech has become commercial reality, and very large vocabulary (60 000 words or more) spoken dictation systems which only need a minimum of speaker-dependent training can be purchased for about 100 Euros (US$100) from companies such as IBM, Dragon Systems, and Philips.

Swift or not, this impressive progress has shifted the perspective in speech technology research dramatically. Today, robust, unlimited vocabulary, real-time speaker-independent continuous speech recognition is within reach and speech recognition technology has become a component technology which is finding its way into all sorts of interfaces to computer systems. However, the maturation of speech recognition is not the whole story of speech technology during the past four decades. In itself, speech recognition is a transformation of the acoustic signal into an uninterpreted string of words which may or may not make sense to a human but does not make any sense to the machine. This enables applications such as the coveted "phonetic typewriter" [1] as well as spoken command applications in which the system executes in response to a spoken word or phrase rather than in response to the push of a button on the keyboard, mouse, or other device. Among humans, speech is much more than that, of course. Speech is the primary modality for interactive exchange of information among people. Although it was hardly visible--even as a long-term goal--in 1960, the past 10-15 years have seen the emergence of a powerful form of interactive speech systems, i.e. task-oriented spoken language dialogue systems [3].
These systems not only recognize speech but understand speech, process what they have understood, and return spoken output to the user who may then continue the spoken interaction with the machine until the task has been accomplished. In their most versatile form, spoken language dialogue systems (SLDSs) incorporate speaker-independent, spontaneous speech recognition in real time. It is task orientation that has made SLDSs possible at the present time. It is still too early to build fully conversational SLDSs which can undertake spoken interaction with humans in the same way humans communicate with one another using speech only--about virtually any topic, in free order, through free negotiation of initiative, using unrestricted vocabulary speech, and so on. However, a range of collaborative tasks are already being solved through speech-only dialogue with computers over the telephone or otherwise. One of the simplest possible examples is a system that asks if the user wants to receive a collect call. If the user accepts the call, the system connects the caller; if the user refuses the call, the system informs the caller that the call was rejected [4]. A more complex task, for which commercial solutions already exist, is train timetable information [5-7]. The user phones up the system to inquire, for instance, the times of trains from Paris Gare du Nord to Brussels on Thursday morning, and receives a spoken list of departures in return. As these examples show, task-oriented SLDSs constitute a powerful application paradigm for interactive speech technologies, which could be used for a virtually unlimited number of interactive user-system tasks. However, successful SLDSs remain difficult to build for reasons which often go beyond the purely theoretical and technical issues involved and which illustrate the general state of speech technology research at this point. Even if they are capable of working as systems in their own right, speech recognizers are increasingly becoming system components. The same is true of speech generators which perform stand-alone tasks as text-to-speech systems but which are increasingly becoming system components as well. The SLDS is probably the most important technology which integrates speech-to-text, text-to-speech and various other components, such as natural language understanding and generation, but it is not the only one. Other integrated systems technologies incorporating some form of speech processing include speech translation systems and multimodal systems having speech as one of their input/output modalities--other modalities being, for instance, mouse pointing input gesture, or output graphics such as information tables or a talking face. All of these integrated technologies represent a level of complexity which is comparatively new to the field of speech technology research. Together with the rapid increase in commercial exploitation of speech technology in general, those technologies have
introduced an urgent need for system integration skills, human factors skills, general software engineering skills, and skills in creative contents creation to be added to the skills of groups which used to work on the basic component technologies. The field, in other words, is now faced with having to specialize software engineering best practice to speech technologies, and to do so swiftly and efficiently. This reorientation process has only just started. This is why the development of, for instance, task-oriented SLDSs remains fraught with home-grown solutions, lack of best practice methodologies and tools, ignorance about systems evaluation, lack of development platforms and standards, etc. Only by solving problems such as these will it be possible to efficiently design and build task-oriented SLDSs which will achieve their ultimate purpose: to conduct smooth and effortless natural dialogue with their users during interactive task resolution.

Arguably, the methodical achievement of natural dialogue in task-oriented SLDSs is one of the most important challenges for speech technologies research at this point. It may be illuminating to view this challenge as a pointer into the future. Humans not only use speech for collaborative and interactive task resolution, they engage in spoken conversation about everything. Humans interact through speech in different languages. And, when interacting face-to-face through speech, humans communicate in many other ways in parallel: through lip movement, facial expression, gesture, and bodily posture; the communication often makes use of objects which are present on-site and which themselves may have communicative contents, such as texts, maps, images, etc. Task-oriented SLDSs, therefore, point all the way to systems which communicate with humans in the same way in which humans communicate with one another. Task-oriented SLDSs conducting natural dialogue merely represent the first step towards the ultimate goal of speech technology research, which is that of integrating speech into fully natural interactive systems.

In the context just outlined, this article discusses progress towards the development of natural dialogue in task-oriented SLDSs. This is a large topic which comprises, at least, "final frontiers" in speech recognition, current challenges in speech generation, issues in natural language understanding and generation, dialogue management, human factors in SLDSs, system integration issues, and supporting tools, architectures, and platforms. Rather than attempting to cover all of these aspects of SLDSs, we have chosen to focus on recent progress in the understanding of dialogue management. The dialogue manager is, in most cases, the core of the SLDS, which orchestrates everything the system does and refers to most events which occur in an SLDS. Moreover, dialogue management continues to pose challenges in research and development on which the following will attempt to shed some light. Before going into detail with dialogue
management in Section 3, Section 2 outlines the system context in which a dialogue manager must operate. Section 4 concludes the chapter.
2. Task-oriented Spoken Language Dialogue Systems

2.1 Introduction
Speech recognition without speech understanding constitutes a broad and important, yet still fairly limited application area. As speech recognition improved sufficiently to enable recognition beyond the single word, it became interesting to post-process the recognized input string through adaptation of existing syntactic and semantic parsing techniques. The resulting understanding of spoken input pointed the way towards systems which could use their understanding of the input to help users carry out particular tasks through dialogue with the machine. However, the achievement of such systems required exploration of the challenges posed by sophisticated dialogue management. As long as only single-word utterances were recognized and no natural language understanding processing was involved, dialogue managers only had to match the recognized string against a set of possibilities and take the action associated with the best match. This process from speech recognition-only through inclusion of natural language understanding and dialogue management is reflected in the topics addressed in the DARPA Proceedings between 1989 and 1992 [8-11]. On the output side, most SLDSs still use coded speech, i.e. prerecorded words and phrases that are replayed to the user. For many languages, parametric speech (or fully synthetic speech) is still of insufficient quality for use in walk-up-and-use applications. More recently, the emerging field of dialogue engineering has come to include human factors issues as well as issues in systems integration.

Figure 1 shows a model of the elements which are, or could be, relevant to the design and construction of an SLDS. No existing system incorporates all those elements but all systems incorporate some of them. The elements may be viewed as abstract building blocks from which to build particular SLDSs. The elements are organized into five layers.

• At the bottom of the figure, the dialogue context layer includes aspects of the history of interaction, the domain model and the user model.
• At the level above the context layer, the interaction control layer includes (system) state of attention as well as the structures defined by the interlocutors' intentions and structural aspects of the linguistic exchanges. System control is largely based on structures at this level.
• The language layer describes the linguistic aspects of the interaction.
FIG. 1. Elements for building SLDSs. Gray boxes reflect the logical architecture of SLDSs. The gray band shows overall information flow among the layers.
• Then follows the acoustic layer which includes the transformations between the speech signal and the symbolic expressions of language.
• Finally, the performance layer is a function of the other layers taken together and includes some general aspects of the system's behavior.

The grey horseshoe-shaped band in Fig. 1 indicates the overall processing flow among elements--from input through control to output and performance--in a context defined by contextual elements. Developers often refer to elements in terms of the system components which implement these, such as the speech recognizer, the lexicon, the parser, the dialogue manager, the database, the language generator, the speech generator, etc., system performance being replaced by an abstraction of the (physical) user and reflected in the human factors issues. Just as no existing SLDS includes all the elements of Fig. 1, many SLDSs do not include all the components just mentioned. Very simple SLDSs, for instance, only include a recognizer with a simple parser, a basic dialogue manager, and a speech generator.

We will now briefly present the five layers of Fig. 1 to convey a sense of the context for which dialogue managers must be developed and in which
they have to work. Section 3 will revisit this context from the particular point of view of the dialogue manager.
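To make the overall processing flow concrete, the following minimal sketch chains the components mentioned above (recognizer, parser, dialogue manager, output generation) in the simplest possible way. All function names, the placeholder recognition result, and the slot names are invented for illustration; they do not correspond to any of the systems discussed in this chapter.

# Minimal sketch of the processing flow through SLDS components.
# All names are illustrative; real systems differ widely in how the
# components are split and how they exchange information.

def recognize(audio):
    """Speech layer: map the acoustic signal to a word string."""
    return "a ticket to brussels thursday morning"   # placeholder result

def parse(words):
    """Language layer: extract a (partial) semantic representation."""
    meaning = {}
    if "brussels" in words:
        meaning["destination"] = "Brussels"
    if "thursday" in words:
        meaning["date"] = "Thursday"
    return meaning

def manage_dialogue(meaning, context):
    """Control/context layers: decide what the system should say next."""
    context.update(meaning)                      # update interaction history
    if "departure_time" not in context:
        return "At what time would you like to leave?"
    return "Your reservation is confirmed."

def generate_speech(utterance):
    """Output side: language generation plus speech synthesis or playback."""
    print("SYSTEM:", utterance)

context = {}                                      # dialogue context (history)
words = recognize(b"...audio...")
generate_speech(manage_dialogue(parse(words), context))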
2.2 Dialogue Context
Dialogue context is of crucial importance to language understanding and generation as well as to dialogue control, and plays a central role in SLDS development. The dialogue context provides constraints on lexical selection, speech act interpretation, reference resolution, task execution and communication planning, system focus and expectations, the reasoning that the system must be able to perform, and the utterances it should generate. Contextual constraints serve to remove ambiguity, facilitate search and inference, adapt to the user, and increase the information contents of utterances since the more context, the shorter the messages need to be [12]. Context specification is closely related to the particular task(s), application, and users of the system. In a sense, each element of Fig. 1 is part of the context of each other element.

• The interaction history is primarily relevant to the local discourse and is built and used during interaction with the user. An interaction history is a selective record of information which has been exchanged during interaction. It is useful to distinguish between at least four types of interaction history as shown in Fig. 1.
• The domain of an SLDS is the aspect of the world about which the system can communicate. An SLDS often acts as front-end to some application, such as an email system or a database. The domain model captures the concepts relevant to the application in terms of data and rules. For instance, there may be rules for expanding relative date indications, such as "today," into absolute dates. The domain data may be used partly for providing information requested by the user and partly for checking user input. For example, the system may check if a flight route indicated by the user actually exists (see the sketch after this list).
• User modeling is particularly important in SLDS development. The better the system can take aspects such as user goals, beliefs, skills, preferences and cognition into account, the more cooperative the system can be [13]. The general fragility of current SLDSs means that they must be carefully crafted to fit the behavior of their users. This remains a hard problem.
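As a minimal illustration of the kind of domain-model rules and checks mentioned above, the following sketch expands a relative date expression and validates a requested route against a small route table. The route table, function names, and city names are invented examples, not part of any system described in this chapter.

from datetime import date, timedelta

# Illustrative domain-model fragment: a rule for expanding relative date
# expressions and a check that a requested route actually exists.
KNOWN_ROUTES = {("Copenhagen", "Aalborg"), ("Copenhagen", "Karup")}

def expand_relative_date(expression, today=None):
    """Turn 'today'/'tomorrow' into an absolute date; pass other input through."""
    today = today or date.today()
    offsets = {"today": 0, "tomorrow": 1}
    if expression in offsets:
        return today + timedelta(days=offsets[expression])
    return expression  # assume it is already an absolute date

def route_exists(origin, destination):
    """Check user input against the domain data."""
    return (origin, destination) in KNOWN_ROUTES

print(expand_relative_date("tomorrow"))
print(route_exists("Copenhagen", "Aalborg"))   # True
print(route_exists("Copenhagen", "Paris"))     # False -> ask the user again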
2.3 Dialogue Control
Despite the very real problems in designing grammars and parsing techniques for spoken input (see 2.4), it is probably in dialogue processing
that current development of advanced SLDSs is furthest removed from theoretical and practical mastery in terms of best practice development and evaluation procedures, methods, tools, standards, and supporting concepts and theory. The main reason appears to be the comparative novelty and unexpected complexity of the problems encountered. The management by machine of spoken dialogue can only be investigated in running SLDSs or realistic simulations. Such investigations have only been possible during the last decade or so, whereas research on speech recognition and generation, and on (written) linguistic input analysis and (written) language generation has a much longer history. Many individual aspects of human-human conversation have been investigated for decades but theoretical results have proven difficult to transfer to spoken human-machine interaction [3]. This is because the machine is a highly inferior partner in dialogue compared to human interlocutors.

Controlling the dialogue is a core function in SLDSs. Dialogue control determines what to expect from the user, how to interpret high-level input structures, which context elements to consult, what to output to the user, and generally when and how to do what. The nature of these control tasks implies that control has to operate on superordinate interaction structures and states. Following [14], the dialogue control layer distinguishes three types of superordinate interaction structure and state. The attentional state includes the entities in current interaction focus. The intentional structure addresses the multiple purposes of interaction, and the linguistic structure includes characterization of high-level structures in the input and output discourse.
2.4 Language

The linguistic processing done by today's advanced SLDSs consists in (i) linguistic analysis of the input produced by the speech recognizer and (ii) generation of linguistic output to the speech generator from an underlying semantic representation. Natural language generation per se is often absent from current systems because the underlying output semantics, once chosen by the system, is directly linked to pre-designed system output phrases which simply have to be played to the user or passed through the speech synthesizer. As more advanced natural language generation becomes necessary, there does not seem to be any reason why SLDS designers could not draw upon existing knowledge about language generation in the natural language processing community. There are two major issues to take into account:

• It is important that the system's messages are clear and precise in the dialogue context.
• Moreover, as humans tend to model the system's output phrases, the system must be able to recognize its own output vocabulary and grammar.

The problems are more serious on the linguistic input side. Many SLDSs need some form of linguistic input analysis. However, spoken language behaves very differently from written language (text) and its behavior remains poorly understood. This means that there is no easy way of transferring linguistic progress in written language understanding to the understanding of spoken input. Thus, there is still little consensus with respect to how to optimize grammars and parsing for SLDSs. Full written language parsing techniques do not work. What actually works, more or less, is commonly called "robust parsing," but this term does not have any clear meaning at present apart from referring to less-than-full written language parsing. Other issues include:

• whether to use stand-alone grammar and lexicon(s) or build these into the speech recognizer
• how to achieve spoken sub-language adequacy (lexicon and grammar) for language understanding and generation
• whether to use morphology (declarative and principled, but slow processing) or a full-form lexicon (fast)
• how to integrate syntax and semantics
• how to efficiently separate resources from the procedures which use them (modularity)
• how to add linguistic knowledge (grammar and vocabulary) to the system during or after development (extensibility)
• how to build one shared grammar for analysis and generation (modularity) [15].
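Since "robust parsing" has no agreed definition, the following sketch shows just one commonly used less-than-full-parsing strategy: spotting task-relevant keywords and phrases and filling slots in a semantic frame while ignoring everything else. The patterns, slot names, and vocabulary are invented for illustration.

import re

# One "robust parsing" strategy: ignore what cannot be analysed and spot
# task-relevant words and phrases, filling slots in a semantic frame.
CITY = r"(copenhagen|aalborg|aarhus)"
PATTERNS = {
    "origin":      re.compile(r"from\s+" + CITY),
    "destination": re.compile(r"to\s+" + CITY),
    "date":        re.compile(r"(monday|tuesday|wednesday|thursday|friday)"),
}

def robust_parse(utterance):
    """Fill a semantic frame from whatever fragments can be spotted."""
    frame = {}
    text = utterance.lower()
    for slot, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            frame[slot] = match.group(1)
    return frame

# Disfluent, ungrammatical input still yields usable slots:
print(robust_parse("uh I need to go eh from Copenhagen to Aalborg Thursday"))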
2.5 Speech

The speech layer concerns the relationship between the acoustic speech signal and a, possibly enriched, text (lexical string). This relationship is not a simple one. Speech includes a number of prosodic phenomena--such as stress, glottal stops, and intonation--which are reflected in text only in a simplistic manner. Conversely, words and their different spellings, as we know them from text, do not have natural expressions in speech. Speech recognition must cater for extralinguistic noise and other phenomena, such as that the speech rate varies over time, the speech signal is mixed with environmental noise from other people speaking and may be of differing quality because of traffic, slamming doors, etc., the pronunciation
varies with the speaker, and speech from different participants may overlap, for instance with the system's utterances [16, 17]. Speech recognizers and speech generators still need improvement in many respects, including:

• basic recognition rate in adverse conditions
• coping with spoken language specificities, such as hesitations, repetitions of words or syllables, ill-formed phrases, incomplete sentences, etc.
• rejecting nonauthorized words or interpreting them using the context of the sentence or dialogue
• dynamically adapting to the user's personal way of speaking (linguistic behavior, own stereotypes etc.)
• voice output quality
• ability to handle input prosody and output prosody in concatenated pre-recorded speech or speech synthesis.

Meanwhile, the errors and misunderstandings that occur between user and system because of less-than-ideal speech recognition and generation can often be satisfactorily handled through spoken interaction. In fact, users can now speak to their computer systems in basically the same way as they speak to fellow humans, that is, by using continuous speaker-independent speech, and they can understand the machine's spoken response without significant difficulty, especially when pre-recorded speech is being used. This means that, despite several remaining challenges, the speech layer is in place for practical use.

2.6 Performance and Human Factors
Any advanced SLDS incorporates many of the elements presented in Fig. 1. Together, these elements determine the observable behavior or performance of the system during interaction. The observable behavior strongly influences the user's impression of, and satisfaction with, the system. Those element properties which contribute to observable system behavior form part of the human factors aspect which is a fairly new discipline in SLDS design. Human factors cover all aspects of interactive system design which are related to the end-user's abilities, experience, goals, and organizational/cultural context [18]. Although the remit of this field is broad, theoretical and practical work tends to occupy a variety of small niches with few unifying approaches defining interactive system design on all of the dimensions noted.
However, along with the accelerating industrial exploitation of SLDSs it is becoming increasingly important to invest in human factors issues early in the development process in order to investigate what is needed to achieve a system which will satisfy the users' needs and expectations [19]. It is costly to partly redesign an implemented system which has been discarded by its users because it failed to take basic human factors into account. Human factors cut across the components of an SLDS. This means that human factors issues must be considered during development of any SLDS component as well as for the integrated system to ensure that the final SLDS will be adequate and satisfy user expectations as far as possible. Human factors issues include system cooperativity, dialogue initiative, dialogue structure, detection and correction of errors, user expectations and user background, user modeling, and how to address users and influence user behavior. Depending on the task(s) to be solved, natural dialogue may sometimes also require the possibility for the user to draw upon additional modalities, such as haptic input or output graphics.

2.7 Systems Integration
SLDS components must be integrated in two ways: (i) the individual components must be brought to work together; (ii) the integrated components must somehow be connected to the world to allow users access to the system. The inner circle of Fig. 2 reflects (i) whereas the outer circle reflects (ii). The key components of SLDSs have been mentioned above. Individual components may integrate several smaller components. Any component--small or large--has to communicate with at least one other component. Various platforms are available which can help developers enable the communication in a relatively standardized way through application programming interfaces (APIs).

The way in which the user accesses an SLDS differs from system to system. Some SLDSs are only accessible over the telephone. Others are internet-based [20]. Some systems run locally on the user's machine and are accessed via microphone. There are systems of which a major part runs on a server and only a minor part runs on the user's machine. Some SLDSs--client-based or client-server based--allow multimodal interaction. SLDSs may need access to external databases or collaborate with other applications. Thus, the environmental integration needed differs widely from one application to another and may be fairly complex.
FIG. 2. Functional and environmental systems integration of SLDSs.
2.8 Conclusion
It has emerged from the brief introduction to SLDSs above that all layers of Fig. 1 as well as their integration into working SLDSs pose a variety of unsolved problems. Arguably, the major challenges, including some thorny human factors issues, are to be found in the area of dialogue management. The next section will explore the complexity of dialogue management in detail.
3. Managing the Dialogue

3.1 Introduction
Dialogue management theory is still relatively uncharted territory. Current views on dialogue management tend to be heavily biased towards particular SLDSs, particular interactive tasks, and particular dialogue management solutions for those tasks. The reason is that few dialogue manager developers have extensive hands-on experience from developing different dialogue managers for a wider range of SLDSs and interactive tasks. Hands-on experience remains a major source of knowledge about dialogue management,
given the fact that the generalizations that are required in the field cannot yet be found in the literature. If considered as general truths, those views may induce sub-optimal solutions to the development of SLDSs and dialogue managers which, by their nature, require a different approach. The presentation of issues in dialogue management below is based on analyses of a series of significantly different SLDSs from research and industry. The authors hope that this underlying diversity has contributed some amount of generality to the observations made. Generality is a precondition for achieving a theory of dialogue management which can support the development of dialogue managers for SLDSs. Dialogue management is arguably the core functionality of spoken language dialogue systems (SLDSs). Figure 1 helps explain why this is the case. The figure shows the logical architecture of SLDSs as organized in a series of layers called performance, speech, language, control, and context, respectively. A user interacting with an SLDS produces speech input (speech layer) and receives speech output (speech layer) from the system. In addition, the regular user might figure out more abstract properties of the system's behavior, such as how cooperative it is during dialogue, the distribution of user and system dialogue initiative, or how the system attempts to influence the user's dialogue behavior through its choice of words and in other ways (performance layer). The rest of the system's workings are hidden from inspection by the user. Figure 1 classifies these workings in terms of a series of headers, such as "user utterances" or "domain model," and elements subsumed by each header. The elements are high-level references to the functionality that may be present in today's SLDSs. Most of today's SLDSs do not have all of the functionalities referred to in Fig. 1, but all SLDSs have some of the functionalities. Dialogue management is primarily located in the control and context layers in Fig. 1. When the user inputs an utterance to the system, the dialogue manager may receive a representation of the meaning of what the user said from the speech and language layers. It is the task of the dialogue manager to handle that representation properly, eventually producing an appropriate output utterance to the user. Depending on the complexity of the task which the system helps the user solve, the dialogue manager may have to make use of most or all of the elements listed in the control and context layers in Fig. 1. For instance, to determine if the user has provided a new piece of information which the system needs in order to complete the task, the dialogue manager may have to check the history of sub-tasks performed so far (context layer: task history). Or, to decide if shortcuts in the dialogue can be made because this particular user does not need a certain piece of information or advice, the dialogue manager would check if the user belongs to the class of experienced users (context layer: user group).
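The two context look-ups just mentioned, the task history and the user group, can be pictured roughly as follows. The slot names, user groups, and prompt styles are hypothetical; the sketch merely indicates the kind of decision a dialogue manager makes, not how any particular system implements it.

# Sketch of two context look-ups: checking the task history for missing
# pieces of information, and shortening or softening the dialogue
# depending on the user group. All names are invented.
REQUIRED_SLOTS = ["origin", "destination", "date", "departure_time"]

def next_system_move(task_history, user_group):
    # Task history: which sub-tasks (slots) have already been completed?
    missing = [slot for slot in REQUIRED_SLOTS if slot not in task_history]
    if not missing:
        return "confirm_reservation"
    # User model: experienced users get a terse prompt, novices get guidance.
    prompt_style = "terse" if user_group == "experienced" else "guided"
    return f"ask:{missing[0]} ({prompt_style})"

history = {"origin": "Copenhagen", "destination": "Aalborg"}
print(next_system_move(history, "experienced"))  # ask:date (terse)
print(next_system_move(history, "novice"))       # ask:date (guided)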
A dialogue manager should be designed based on careful consideration of its task(s) and users. What actually happens is that the designer (or developer) designs a system based on both design-time and run-time considerations. Some design decisions only affect design-time work whereas many others affect run-time operation as well. The system's task, for instance, is fixed at design-time and cannot be modified at run-time. On the other hand, what the user actually inputs at some point cannot be fully predicted at design-time and must be handled at run-time. The distinction between design-time and run-time considerations is inherent to the following account and we trust that the reader will be able to decide when a particular comment applies to design-time only or to run-time as well.

This section takes a systematic look at the many issues that may have to be dealt with in dialogue manager design. Some of the issues to be considered go beyond anything encountered in current industrial design of dialogue managers. In research prototype development, on the other hand, any of those issues may be encountered although no existing research project appears to address all of them. This may soon change, however, given the rapid development of the field and, when this happens, it is in the nature of research progress that new projects will address not only every single issue to be presented below but additional issues as well. The following presentation has been sub-divided into 24 sections. The sections are structured under the following headings:

• goal of the dialogue manager
• system varieties
• input speech and language
• getting the user's meaning
• communication
• history, users, implementation.
This structure roughly derives from analysis of a dialogue manager's main tasks and their ordering, and thus constitutes a partially ordered model of (possible) dialogue manager functionality. The model is based on in-depth analyses of the following dialogue manager exemplars: the Daimler-Chrysler Dialogue Manager [21], the dialogue manager in the Danish Dialogue System [3], the dialogue manager in Railtel/ARISE [22, 23], the dialogue manager in Verbmobil [24, 25, www.dfki.uni-sb.de/verbmobil/], and the dialogue manager in Waxholm [26, 27, www.speech.kth.se/waxholm/waxholm.html]. The model expands on the dialogue manager-related parts of Fig. 1 and is summarized in Section 3.7.4 below. Let us assume that the model subsumes all existing and emerging dialogue managers for SLDSs, i.e., each of these dialogue managers
represents a sub-set of the functionalities described in the model. Given that assumption, an obvious application of the model is to use it as a checklist during SLDS and dialogue manager design, to help identify the sub-set of functionalities needed in a particular case.
3.2 Goal of the Dialogue Manager

3.2.1 Efficient Task Performance

The central goal of an SLDS is to enable the interactive task to be done efficiently and with a maximum of usability. This goal subsumes a number of sub-goals which are common in discussions of SLDSs and their components, such as robustness, flexibility, naturalness, etc. As we will not be considering usability aspects in what follows (see [3, 28]), from the point of view of the present paper this goal reduces to efficient task performance. Ideally, efficient task performance means that the system takes the steps that are necessary and sufficient to always provide appropriate output at any given stage during the dialogue.
3.3 System Varieties
Our common idea of an SLDS is a unimodal speech input/speech output system which conducts a dialogue about a single task in a single language with a single user at a time. Such SLDSs are, indeed, the most common ones at the present time. But this should not blind us to the large space of alternative types of SLDSs, one of which might be just what is needed for a particular application.
3.3.1 Multimodal Systems Including Speech

Appropriate SLDS output to the user is not always spoken output. For some tasks, the system's output may be system actions which the user can perceive, such as connecting the user with the person the user asked to speak to. Obviously, SLDSs may produce many other nonspeech output actions in response to the user's input. Providing operator fallback is one such action that we will come back to later. Other nonspeech output actions are due to the fact that the system provides output in modalities other than speech. Some tasks require the system to output information which is most effectively presented in, e.g., textual or pictorial form. Long lists of flight connections, for instance, are much more efficiently conveyed by text than by speech, and it is well known that the contents of many pictures are virtually impossible to
render through verbal means [29]. A bicycle repair guide, for instance, is not much use to the novice without ample pictorial illustration. In such cases, the SLDS needs to be enhanced through other modalities of (output) information presentation, such as text or images presented on a screen. This is the case in the Waxholm system which also uses an animated graphical speaking face whose lip movements help the user disambiguate the system's synthetic speech and whose eye movements help the user focus on newly added information on the screen. Through its "user speaks" input button, Waxholm also illustrates the fact that SLDSs do not need to take only speech as input. The user presses the button in order to make the system listen. For many other tasks too, an appropriate solution may be to combine spoken input with other input modalities, such as pointing gesture. So, speaking more generally, there is a fundamental question here about when to use speech input and/or speech output for a particular application, when not to use speech, and when to use speech in combination with other particular input/output modalities for information representation and exchange. The solution to this question about the functionalities, or roles, of modalities in particular cases not only depends on the task(s) to be solved by the system but on many other parameters as well, such as the users, the work environment, etc. Having investigated the issue of speech functionality for a long time [30], we have developed a web-based tool called SMALTO to support early design decisions on when (not) to use speech for particular applications [31, www.disk2.dk].
3.3.2 Multilingual Systems

Appropriate SLDS output to the user is not always spoken output in the same language as the spoken input provided by the user. Spoken translation systems, such as Verbmobil, translate spoken input in one language into spoken output in another language. Spoken translation systems are basically different from standard SLDSs in that they do not conduct any dialogue with the user about the application domain: they do not provide the user with train departure times or help the user browse the web. Rather, they mediate dialogue on a particular task within a particular domain between users who speak different languages. At most, spoken translation systems conduct meta-communication dialogues with their users (see 3.6.2). Still, such systems share many issues of dialogue management with standard SLDSs. Another obvious possibility is that an SLDS accepts spoken input in different languages for the same task(s) and responds in the language in which it is being addressed.
3.3.3 Multitask, Multiuser Systems

In what follows, we focus on the design of an SLDS for a single task. Systems are already being built to handle several more or less independent tasks, such as consulting one's diary and answering email over the phone. Applications such as these serve to increase an already existing need for task- and domain-independent dialogue managers, which would facilitate the rapid prototyping of SLDSs for new tasks and new domains. We will return to this issue later (Section 3.7.4). In what follows, we also focus on single-caller (or user), single-system dialogue. An obvious next step is for several callers to conduct dialogue with the system to solve one and the same task. There are several real issues involved in creating smooth multiuser spoken dialogue with machines, such as that of handling simultaneous input from several users. These issues will not be discussed in this chapter.
3.4 Input Speech and Language

So far, in Sections 3.2-3.3, we have discussed SLDSs in general, their fundamental goals and varieties. From now on, we focus on dialogue management of "standard" task-oriented SLDSs, i.e. unimodal, monolingual, single-task, and single-user SLDSs.
3.4.1 Are the Speech and Language Layers OK?

Suppose that we have identified a certain task, T1, for which we consider building an SLDS. Suppose, in addition, that T1 is already known not to generate problems which cannot be solved by following best practice in developing the speech and language layers of the application (cf. Figure 1). In other words, our speech recognizer can be expected to come to possess the necessary robustness for the application, the vocabulary will not be infeasibly large, we are able to develop the grammar and parser required, language generation and speech generation of sufficient quality can be developed, etc.
3.4.2 Do the Speech and Language Layers Need Support from the Dialogue Manager?

Things in this world often do not come for free. It may be that conditions have to be imposed on the dialogue manager in order to guarantee feasibility in the speech and language layers. The feasibility conditions may derive from, e.g., the need for real-time performance (see 3.4.3), task complexity (see 3.5.1), or achieving a sufficient recognition rate in a large-vocabulary
system. The dialogue manager may thus have to actively support the processing done in the speech and language layers. It is worth noting, however, that many existing systems use no dialogue manager support for the speech and language layers. In the future, we will see many more forms of dialogue manager support for the speech and language layers than those listed below. We distinguish between dialogue manager support for input prediction, input language processing control, and output control.
3.4.2.1 Input prediction

When doing input prediction, the dialogue manager uses knowledge about the dialogue to predict aspects of the next user input, thereby reducing the complexity which must be handled by the speech and/or language layers of the system. Input prediction may be possible if, for instance, the task has some structure to it (see 3.5.1 and 3.5.4). Thus, if the system knows that it is now going to ask the user a specific question, then the system can start making plausible guesses about, e.g., the vocabulary the user will be using in the following input utterance which will be a reply to the system's question to the user. Speech and/or language layer feasibility conditions could be that the dialogue manager must:

1. constrain the search space of the speech recognizer by constraining the set of words which are likely to occur in the next user utterance (subvocabulary prediction)
2. constrain the search space of the keyword or phrase spotting component by delimiting its search space to the most probable keywords or phrases (keyword prediction)
3. constrain the search space of the syntactic analysis component by narrowing the set of applicable grammar rules to a specific and probable sub-grammar (sub-grammar prediction)
4. constrain the search space of the parser by narrowing the set of semantically meaningful units it should be working with (semantic prediction).

In other words, unless the dialogue manager performs successful input prediction based on some form of knowledge about the dialogue, the speech and/or language input processing components will, or may, perform too poorly for the application to be feasible (unless, of course, the problem can be circumvented in some other way).
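A rough sketch of condition 1, sub-vocabulary prediction, might look as follows: the dialogue manager hands the recognizer a restricted word list that depends on the question the system has just asked. The state names and vocabularies are invented, and a real recognizer would typically receive a language model or grammar rather than a plain word list.

# Sketch of sub-vocabulary prediction: the active recognition vocabulary
# depends on the dialogue state (the system's last question).
SUBVOCABULARIES = {
    "ask_date": ["monday", "tuesday", "wednesday", "thursday", "friday",
                 "today", "tomorrow"],
    "ask_city": ["copenhagen", "aalborg", "aarhus", "odense"],
    "ask_yes_no": ["yes", "no", "please", "thanks"],
}

ALWAYS_ACTIVE = ["help", "repeat", "correct", "stop"]   # meta-communication words

def predict_vocabulary(system_question):
    """Return the word list the recognizer should activate for the next turn."""
    return SUBVOCABULARIES.get(system_question, []) + ALWAYS_ACTIVE

print(predict_vocabulary("ask_date"))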
3.4.2.2 Input language processing control

Input language processing control will become increasingly important as users' input utterances grow in length as well as in lexical, grammatical, and linguistic context-determined complexity. Speech and/or language layer feasibility
conditions could be that the dialogue manager must:

5. constrain the search space of the semantic analysis component by, e.g., delimiting the set of possible antecedents in cross-sentence anaphora resolution.
3.4.2.3 Output control

When doing output control, the dialogue manager follows principles or guidelines that may serve to reduce the complexity of certain aspects of system output, maximize output naturalness, etc. Output control is an important means of controlling the user's dialogue behavior (see 3.5.2). Speech and/or language layer feasibility conditions could be that the dialogue manager must:

6. control the prosody of the spoken output based on control layer information about the message to be produced, such as speech act information
7. control the lexical variation of dialogue expressions to be produced by the language generation component
8. control the grammar of the dialogue utterances to be produced by the language generation component
9. control the style of the dialogue utterances to be produced by the language generation component.
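Conditions 7-9 are often met in practice simply by generating output from a fixed set of templates, as in the following sketch. The template texts and dialogue act names are invented; prosody control (condition 6) would additionally require markup understood by the speech generator.

# Sketch of output control via templates: fixed phrasings keep the output
# vocabulary, grammar, and style small and predictable, which also helps
# when users echo the system's own wording in their replies.
TEMPLATES = {
    "ask_date": "On which date do you want to travel?",
    "ask_city": "From which city do you want to depart?",
    "confirm":  "You want to travel from {origin} to {destination}. Is that correct?",
}

def generate_output(dialogue_act, **slots):
    """Produce the system utterance for a given dialogue act and slot values."""
    return TEMPLATES[dialogue_act].format(**slots)

print(generate_output("confirm", origin="Copenhagen", destination="Aalborg"))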
3.4.3 Real-time Requirements
It can be assumed that most or all SLDSs need to work in real time or close-to-real time, for such is the nature of spoken dialogue. In some cases of spoken human-human dialogue, we actually do tolerate waiting for a response from our dialogue partner, especially when we know that the partner has to search for information or do complex calculations or inferences before responding. Users may be expected to tolerate the same from machines but, generally speaking, close-to-real-time operation is a very important goal in SLDS design.
3.5 Getting the User's Meaning

3.5.1 Task Complexity

Ideally, we would like to develop SLDSs which simply let the users speak freely and then do whatever is necessary to complete the task. For some tasks, this is clearly feasible. Suppose, for instance, that our SLDS has to ask if the user wants to accept a collect call and, depending on the user's answer, simply
routes the call to the user or does not. In this case, the risk of letting the users speak freely probably is as small as it ever gets. However, suppose that the task is about ordering VIP dinner arrangements at a large restaurant including date, time, duration, choice of rooms, many-course meals, special diets, wines, flowers, timing for speeches, entertainment, seating arrangements, payment arrangements, etc. It is easy to imagine that some users will have quite a bit to say about that. They may have so much to say, in fact, that even the dialogue manager support to the speech and language layers described in 3.4.2 will not be sufficient to guarantee that the system will always be able to provide appropriate output. Task complexity, therefore, probably is the single most important factor to consider for the dialogue manager developer. Unfortunately, at this point, task complexity cannot be measured in any objective way. The reason is that task complexity is a function of several factors, including:

• the number of pieces of information which have to be communicated between the user and the system, such as departure airport, arrival airport, departure time, arrival time, etc.
• the number of optional pieces of information which could be, but do not have to be, communicated between the user and the system, such as whether or not the caller wants to book a return flight
• whether or not the task itself has some a priori structure in terms of a full or partial ordering of its component sub-tasks. For instance, train schedules are often different on workdays and Sundays, which means that the system must know the date before it can offer particular departure times
• whether or not the sub-tasks constituting the task are independent. For instance, a user may want to trade departure time for a cheap fare. In that case, the sub-task of fixing the departure time is not independent of the sub-task of determining if the caller has an interest in discount
• whether the flow of information is one-way or two-way. One-way information flow is the simpler, as when, for instance, the system asks all the questions and the user provides all the answers
• whether or not negotiation is involved
• whether or not substantial economic commitments are being made during dialogue, such as the purchase of a flight ticket or the ordering of a bank transfer
• whether or not real-time updates are important, such as information about delayed flights or impending strikes

And so on.
optional:

• how many people will travel?
• their age categories
• will they need return tickets?
• will they all need return tickets or only some of them?
• will they need round-trip tickets?
• do they need information on the notions of green and red (i.e. different kinds of discount) departures, etc.
As, in such cases, letting users speak freely may not be feasible, the challenge for the dialogue designer becomes that of eliciting all relevant pieces of information in as elegant a way as possible.
3.5.1.2 Ill-structured tasks

What is the task structure, if any? Some tasks are ill-structured or have no structure at all. Consider, for instance, an SLDS whose database contains all existing information on flight travel conditions and regulations for a certain airline company. This information tends to be quite comprehensive both because of the many practicalities involved in handling differently made up groups of travellers and their luggage, but also because of the legal ramifications of travelling which may surface--if the flight crashes, for example. Some users may want many different individual pieces of information from the database whereas others may want only one piece. Which piece(s) of information a particular user wants is completely unpredictable. We call such user tasks ill-structured tasks: the user may want one, two, or several pieces of information from the database, and the order in which the user may want the information is completely arbitrary as seen from the system's point of view. The system must be prepared for everything from the user all the time.

One way to try to reduce the complexity of large ill-structured tasks is to use, or invent, domain structure. That is, the domain may be decomposable into a number of sectors which themselves may be hierarchically decomposed, etc. So the system asks, for instance: "Do you want to know about travelling with infants, travelling with pets, travelling for the elderly and the handicapped, hand luggage, or luggage for storage?" And if the user says "Hand luggage," the system asks: "Do you want to know about volume of permitted luggage, electronic equipment, fragile items, or prohibited luggage?," etc. In principle, any information hierarchy, however deep and/or broad, can be handled by SLDSs in this way. However, few users will find it acceptable to navigate through many hierarchical levels prompted by the system in order to find a single piece of information at the bottom of some deep domain hierarchy. Making the hierarchy shallower
will often make matters even worse. No user will find it acceptable to have to go through a shallow but broad hierarchy prompted by the system in order to find a single piece of information at the end of it. Just imagine a Danish train timetable inquiry system which asks the user: "Do you want to go from Aabenrå, Aalborg, Aarhus ..." mentioning the country's 650 or so train stations in alphabetical order and in chunks of, say, 10 at a time. Another important problem with ill-structured tasks is that their vocabularies tend to differ considerably from one user to another. Unlike, say, train timetable inquiries, which can start uncontroversially from the "official" city names, or used car sales catalogues, which can start from the car models and production years, there is no "official" vocabulary, used by all or most users, for inquiring about pet transportation, high-volume luggage, or staff support for the elderly. We are all familiar with this problem from the (paper) yellow pages, where the categories are often labeled differently from how we would have labeled them ourselves. SLDSs in fact offer an elegant solution to this problem, i.e. an electronic dictionary of equivalents. The user speaks freely about the entry of interest. The system has only to spot a keyword in the user's input corresponding to an entry in its electronic dictionary in order to direct the user to the right entry in its hierarchy, and the user does not have to worry about knowing or remembering the headings used to structure the hierarchy itself.
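A minimal sketch of the dictionary-of-equivalents idea is given below. The keywords and headings are invented for illustration, and a real system would of course work on the recognizer's output rather than on a typed string.

```python
from typing import Optional

# hand-built dictionary of equivalents: user keywords -> system headings
EQUIVALENTS = {
    "dog": "travelling with pets",
    "cat": "travelling with pets",
    "wheelchair": "travelling for the elderly and the handicapped",
    "laptop": "hand luggage",
    "suitcase": "luggage for storage",
}

def spot_entry(user_utterance: str) -> Optional[str]:
    """Return the hierarchy entry for the first known keyword in the input."""
    for word in user_utterance.lower().replace("?", " ").split():
        if word in EQUIVALENTS:
            return EQUIVALENTS[word]
    return None  # no keyword found: fall back to system-directed navigation

print(spot_entry("Can I bring my dog in the cabin?"))  # -> "travelling with pets"
```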
3.5.1.3 Well-structured tasks  Other tasks have some structure to them. Consider the flight ticket reservation task handled by the Danish Dialogue System. This task is a partially ordered one. It would normally make little sense, for instance, to ask for one of the morning departures until one has specified the date; and it normally makes little sense to expect the system to tell whether or not flights are fully booked on a certain date until one has indicated the itinerary. Moreover, ordinary users know that this is the case. Task structure is helpful if the task complexity makes it advisable to control the user's input (see 3.5.2). It is important to note, however, that a partial order only is what one is likely to find in most cases. Moreover, sub-task interdependencies may interfere with the designer's pre-conceived ideas about "the" task order. For instance, some users may want to know which departures are available at reduced fares before wanting to know about departure times, whereas others do not care about fare reductions at all. In this way, sub-task interdependencies may also introduce an element of negotiation into the dialogue. For instance, before accepting a certain departure, the caller may want to know if the subsequent departure is the cheaper of the two. This makes the dialogue a two-way exchange where the caller and the system take turns in asking questions of one another and answering those questions.
Both elements, negotiation and two-way information flow, complicate task model design. Moreover, even the partial order that the task has may be by default only. If someone simply wants to leave town as soon as possible, the itinerary matters less than the departure time of the first outbound flight that has a free seat.
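The partial order itself can be represented quite simply. The following sketch, with invented sub-task names and dependencies, checks whether a sub-task can sensibly be raised yet; a real task model would also have to represent the "by default only" character of the ordering noted above.

```python
from typing import Dict, List, Set

# prerequisite sub-tasks; names and dependencies are illustrative only
DEPENDS_ON: Dict[str, List[str]] = {
    "itinerary": [],
    "date": ["itinerary"],
    "departure_time": ["date", "itinerary"],  # morning departures presuppose a date and route
    "fare": ["departure_time"],
}

def addressable(subtask: str, settled: Set[str]) -> bool:
    """A sub-task can sensibly be raised once its prerequisites are settled."""
    return all(dep in settled for dep in DEPENDS_ON[subtask])

print(addressable("departure_time", {"itinerary"}))          # False: date still missing
print(addressable("departure_time", {"itinerary", "date"}))  # True
```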
3.5.1.4 Negotiation tasks  Does the task require substantial negotiation? Some tasks involve a relatively low volume of information and in addition have some structure to them. Still, they may be difficult to manage if they involve a considerable amount of negotiation. The Verbmobil meeting-scheduling task is an example. Fixing a meeting simply requires fixing a date, a time or a time interval, and possibly a place, so the volume of information to be exchanged is relatively low. Unless one knows the date, it can be difficult to tell if one is free at a certain time, so the task has some structure to it. And for busy people, the date-time pair may matter more than the exact venue. The problem inherent to the Verbmobil task is that fixing meetings may require protracted, and ill-structured, negotiation of each sub-task. The outcome of a meeting date-time-venue negotiation is not a simple function of calendar availability and prior commitments but also depends on the importance of the meeting, the importance of the prior commitments, the possibilities of moving, canceling, or sending apologies for other meetings, the professional and personal relationships between the interlocutors, etc. In addition, Verbmobil does nothing to impose structure on the (human-human) dialogue for which it provides translation support, allowing the dialogue to run freely wherever the interlocutors want it to go. In such cases, it makes little sense to judge the task complexity in terms of the volume of information to be exchanged or in terms of the task structure that is present, because the real problem lies in negotiating and eventually agreeing to meet. Had Verbmobil been a human-machine dialogue system product, with the machine representing the diary of someone absent from the conversation, state-of-the-art engineering practice would probably have dictated much stronger system control of the dialogue, effectively turning meeting date-time-venue negotiation into a booking exercise (see 3.5.2). Note also that negotiation is a natural human propensity. It seems likely that most SLDS tasks potentially involve an element of negotiation; Verbmobil just involves a considerable amount of it! In the case of the Danish Dialogue System, for instance, the system may propose a list of morning departures and ask if the user wants one of them. If not, the joint search for a suitable departure time continues. So the reason why negotiation is less obvious or prominent in the dialogue conducted with the
Danish Dialogue System when compared to the dialogue conducted with Verbmobil is not that the task of the Danish system by itself excludes negotiation. Rather, the reason is that the system's dialogue behavior has been designed to constrain the way in which negotiations are done.
3.5.1.5 Summary  When developing SLDSs we want to let the users speak freely. However, if the task is a complex one, the SLDS project either has to be given up as a commercial undertaking, turned into a research project, such as Verbmobil, or requires some or all of the following strategies:
1. input prediction (see 3.4.2.1)
2. input language processing control (see 3.4.2.2)
3. output control (see 3.4.2.3)
4. control of user input (see 3.5.2).
To the extent that they are applicable, all of 1-4 are of course useful in any system, but their importance grows with increasing task complexity. Strategy 4 subsumes strategy 3, as shown in the next section. Obviously, 1-4 can also be used for very simple tasks, making dialogue engineering easier for these tasks than if the users were permitted to speak freely in a totally unconstrained fashion. In fact, as we shall see in Section 3.5.2, this language of "not permitting" users certain things, "constraining" them, etc. is misleading. Elegant dialogue management solutions can often be found which do not in the least reduce the user's sense of being engaged in natural spoken communication.
3.5.2 Controlling User Input
The dialogue manager can do many things to control the user's input in order to keep it within the system's technical capabilities of recognition and understanding. Among task-oriented SLDSs, extreme lack of user input control may be illustrated by a system which simply tells the user about the task it can help solve and then invites the user to go ahead. For instance: "Freddy's Used Cars Saloon. How can I help you?" Lack of user input control allows the user to produce unconstrained input. Clear signs of unconstrained user input are:
• very long input sentences; topics being raised, or sub-tasks addressed, in any order
• any number of topics being addressed in a single user utterance.
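As a rough, purely illustrative way of flagging these two signs, a developer might do something like the following; the length threshold and the topic keyword sets are invented for the example.

```python
from typing import Dict, Set

TOPIC_KEYWORDS: Dict[str, Set[str]] = {
    "date": {"monday", "friday", "tomorrow", "january"},
    "route": {"from", "to", "via"},
    "payment": {"credit", "card", "invoice"},
}

def looks_unconstrained(utterance: str, max_words: int = 15) -> bool:
    """Flag very long input, or input that appears to touch more than one topic."""
    words = set(utterance.lower().split())
    topics_hit = sum(1 for keywords in TOPIC_KEYWORDS.values() if words & keywords)
    return len(utterance.split()) > max_words or topics_hit > 1
```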
Even for comparatively simple tasks, the handling of unconstrained user input is a difficult challenge which most SLDS projects cannot afford to try to meet. What user input control does is to impose, through a variety of explicit and implicit means, more or less strong constraints on what would otherwise have been unconstrained user input. Some of the user input control mechanisms available to the SLDS developer are:
3.5.2.1 Information on system capabilities  This section is about explicit information to users on what the system can and cannot do. For all but the simplest SLDSs, and for all but very special user populations, it is advisable that the system, up-front, clearly and succinctly tells the user things like what is its domain, what are its task(s), etc. This helps to "tailor" the user's expectations, and hence the user's input, to the knowledge the system actually has, thereby ultimately reducing the task of the dialogue manager. For instance, the following two systems both qualify in the generic sense as ferry timetable information systems (Ss): system S1 knows everything about the relevant, current, and planned timetables; S2 knows, in addition, about significant delays and is thus able to distinguish between planned and actual arrivals and departures. It is important to tell users whether the system they are about to use is an S1 or an S2. System capability information does not have to be given to users through speech, of course. Waxholm, for instance, provides this information as text on the screen as well as in synthetic speech. If the information is given through speech, it is virtually always important that it is expressed briefly and clearly because users tend to lose attention very quickly when they have to listen to speech-only information. It is often an empirical issue how much users need to be told about the system's capabilities in order that their mental model of the system's capabilities roughly matches what the system actually can and cannot do. If the story is longer than a few facts, it is advisable to make it possible for regular users to skip the story. Also, those parts of the story which only relate to some optional loop in the dialogue might be better presented at the start of that loop than up-front. Brief on-line information on the system's capabilities cannot be made redundant by any amount of paper information about the system.

3.5.2.2 Instructions on how to address the system  This section is about explicit instructions to users on how to address the system. By issuing appropriate instructions, the SLDS may strongly increase its control of the way it is being used. For instance, the Danish Dialogue System tells its users that it will not be able to understand them unless they answer the
system's questions briefly and one at a time. If used at all, such operating instructions should be both very brief and eminently memorable. Otherwise, they will not work because too many users will forget the instructions immediately. Indications are that the quoted instruction from the Danish Dialogue System worked quite well. However, as part of the operating instructions for the Danish system, users were also instructed to use particular keywords when they wanted to initiate meta-communication (see 3.6.2). This worked less well because too many users forgot the keywords they were supposed to use.
3.5.2.3 Feedback on what the system understood  Feedback on what the system has understood from what the user just said helps ensure that, throughout the dialogue, the user is left in no doubt as to what the system has understood (see also 3.6.6). All SLDSs need to provide this form of (information) feedback. A user who is in doubt as to whether the system really did understand what the user just said is liable to produce unwanted input.
3.5.2.4 Processing feedback  Processing feedback is about what the system is in the process of doing. When the system processes the information received from the user and hence may not be speaking for a while, processing feedback keeps the user informed on what is going on (see also 3.6.6). Most SLDSs can benefit from this form of (processing) feedback. A user who is uncertain about what is going on inside the system, if anything, is liable to produce unwanted input.

3.5.2.5 Output control  The aim of output control (see also 3.4.2) is to "prime" the user through the vocabulary, grammar, and style adopted by the system. There is ample evidence that this works very well for SLDSs. Humans are extremely good at (automatically, unconsciously) adapting their vocabulary, grammar, and style to those of their partners in dialogue or conversation [33, 34]. Just think about how easily we adapt linguistically to small children, the hard of hearing, or foreigners with little mastery of our mother tongue. It is therefore extremely useful to make the system produce output which only uses the vocabulary and grammar which the system itself can recognize, parse, and understand, and to make the system use a style of dialogue that induces the user to provide input which is terse and to the point. The unconscious adaptation performed by the users ensures that they still feel that they can speak freely without feeling hampered by the system's requirements for recognizable vocabulary, simple grammar, and terse style. A particular point to be aware of in this connection is that if the system's output to the user includes, for example, typed text on the screen, then the
textual output should be subjected to the same priming strategy as has been adopted for the system's spoken output. It is not helpful to carefully prime the user through the system's output speech and then undercut the purpose of the priming operation through a flourishing style of expression in the textual output.
3.5.2.6 Focused output and system initiative  If the system determines the course of the dialogue by having the initiative (see 3.5.3) all or most of the time, for instance through asking questions of the user or providing the user with instructions which the user has to carry out, a strong form of user input control becomes possible. The system can phrase its questions or instructions in a focused way so that, for each question, instruction, etc., the user has to choose between a limited number of response options. If, for instance, the system asks a question which should be answered by a "yes" or "no," or by a name drawn from a limited set of proper names (of people, weekdays, airports, train stations, streets, car model names, etc.), then it exerts a strong influence on the user's input. Dialogue managers using this approach may be able to handle even tasks of very large complexity in terms of information volume (see 3.5.1.1). Note that, in itself, system initiative is far from sufficient for this kind of input control to take place. If the system says, for instance, "ADAP Travels, how can I help you?," it does, in principle, take the initiative by asking a question. However, the question is not focused at all and therefore does not restrict the user's input. An open or unfocused system question may therefore be viewed as a way of handing over the dialogue initiative to the user. Note also that the method just described is better suited for some tasks than for others, in particular for tasks consisting of a series of independent pieces of information to be provided to the system by the user. Beyond those tasks, the strong form of input control involved will tend to give the dialogue a rather mechanical, less than natural quality. Moreover, some tasks do not lend themselves to system initiative only, and large unstructured tasks cannot be handled through focused output combined with system initiative in a way that is acceptable to the users.

3.5.2.7 Textual material  The term "textual material" designates information about the system in typed or handwritten form and presented graphically, such as on screen or on paper, or haptically, such as using Braille. Typically, this information tells users what the system can and cannot do and instructs them on how to interact with the system. For particular purposes, such as when the users are professionals and will be using the SLDS extensively in their work, strong user input control can be exerted through
textual material which the user is expected to read when, or before, using the system. For obvious reasons, text-based system knowledge is difficult to rely on for walk-up-and-use systems unless the system, like Waxholm, includes a screen or some other text-displaying device. Users are not likely to have textual material on paper at hand when using the system: it is somewhere else, it has disappeared, or they never received it in the first place.
3.5.2.8 Barge-in  Barge-in means that users can speak to the system, and expect to be recognized and understood by it, whenever they so wish, such as when the system itself is speaking or is processing recent input. Barge-in is not, in fact, an input control mechanism. Rather, it is something which comes in handy because full input control is impossible. In particular, it is impossible to prevent enough users from speaking when the system is not listening, no matter what means are being adopted for this purpose. The likelihood of users speaking freely "out of order" varies from one application and user group to another. In some applications, it may even be desirable that the users can speak freely among themselves whilst the system is processing the spoken input. Still, barge-in technology is advisable for very many SLDSs, so that the system is able to recognize and process user input even if it arrives when the system is busy doing something other than just waiting for it. In Waxholm, the system does not listen when it speaks. However, the user may barge in by pressing a button to interrupt the system's speech. This is useful when, for instance, the user already feels sufficiently informed to get on with the task. For instance, users who are experts in using the application can use the button-based barge-in to skip the system's introduction. The Danish Dialogue System does not allow barge-in when the system speaks. This turned out to cause several transaction failures, i.e. dialogues in which the user did not get the result asked for. A typical case is one in which the system's feedback to the user (see 3.6.6) shows that the system has misunderstood the user, for instance by mistaking "Saturday" for "Sunday." During the system's subsequent output utterance, the user says, e.g.: "No, Saturday." The system's lack of reaction to what the user said is easily interpreted by the user as indicating that the system had received the error message, which of course it hadn't, and couldn't have. As a result, the user will later receive a flight ticket which is valid for the wrong day.

3.5.3 Who Should Have the Initiative?
Dialogue in which the initiative lies solely with the system was discussed as an input control mechanism in Section 3.5.2.6. This section generalizes the discussion of initiative begun there.
Dialogue management design takes place between two extremes. From the point of view of technical simplicity, one might perhaps wish that all SLDSs could conduct their transactions with users as a series of questions to which the users would have to answer "yes" or "no" and nothing else. Simpler still, "yes" or "no" could be replaced by filled pauses ("grunts") and unfilled pauses (silence), respectively, between the system's questions, and speech recognition could be replaced by grunt detection. From the point of view of natural dialogue, on the other hand, users should always be able to say exactly what they want to say, in the way they want to say it, and when they want to say it, without any restrictions being imposed by the system. Both extremes are unrealistic, of course. If task complexity is low in terms of, among other things, information volume and negotiation potential, then it is technically feasible today to allow the users to say what they want whilst still using some of the input control mechanisms discussed in Section 3.5.2. As task complexity grows in terms of information volume, negotiation potential, and the other factors discussed in Section 3.5.1, it really begins to matter who has the initiative during the dialogue. We may roughly distinguish three interaction modes, i.e. system-directed dialogue, mixed initiative dialogue, and user-directed dialogue. This distinction is a rough-and-ready one because initiative distribution among user and system is often a matter of degree depending upon how often which party is supposed to take the initiative. In some cases, it can even be difficult to classify an SLDS in terms of who has the initiative. If the system opens the dialogue by saying something like: "Welcome to service X, what do you want?", it might be argued that the system has the initiative because the system is asking a question of the user. However, since the question asked by the system is a completely open one, one might as well say that the initiative is being handed over to the user. In other words, only focused questions clearly determine initiative (cf. 3.5.2.6). The same is true of other directive uses of language in which partner A tells partner B to do specific things, such as in instructional dialogues.
3.5.3. 1 System-directed dialogue As long as the task is one in which the system requires a series of specific pieces of information from the user, the task may safely be designed as one in which the system preserves the initiative throughout by asking focused questions of the user. This system directed approach would work even for tasks of very large complexity in terms of information volume, and whether or not the tasks are well structured. Note that if the sub-tasks are not mutually independent, or if several of them are optional, then system initiative may be threatened (cf. 3.5.1). Still, system-directed dialogue is an effective strategy for reducing
user input complexity and increasing user input predictability (cf. 3.4.2.1). In addition, system-directed dialogue actually is relatively natural for some tasks. From a dialogue engineering perspective, it may be tempting to claim that system-directed dialogue is generally simpler to design and control than either user-directed dialogue or mixed initiative dialogue, and therefore should be preferred whenever possible. This claim merits some words of caution. Firstly, it is not strictly known if the claim is true. It is quite possible that system-directed dialogue, for all but a relatively small class of tasks, is not simpler to design and control than its alternatives because it needs ways of handling those users who do not let themselves be fully controlled but speak out of turn, initiate negotiation, ask unexpected questions, etc. Secondly, products are already on the market which allow mixed initiative dialogue for relatively simple tasks, such as train timetable information [6], and it is quite likely that users generally tend to prefer such systems because they let the users speak freely to some extent.
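A system-directed dialogue of the kind described in this section can be sketched as a simple slot-filling loop over focused questions. Everything below is illustrative; in particular, recognize() merely stands in for the speech and language layers, and the questions and retry limit are invented.

```python
FOCUSED_QUESTIONS = {
    "departure": "Which airport do you want to leave from?",
    "destination": "Which airport do you want to fly to?",
    "date": "On which date do you want to travel?",
}

def run_system_directed_dialogue(recognize, max_retries: int = 2) -> dict:
    """The system keeps the initiative, asking one focused question per slot."""
    slots = {}
    for slot, question in FOCUSED_QUESTIONS.items():
        for _ in range(1 + max_retries):
            print("SYSTEM:", question)
            answer = recognize()          # assumed interface to the input side
            if answer:                    # an empty result triggers a re-ask
                slots[slot] = answer
                break
    return slots

# toy usage: every question gets the same canned answer
print(run_system_directed_dialogue(lambda: "Copenhagen"))
```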
3.5.3.2 Mixed initiative dialogue  In mixed initiative dialogue, any one of the participants may have, or take, the initiative. A typical case in which a mixed initiative approach is desirable is one in which the task is large in terms of information volume and both the user and the system need information from one another. Whilst asking its questions, the system must be prepared, sometimes or all of the time, for the user to ask a question in return instead of answering the system's own question. For instance, the user may want to know if a certain flight departure allows discount before deciding whether that departure is of any interest. In practice, most SLDSs have to be mixed initiative systems in order to be able to handle user-initiated repair meta-communication. The user must have the possibility of telling the system, at any point during the dialogue, that the system has misunderstood the user or that the user needs the system to repeat what it just said (see 3.6.2). Only systems which lack meta-communication altogether can avoid that. Conversely, even if the user has the initiative throughout the dialogue, the system must be able to take the initiative to do repair or clarification of what the user has said. When thinking about, or characterizing, SLDSs, it may be useful, therefore, to distinguish between two issues:
• who has the initiative in domain communication?
• who has the initiative in meta-communication? (more in 3.6.2).
The need for mixed initiative dialogue in domain communication is a function of the bidirectionality of the flow of information needed to complete
the task, sub-task interdependencies, the potential for negotiation occurring during task completion, the wish for natural and unconstrained dialogue, etc. (cf. 3.5.1). Other things being equal, mixed initiative dialogue is harder to control and predict than system directed dialogue.
3.5.3.3 User-directed dialogue  In user-directed dialogue, the user has the initiative all or most of the time. User-directed dialogue is recommended for ill-structured tasks in which there is no way for the system to anticipate which parts of the task space the user wants to address on a particular occasion. The flight conditions information task (see 3.5.1) is a case in point, as is the email operation task (see 3.3.3). User-directed dialogue is also useful for tasks with regular users who have the time to learn how to speak to the system to get the task done. However, user-directed dialogue is harder to control and predict than system-directed dialogue. For the time being, user-directed dialogue is not recommended for walk-up-and-use users except for very simple tasks.

3.5.4 Input Prediction/Prior Focus
In order to support the system's speech recognition, language processing, and dialogue management tasks, the dialogue manager developer should investigate if selective prediction of the user's input is possible at any stage during the dialogue (see 3.4.2). This may be possible if, for example, the system asks a series of questions each requesting specific pieces of information from the user. If the task has some structure to it, it may even be possible to use the structure to predict when the user is likely to ask questions of the system, thus facilitating mixed initiative (3.5.3) within a largely system-directed dialogue. For instance, the Daimler-Chrysler dialogue manager and the Danish Dialogue System use various forms of input prediction. Another way of describing input prediction is to say that the dialogue manager establishes a (selective) focus of attention prior to the next user utterance. Useful as it can be, input prediction may fail because the user does not behave as predicted. In that case, the dialogue manager must be able to initiate appropriate meta-communication (see 3.6.2). This is not necessarily easy to do in case of failed predictions because the system may not be aware that the cause of failed recognition or understanding was its failure to predict what the user said. Unless it relaxes or cancels the prediction, the risk is that the dialogue enters an error loop in which the system continues to fail to understand the user. Input prediction can be achieved in many different ways. It may be useful to distinguish between the following two general approaches.
3.5.4.1 Knowledge-based input prediction In knowledge-based input prediction, the dialogue manager uses a priori knowledge of the context to predict one or more characteristics of the user's next utterance. Note that the a priori nature of knowledge-based input prediction does not mean that implemented predictions should not be backed by data on actual user behavior. It is always good practice to test the adopted knowledge-based input prediction strategy on user-system interaction data.
3.5.4.2 Statistical input prediction In statistical input prediction, the dialogue manager uses corpus-based information on what to expect from the user. Given a corpus of user-system dialogues about the task(s) at hand, it may be possible to observe and use regularities in the corpus, such as that the presence of certain words in the user's input makes it likely that the user is in the process of addressing a specific subset of the topics handled by the system, or that the presence of dialogue acts DA5 and DA19 in the immediate dialogue history makes it likely that the user is expressing DA25. Waxholm uses the former approach, Verbmobil the latter.
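The statistical approach can be illustrated with a toy corpus of dialogue-act sequences. The act labels and the corpus below are invented, and a real system such as Verbmobil uses far richer features than the two immediately preceding acts.

```python
from collections import Counter, defaultdict
from typing import Optional

# toy annotated corpus: each dialogue is a sequence of dialogue-act labels
corpus = [
    ["greet", "request_date", "inform_date", "request_time", "inform_time"],
    ["greet", "request_date", "inform_date", "request_time", "reject_time"],
    ["greet", "request_date", "inform_date", "request_time", "inform_time"],
]

# count which act follows each pair of preceding acts
follows = defaultdict(Counter)
for dialogue in corpus:
    for i in range(2, len(dialogue)):
        follows[(dialogue[i - 2], dialogue[i - 1])][dialogue[i]] += 1

def predict_next_act(prev2: str, prev1: str) -> Optional[str]:
    """Most frequent act observed after the two preceding acts, if any."""
    counts = follows.get((prev2, prev1))
    return counts.most_common(1)[0][0] if counts else None

print(predict_next_act("inform_date", "request_time"))  # -> "inform_time"
```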
3.5.5 Sub-task Identification
It is a useful exercise for the dialogue manager developer to consider the development task from the particular point of view of the dialogue manager. The dialogue manager is deeply embedded in the SLDS, is out of direct contact with the user, and has to do its job based on what the speech and language layers deliver. This happens in the context of the task, the target user group, and whatever output and input control the dialogue manager may have imposed. Basically, what the speech and language layers can deliver to the dialogue manager is some form of meaning representation. Sometimes the dialogue manager does not receive any meaning representation from the speech and language layers even though one was expected. Even if a meaning representation arrives, there is no guarantee that this representation adequately represents the contents of the message that was actually conveyed to the system by the user because the speech and language layers may have got the user's expressed meaning wrong. Still, whatever happens, the dialogue manager must be able to produce appropriate output to the user. Current SLDSs exhibit different approaches to the creation of a meaning representation in the speech and language layers as well as to the nature of the meaning representation itself. An important point is the following: strictly speaking, the fact that a meaning representation arrives at the dialogue manager is not sufficient for the dialogue manager to carry on with
the task. First, the dialogue manager must identify to which sub-task(s), or topics, if any, the incoming meaning representation provides a contribution. Only when it knows that, or believes that it knows, can the dialogue manager proceed to sort out which contribution(s), if any, the incoming meaning representation provides to the sub-task(s). In other words, many task-oriented SLDSs require the dialogue manager to do sub-task identification or topic identification. The task solved by most SLDSs can be viewed as consisting in one or several sub-tasks or topics to be addressed by user and system. One or several of these sub-tasks have to be solved in order that the user and the system succeed in solving the task. Other sub-tasks may be optional, i.e. their solution is sometimes, but not always, required. An example class of optional sub-tasks is the meta-communication sub-tasks (see 3.6.2): if the dialogue proceeds smoothly, no meta-communication sub-tasks have to be solved. Basically, dialogue managers can be built so as to be in one of two different situations with respect to sub-task identification. In the first case, the dialogue manager has good reason to assume that the user is addressing a particular domain sub-task; in the second case, the dialogue manager does not know which domain sub-task the user is addressing. In both cases, the dialogue manager must start from the semantic representations that arrive from the speech and language layers, look at the semantically meaningful units, and seek to figure out which sub-task the user is addressing.
3.5.5.1 Local focus  The dialogue manager may have good reason to assume that the user's utterance addresses a specific sub-task, such as that of providing the name of an employee in the organization hosting the system. Depending on the task and the dialogue structure design, there can be many different reasons why the dialogue manager knows which sub-task the user is currently addressing: there may be only one sub-task, as in an extremely simple system; the task may be well-structured; the system just asked the user to provide input on that sub-task, etc. Generally speaking, this is a good situation for the dialogue manager to be in, as in the Danish Dialogue System. This system almost always knows, or has good reason to believe, that the user is either addressing a specific domain sub-task or has initiated meta-communication. Since it has good reason to believe which sub-task the user is addressing, the task of the dialogue manager reduces to that of finding out exactly what is the user's contribution to that sub-task (or one of those sub-tasks, if we count in the possibility that the user may have initiated meta-communication). In such cases, the system has a local focus. The system may still be wrong, of course, and then it becomes the joint task of the system and the user to rectify the situation through meta-communication.
3.5.5.2 Global focus  The dialogue manager does not know which of several possible sub-tasks the user is addressing. The main reason why the dialogue manager does not know which sub-task the user is currently addressing is that the dialogue manager has given the user the initiative, for instance by asking an open question in response to which the user may have addressed any number of possible sub-tasks. Alternatively, the user has unexpectedly taken the initiative. In such situations, the dialogue manager has to do sub-task identification, or topic identification, before it can start processing the user's specific contribution to the sub-task. Sub-task identification is crucial in systems such as RailTel/ARISE, Waxholm, and Verbmobil. Waxholm uses probabilistic rules linking semantic input features with topics. Given the rules and a particular set of semantic features in the input, Waxholm infers which topic the user is actually addressing. For sub-task identification, Verbmobil uses weighted default rules to map from input syntactic information, keywords, and contextual information about which dialogue acts are likely to occur, into one or several dialogue acts belonging to an elaborate taxonomy of approximately 54 speech acts (or dialogue acts).
An important additional help in sub-task identification is support from a global focus, for instance when the dialogue manager knows that the task history (see 3.7.1) contains a set of as yet unsolved sub-tasks. These tasks are more likely to come into local focus than those that have been solved already. Another form of global focus can be derived from observation of dialogue phases. Most task-oriented dialogues unfold through three main phases: the introduction phase with greetings, system introductions, etc., the main task-solving phase, and the closing phase with closing remarks, greetings, etc. Sometimes it may be possible to break down the main task-solving phase into several phases as well. If the system knows which phase the dialogue is in at the moment, this knowledge can be used for sub-task identification support. Knowing that can be a hard problem, however, and this is a topic for ongoing research. For instance, the joint absence of certain discourse particles called topic-shift markers and the presence of certain dialogue acts may suggest that the user has not changed dialogue phase.
Generally speaking, the more possible sub-tasks the user might be addressing in a certain input utterance, the harder the sub-task identification problem becomes for the dialogue manager. When doing sub-task identification, the dialogue manager may follow one of two strategies. The simpler strategy is to try to identify one sub-task only in the user's input, even if the user may have been addressing several sub-tasks, and continue the dialogue from there as in Waxholm. The more demanding strategy is to try to identify each single sub-task addressed by the user, as in
RailTel/ARISE. The latter strategy is more likely to work when task complexity is low in terms of volume of information.
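The rule-based flavour of topic identification can be sketched as simple weighted scoring over semantic features. The features, topics, and weights below are invented and are not taken from Waxholm or Verbmobil; they merely show the mechanism.

```python
from typing import Dict, Set

TOPIC_RULES: Dict[str, Dict[str, float]] = {
    "timetable":   {"boat": 2.0, "time": 1.5, "tomorrow": 1.0},
    "lodging":     {"hotel": 2.0, "room": 1.5, "night": 1.0},
    "meta_repeat": {"repeat": 3.0},
}

def identify_topic(features: Set[str]) -> str:
    """Return the topic whose rule weights best match the observed features."""
    scores = {
        topic: sum(weight for feature, weight in rule.items() if feature in features)
        for topic, rule in TOPIC_RULES.items()
    }
    return max(scores, key=scores.get)

print(identify_topic({"boat", "tomorrow"}))  # -> "timetable"
```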
3.5.5.3 After sub-task identification  Depending on what arrives from the speech and language layers, and provided that the dialogue manager has solved its sub-task identification task, the dialogue manager must now determine the users' specific contribution(s) to the sub-task(s) they are addressing (see 3.5.5). Following that, the dialogue manager must do one of five things as far as communication with the user is concerned:
1. advance the domain communication (see 3.6.1), including the provision of feedback (see 3.6.6)
2. initiate meta-communication with the user (see 3.6.2 and 3.6.5)
3. initiate other forms of communication (see 3.6.3)
4. switch to a fall-back human operator
5. end the dialogue (see 3.6.7).
Advancing the domain communication means getting on with the task. Initiating meta-communication means starting a sub-dialogue (or, in this case, a meta-dialogue) with the user in order to get the user's meaning right before advancing the domain communication any further. Probably no SLDS can do without capabilities for 1 and 2. If 2 fails repeatedly, some systems have the possibility of referring the user to a human operator (4). Otherwise, calling off the dialogue is the only possibility left (5). In parallel with taking action vis-a-vis the user, the dialogue manager may at this stage take a smaller or larger series of internal processing steps which can be summarized as:
6. updating the context representation (see 3.7.1)
7. providing support for the speech and language layers to assist their interpretation of the next user utterance (see 3.4.2).
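As a hedged sketch, the choice among options 1-5 can be thought of as a small dispatch over what the dialogue manager currently believes about the last input. The thresholds, argument names, and action labels below are invented for illustration.

```python
def next_dialogue_move(understood: bool, confidence: float,
                       failed_repairs: int, operator_available: bool) -> str:
    """Pick one of the five options; thresholds are purely illustrative."""
    if understood and confidence >= 0.5:
        return "advance_domain_communication"   # option 1 (with feedback)
    if failed_repairs < 2:
        return "initiate_meta_communication"    # option 2 (option 3 is handled analogously)
    if operator_available:
        return "switch_to_human_operator"       # option 4
    return "end_dialogue"                       # option 5

print(next_dialogue_move(understood=False, confidence=0.1,
                         failed_repairs=2, operator_available=True))
```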
3.5.6 Advanced Linguistic Processing
The determination of the user's contribution to a sub-task requires more than, to mention just one example, the processing of semantic feature structures. Processing of feature structures often implies value assignment to slots in a semantic frame even though these values cannot be straightforwardly derived from the user's input in all cases. If the system has to decide on every contribution of a user to a sub-task, something which few systems do at present, advanced linguistic processing
is needed. It may involve, among other things, cross-sentence co-reference resolution, ellipsis processing, and the processing of indirect dialogue acts. In nearly all systems, the processing of most of these phenomena is controlled and carried out by one of the natural language components--by the parser in the Daimler-Chrysler dialogue manager, by the semantic evaluation component in the Verbmobil speech translation system--but never without support from the dialogue manager.
3.5.7 Co-reference and ellipsis processing
In case of cross-sentence co-reference and ellipsis processing, the natural language system component in charge is supported by the dialogue manager, which provides a representation of contextual information for the purpose of constraining the relevant search space. The contextual information is part of the dialogue history (see 3.7.1). Dialogue history information consists in one or several data structures that are being built up incrementally to represent one or more aspects of the preceding part of the dialogue. In principle, the more aspects of the preceding dialogue are being represented in the dialogue history, the more contextual information is available for supporting the processing done in the language layer, and the better performance can be expected from that layer. Still, co-reference resolution and ellipsis processing remain hard problems.
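A dialogue history of the kind referred to here might, in a minimal sketch with invented field names, look as follows; a real history would typically be much richer and would be built from the language layer's structured output rather than from raw strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DialogueHistory:
    """Incrementally built context handed to the language layer."""
    turns: List[str] = field(default_factory=list)               # surface utterances
    filled_slots: Dict[str, str] = field(default_factory=dict)   # task history
    salient_entities: List[str] = field(default_factory=list)    # candidate referents

    def add_turn(self, utterance: str, entities: List[str]) -> None:
        self.turns.append(utterance)
        # most recent mentions first: they are the most likely antecedents
        self.salient_entities = entities + self.salient_entities

history = DialogueHistory()
history.add_turn("I want to fly to Hamburg on Friday", ["Hamburg", "Friday"])
print(history.salient_entities[0])  # -> "Hamburg"
```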
3.5.8 Processing of indirect dialogue acts
Advanced linguistic processing also includes the processing of indirect dialogue acts. In this case, the central problem for the system is to identify the "real" dialogue act performed by the user and disguised as a dialogue act of a different type. In contrast to direct dialogue acts, indirect dialogue acts cannot be determined on the basis of their surface form, which makes the frequently used keyword spotting techniques used for the identification of direct dialogue acts almost useless in such cases. Clearly, the processing of indirect dialogue acts calls for less surface-oriented processing methods involving semantic and pragmatic information associated with input sequences. This is a hard problem.
3.6 Communication

3.6.1 Domain Communication
The primary task of the dialogue manager is to advance the domain communication based on a representation of the meaning-in-task-context
of the user's input (cf. 3.5.5 and 3.5.6). Let us assume that the dialogue manager has arrived at an interpretation of the user's most recent input and decided that the input actually did provide a contribution to the task. This means that the dialogue manager can now take steps towards advancing the domain communication with the user. Obviously, what to do in a particular case depends on the task and the sub-task context. The limiting case is that the dialogue manager simply decides that it has understood what the user said and takes overt action accordingly, such as connecting the caller to a user who has been understood to want to accept a collect call, replaying an email message, or displaying a map on the screen. Some other cases are:
3.6.1.1 More information needed  The dialogue manager inserts the user's input meaning into a slot in the task model, discovers that more information is needed from the user, and proceeds to elicit that information.

3.6.1.2 Database look-up  The dialogue manager looks up the answer to the user's question in the database containing the system's domain knowledge and sends the answer to the language and speech generation components (or to the screen, etc.).

3.6.1.3 Producing an answer  The dialogue manager inserts the user's input meaning into a slot in the task model, verifies that it has all the information needed to answer the user's query, and sends the answer to the language and speech generation components (or to the screen, etc.).

3.6.1.4 Making an inference  The dialogue manager makes an inference based on the user's input meaning, inserts the result into a slot in a database and proceeds with the next question. In Waxholm, for instance, the system completes the user's "on Thursday" by inferring the appropriate date, and replaces qualitative time expressions, such as "this morning," by well-defined time windows, such as "6 a.m.-12 noon." Verbmobil does inferencing over short sequences of user input, such as that a counterproposal (for a date, say) implies the rejection of a previous proposal; a new proposal (for a date, say) implies the acceptance of a (not incompatible but less specific) previous proposal; and a change of dialogue phase implies the acceptance of a previous proposal (for a time, say). It is important to note that such domain-based inferences abound in human-human conversation. Without thinking about it, human speakers expect their interlocutors to make those inferences. The dialogue manager has no way of replicating the sophistication of human inferencing during conversation and dialogue. Most current systems are able to process only
relatively simple inferences. The dialogue manager developer should focus on enabling all and only those inferences that are strictly necessary for the application to work successfully in the large majority of exchanges with users. Even that can be a hard task. For instance, should the system be able to perform addition of small numbers or not? Simple as this may appear, it would add a whole new chapter to the vocabulary, grammar, and rules of inference that the system would have to master. In travel booking applications, for instance, some users would naturally say things like "two adults and two children;" or, in travel information applications, some users may want to know about the "previous" or the "following" departure given what the system has already told them. The developer has to decide how important the system's capability of understanding such phrases is to the successful working of the application in real life. In many cases, making an informed decision will require empirical investigation of actual user behavior. Finally, through control of user input (see 3.5.2), the developer must try to prevent the user from requiring the system to do inferences that are not strictly needed for the application or which are too complex to implement.
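The qualitative-time inference mentioned in 3.6.1.4 can be sketched as a simple table lookup. The 6 a.m.-12 noon window follows the Waxholm example in the text; the remaining entries and names are invented.

```python
from typing import Optional, Tuple

TIME_WINDOWS = {
    "this morning": ("06:00", "12:00"),
    "this afternoon": ("12:00", "18:00"),
    "tonight": ("18:00", "24:00"),
}

def resolve_time_expression(expression: str) -> Optional[Tuple[str, str]]:
    """Replace a qualitative time expression by an explicit (from, to) window."""
    return TIME_WINDOWS.get(expression.lower().strip())

print(resolve_time_expression("this morning"))  # -> ("06:00", "12:00")
```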
3.6.1.5 More constraints needed  The dialogue manager discovers that the user's input meaning is likely to make the system produce too much output information and produces a request to the user to provide further constraints on the desired output.
3.6.1.6 Inconsistent input  The dialogue manager discovers that the user's input meaning is inconsistent with the database information, infeasible given the database, inconsistent with the task history, etc. The system may reply, for example, "There is no train from Munich to Frankfurt at 3.10 p.m.," or "The 9 o'clock flight is already fully booked."
3.6.1.7 Language translation  The dialogue manager translates the user's input meaning into another language.

3.6.1.8 Summary  Among the options above, the first four and the last one illustrate straightforward progression with the task. The two penultimate options illustrate domain sub-dialogues. Quite often, the system will, in fact, do something more than just advancing the domain communication as exemplified above. As part of advancing the domain communication, the system may provide feedback to the user to enable the user to make sure that what the user just said has been understood correctly (see 3.6.6).
3.6.2 Meta-communication
Meta-communication, although secondary to domain communication, is crucial to proper dialogue management. Meta-communication is often complex and potentially difficult to design. In meta-communication design, it is useful to think in terms of distinctions between:
• system-initiated and user-initiated meta-communication
• repair and clarification meta-communication.
These are all rather different from each other in terms of the issues they raise, and distinguishing between them gives a convenient breakdown of what otherwise tends to become a tangled subject. In general, one of the partners in the dialogue initiates meta-communication because that partner has the impression that something went wrong and has to be corrected. Note that we do not include system feedback under meta-communication. Some authors do, and there does not seem to be any deep issue involved here one way or the other. We treat feedback as a separate form of system-to-user communication in 3.6.6. Primarily for the user, feedback (from the system) is the most important way of discovering that something has gone wrong and has to be corrected.
3.6.2.1 System-initiated repair meta-communication  System-initiated repair meta-communication is needed whenever the system has reason to believe that it did not understand the user's meaning. Such cases include:
• Nothing arrived for the dialogue manager to process, although input meaning was expected from the user. In order to provide appropriate output in such cases, the dialogue manager must get the user to input the meaning once more. It is worth noting that this can be done in many different ways, from simply saying "Sorry, I did not understand," or "Please repeat," to asking the user to speak louder or more distinctly. The more the system knows about the probable cause of its failing to understand the user, the more precise its repair meta-communication can be. Any such gain in precision increases the likelihood that the system will understand the user the next time around and thus avoid error loops (see 3.6.5).
• Something arrived for the dialogue manager to process, but what arrived was meaningless in the task context. For instance, the user may be perceived as responding "London" to a question about departure date. In order to provide appropriate output in such cases, the dialogue manager may have to ask the user to input the meaning again.
However, as the system actually did receive some meaning representation, it should preferably tell the user what it did receive and that this was not appropriate in the task context. This is done by Waxholm and the Danish Dialogue System, for example. For instance, if the Danish Dialogue System has understood that the user wants to fly from Aalborg to Karup, it will tell the user that there is no flight connection between these two airports. Another approach is taken in RailTel/ARISE. These systems would take the user's "London" to indicate a change to the point of departure or arrival (see below, this section). Verbmobil uses a statistical technique to perform a form of constraint relaxation in case of a contextually inconsistent user input dialogue act (cf. 3.5.6).
A core problem in repair meta-communication design is that the user input that elicits system-initiated repair may have many different causes. The dialogue manager often has difficulty diagnosing the actual cause. The closer the dialogue manager can get to correctly inferring the cause, the more informative repair meta-communication it can produce, and the more likely it becomes that the user will provide comprehensible and relevant input in the next turn.
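The graded-precision idea can be sketched as a mapping from diagnosed causes to repair prompts, with a generic fallback when no cause can be diagnosed. The cause labels and wordings below are invented.

```python
REPAIR_PROMPTS = {
    "no_input": "Sorry, I did not hear anything. Please repeat.",
    "too_quiet": "Sorry, please speak a little louder.",
    "no_parse": "Sorry, I did not understand. Please answer briefly.",
    "out_of_task": "Sorry, I can only answer questions about flight reservations.",
}

def repair_prompt(diagnosed_cause: str) -> str:
    """Fall back to a generic prompt when the cause cannot be diagnosed."""
    return REPAIR_PROMPTS.get(diagnosed_cause, "Sorry, I did not understand.")

print(repair_prompt("too_quiet"))
print(repair_prompt("unknown_cause"))
```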
3.6.2.2 System-initiated clarification meta-communication  System-initiated clarification meta-communication is needed whenever the system has reason to believe that it actually did understand the user's meaning, which, however, left some kind of uncertainty as to what the system should produce in response. Such cases include:
• A representation of the user's meaning arrived with a note from the speech and/or language processing layers that they did not have any strong confidence in the correctness of what was passed on to the dialogue manager. The best approach for the dialogue manager to take in such cases is probably to get the user to input the meaning again, rather than to continue the dialogue on the basis of dubious information, which may easily lead to a need for more substantial meta-communication later on. Alternatively, as the system actually did receive some meaning representation, it might instead tell the user what it received and ask for the user's confirmation.
• A representation of the user's meaning arrived which was either inherently inconsistent or inconsistent with previous user input. In cases of inherent inconsistency, which the system judges on the basis of its own domain representation, the system could make the possibilities clear to the user and ask which possibility the user prefers, for instance
by pointing out that "Thursday 9th" is not a valid date, but either "Thursday 8th" or "Friday 9th" would be. Cases of inconsistency with previous user input are much more diverse, and different response strategies may have to be used depending on the circumstances.
• A representation of the user's meaning arrived which was (semantically) ambiguous or underspecified. For instance, the user asks to be connected to Mr. Jack Jones and two gentlemen with that name happen to work in the organization; or the user wants to depart at "10 o'clock," which could be either a.m. or p.m. In such cases, the system must ask the user for information that can help resolve the ambiguity. The more precisely this can be done, the better. For instance, if the system believes that the user said either "Hamburg" or "Hanover," it should tell the user just that instead of broadly asking the user to repeat. Experience indicates that it is dangerous for the system to try to resolve ambiguities on its own by selecting what the system (i.e. the designer at design-time) feels is generally the most likely interpretation. The designer may think, for instance, that people are more likely to go on a flight at 10 a.m. than at 10 p.m. and may therefore assign the default interpretation "10 a.m." to users' "10 o'clock." If this approach of interpretation by default is followed, it is advisable to ask the user explicitly for verification through a "yes/no" feedback question (see 3.6.6).
Although user meaning inconsistency and (semantic) ambiguity are probably the most common and currently relevant illustrations of the need for system clarification meta-communication, others are possible, such as when the user provides the system with an irresolvable anaphor. In this case, the system should make the possible referents clear to the user and ask which of them the user has in mind. As the above examples illustrate, system-initiated clarification meta-communication is often a "must" in dialogue manager design. In general, the design of system clarification meta-communication tends to be difficult, and the developer should be prepared to spend considerable effort on reducing the amount of system clarification meta-communication needed in the application. This is done by controlling the user's input and by providing cooperative system output. However, as so often is the case in systems design, this piece of advice should be counter-balanced by another. Speaking generally, users tend to lose attention very quickly when the system speaks. It is therefore no solution to let the system instruct the user at length on what it really means, or wants, on every occasion where there is a risk that the user might go on to say something which is ambiguous or (contextually) inconsistent. In other words, careful prevention of user
behavior which requires system-initiated clarification meta-communication should be complemented by careful system clarification meta-communication design. One point worth noting is that, for a large class of system-initiated clarifications, yes/no questions can be used.
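For the ambiguous "10 o'clock" example above, the two options, a focused clarification question versus a default interpretation verified by a yes/no question, might be sketched as follows; the wording and function name are purely illustrative.

```python
def clarify_ambiguous_hour(hour: int, use_default: bool = False) -> str:
    """Either ask a focused clarification question or verify a default reading."""
    if use_default:
        # interpretation by default, immediately verified with a yes/no question
        return f"So you want to leave at {hour} in the morning, is that correct?"
    return f"Do you want to leave at {hour} in the morning or {hour} in the evening?"

print(clarify_ambiguous_hour(10))
print(clarify_ambiguous_hour(10, use_default=True))
```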
3.6.2.3 User-initiated repair meta-communication  User-initiated repair meta-communication is needed whenever the system has demonstrated to the user that it has misunderstood the user's intended meaning. It also sometimes happens that users change their minds during the dialogue, whereupon they have to go through the same procedures as when they have been misunderstood by the system. In such cases, the user must make clear to the system what the right input is. Finally, users sometimes fail to hear or understand what the system just said. In this case they have to ask the system to repeat, just as when the system fails to get what the user just said. These three (or two) kinds of user repair meta-communication are mandatory in many systems. User-initiated repair meta-communication can be designed in several different ways:
• Uncontrolled repair input: Ideally, we would like the users just to speak freely whenever they have been misunderstood by the system, changed their minds with respect to what to ask or tell the system, or failed to get what the system just said. Some systems do that, such as Waxholm, but with varying success, the problem being that users may initiate repair in very many different ways, from "No, Sandhamn?" to "Wait a minute. I didn't say that. I said Sandhamn?"
• Repair keywords: Other systems require the user to use specifically designed keywords, again with varying success. In the Danish Dialogue System, users are asked to use the keyword "change" whenever they have been misunderstood by the system or changed their minds, and to use the keyword "repeat" whenever they failed to get what the system just said. Keywords are simpler for the system to handle than unrestricted user speech. The problem is that users sometimes fail to remember the keywords they are supposed to use. The more keywords users have to remember, the higher the risk that they forget them. For walk-up-and-use systems, 2-3 keywords seems to be the maximum users can be expected to remember.
• Erasing: A third approach is used in RailTel/ARISE. This approach is similar to using an eraser: one erases what was there and writes something new in its place. For instance, if the system gets "Frankfurt to Hanover" instead of "Frankfurt to Hamburg," the user simply has to repeat "Frankfurt to Hamburg" until the system has received the
message. No specific repair meta-communication keywords or metadialogues are needed. The system is continuously prepared to revise its representation of the user's input based on the user's latest utterance. This solution may work well for low-complexity tasks, but it will not work for tasks involving selective input prediction (see 3.5.4) and may be difficult to keep track of in high-complexity tasks.
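A minimal sketch, in Python, of how the repair keyword and erasing strategies described above might be combined in a low-complexity task. The keywords, slot names, and the stand-in recognizer are assumptions for illustration only, not the design of the Danish Dialogue System or RailTel/ARISE.

    def handle_user_turn(utterance, slots, last_system_output):
        """Small repair handler: "repeat" replays the last system output,
        "change" re-opens the most recently filled slot, and any newly
        recognized slot value simply overwrites the old one (erasing)."""
        words = set(utterance.lower().split())
        if "repeat" in words:
            return last_system_output              # user failed to get the last output
        if "change" in words:
            if slots:
                slots.popitem()                     # re-open the most recently filled slot
            return "What would you like to change?"
        # Erasing: the latest recognized value for a slot always wins.
        for slot, value in recognize_slots(utterance).items():
            slots[slot] = value
        return None                                 # no repair; continue domain communication

    def recognize_slots(utterance):
        """Stand-in for the speech and language layers (an assumption)."""
        known = {"hamburg": "Hamburg", "hanover": "Hanover", "frankfurt": "Frankfurt"}
        found = [known[w] for w in utterance.lower().split() if w in known]
        return {"destination": found[-1]} if found else {}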
3.6.2.4 User-initiated clarification meta-communication User-initiated clarification meta-communication is probably the most difficult challenge for the meta-communication designer. Just like the user, the system may output, or appear to the user to output, inconsistent or ambiguous utterances, or use terms which the user is not familiar with. In human-human conversation, these problems are easily addressed by asking questions such as: "What do you mean by green departure?" or "Do you mean scheduled arrival time or expected arrival time?" Unfortunately, most current SLDSs are not being designed to handle such questions at all. The reasons are (a) that this is difficult to do and, often more importantly, (b) that the system developers have not discovered such potential problems in the first place. If they had, they might have tried to avoid them in their design of the system's dialogue behavior, i.e. through user input control. Thus, they would have made the system explain the notion of a green departure before the user is likely to ask what it is, and they would have made the system explicitly announce when it is speaking about scheduled arrivals and when it is speaking about expected arrivals. In general, this is one possible strategy to follow by the dialogue manager developer: to remove in advance all possible ambiguities, inconsistencies, and terms unknown to users, rather than to try to make the system handle questions from users about these things. We have developed a tool in support of cooperative system dialogue design [35, www.disc2.dk]. Part of the purpose of this tool is to avoid situations in which users feel compelled to initiate clarification meta-communication. There is an obvious alternative to the strategy recommended above of generally trying to prevent the occurrence of user-initiated clarification meta-communication. The alternative is to strengthen the system's ability to handle user-initiated clarification meta-communication. The nature of the task is an important factor in determining which strategy to follow or emphasize. Consider, for instance, users inquiring about some sort of "Yellow Pages" commodity, such as electric guitars or used cars. Both domains are relatively complex. In addition, the inquiring users are likely to differ widely in their knowledge of electric guitars or cars. A flight ticket reservation system may be able to address its domain almost without using terms that are unknown to its users, whoever these may be. Not so with a
used cars information system. As soon as the system mentions ABS brakes, racing tyres, or split back seats, some users will be wondering what the system is talking about. In other words, there seems to be a large class of potential SLDSs which can hardly start talking before they appear to speak gibberish to some of their intended users. In such cases, the dialogue manager developers had better prepare for significant user-initiated clarification meta-communication. It is no practical option for the system to explain all the domain terms it is using as it goes along. This would be intolerable for users who are knowledgeable about the domain in question.
3.6.2.5 Summary
To summarize, the dialogue manager is several steps removed from direct contact with the user. As a result, the dialogue manager may fail to get the user's meaning or may get it wrong. Therefore, both the system and the user need to be able to initiate repair metacommunication. Even at low levels of task complexity, users are able to express themselves in ways that are inconsistent or ambiguous. The system needs clarification meta-communication to handle those user utterances. In some tasks, user clarification meta-communication should be prevented rather than allowed. In other tasks, user clarification meta-communication plays a large role in the communication between user and system.
3.6.3 Other Forms of Communication
Domain communication including feedback (see 3.6.6) and meta-communication are not the only forms of communication that may take place between an SLDS and its users. Thus, the domain-independent opening of the dialogue by some form of greeting is neither domain communication nor meta-communication. The same applies to the closing of the dialogue (see 3.6.7). These formalities may also be used in the opening and closing of sub-dialogues. Another example is system time-out questions, such as "Are you still there?", which may be used when the user has not provided input within a certain time limit. If the SLDS's task-delimitation is not entirely natural and intuitive to users (cf. 3.5.2.1), users are likely to sometimes step outside the system's unexpectedly limited conception of the task. By the system's definition, the communication then ceases to be domain communication. For some tasks, users' out-of-domain communication may happen too often for comfort for the dialogue manager developer, who may therefore want to do something about it. Thus, Waxholm is sometimes able to discover that the user's input meaning is outside the domain handled by the system. This is a relatively sophisticated thing to do because the system must be able to understand out-of-domain terms. Still, this approach may be worth considering in cases
where users may have reason to expect that the system is able to handle certain sub-tasks which the system is actually unable to deal with. When the users address those sub-tasks, the system will tell them that, unfortunately, it cannot help them. In the Verbmobil meeting scheduling task, users are prone to produce reasons for their unavailability on certain dates or times. Verbmobil, however, although being unable to understand such reasons, nevertheless classifies them and represents them in the topic history (see 3.7.1).
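The two domain-independent moves just mentioned, a time-out question and an explicit out-of-domain response, could be sketched as follows; the time limit, the term list, and the wordings are assumed for illustration.

    import time

    TIMEOUT_SECONDS = 8                            # assumed time limit
    OUT_OF_DOMAIN_TERMS = {"hotel", "weather"}     # assumed terms the system cannot handle

    def time_out_prompt(last_user_input_time):
        """Return a time-out question if the user has been silent too long."""
        if time.time() - last_user_input_time > TIMEOUT_SECONDS:
            return "Are you still there?"
        return None

    def out_of_domain_response(utterance):
        """Tell the user explicitly when a sub-task falls outside the system's domain."""
        if OUT_OF_DOMAIN_TERMS & set(utterance.lower().split()):
            return "I am sorry, I can only give train timetable information."
        return None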
3.6.4 Expression of Meaning
Once the system has decided what to say to the user, this meaning representation must be turned into an appropriately expressed output utterance. In many cases, this is done directly by the dialogue manager. Having done its internal processing jobs, the dialogue manager may take one of the following approaches, among others:
3.6.4.1 Pre-recorded utterances The dialogue manager selects a stored audio utterance and causes it to be played to the user by sending a message to the player.
3.6.4.2 Concatenation of pre-recorded words and phrases The dialogue manager concatenates the output utterance from stored audio expressions or phrases and causes it to be played to the user by sending a message to the player.
3.6.4.3 Filling in a template used by a synthesizer The dialogue manager selects or fills an output sentence template and causes it to be synthesized to the user.
3.6.4.4 Producing meaning A more sophisticated approach is to have the dialogue manager produce the what, or the meaning, of the intended output and then have the output language layer determine the how, or the form of words to use, in the output. In this approach, the how is often co-determined by accompanying constraints from the dialogue manager's control and context layers, such as that the output should be a question marked by rising pitch at the end of the spoken utterance.
3.6.4.5 Summary The first two options are closely related and are both used in, e.g., the Danish Dialogue System. Waxholm and the DaimlerChrysler dialogue manager use the third option. This option is compatible with relatively advanced use of control layer information for determining
the prosody of the spoken output, for example. This can also be done in the first approach but is difficult to do in the second approach because of the difficulty of controlling intonation in concatenated pre-recorded speech.
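As an illustration of the template-filling option (3.6.4.3), the following sketch fills an output sentence template from the dialogue manager's current slot values and attaches a simple control-layer hint for prosody. The template wordings and the prosody flag are assumptions, not the Waxholm or Daimler-Chrysler design.

    # Assumed output templates; {...} fields are filled from the task record.
    TEMPLATES = {
        "confirm_route": "The first train from {origin} to {destination} on {date} leaves at {time}.",
        "ask_origin":    "From where do you want to travel?",
    }

    def render_output(template_name, slots, is_question=False):
        """Fill a sentence template and return the text together with a
        control-layer hint (rising pitch for questions) for the synthesizer."""
        text = TEMPLATES[template_name].format(**slots)
        prosody = "rising" if is_question else "falling"
        return {"text": text, "prosody": prosody}

    # Hypothetical use:
    print(render_output("confirm_route",
                        {"origin": "Frankfurt", "destination": "Hamburg",
                         "date": "3rd May 1999", "time": "5.35 a.m."}))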
3.6.5 Error Loops and Graceful Degradation An important issue for consideration by the dialogue management developer is the possibility that the user simply repeats the utterance which caused the system to initiate repair meta-communication. The system may have already asked the user to speak louder or to speak more distinctly but, in many such cases, the system will be in exactly the same uncomprehending situation as before. The system may try once more to get out of this potentially infinite loop but, evidently, this cannot go on forever. In such cases, the system might either choose to fall back on a human operator or close the dialogue. To avoid that, a better strategy is in many cases for the system to attempt to carry on by changing the level of interaction into a simpler one, thereby creating a "graceful degradation" of the (domain or meta-) communication with the user [36]. Depending on the problem at hand and the sophistication of the dialogue manager, this can be done in many different ways, including:
3.6.5.1 Focused questions The user may be asked focused questions one at a time instead of being allowed to continue to provide one-shot input which may be too lengthy or otherwise too complex for the system to understand. For instance, the system goes from saying "Which information do you need?" to saying "From where do you want to travel?"
• Asking for rephrasing: The user may be asked to rephrase the input or to express it more briefly, for instance when the user's answer to a focused question is still not understood.
• Asking for a complete sentence: The user may be asked to produce a complete sentence rather than grammatically incomplete input, as in Waxholm.
• Yes/no questions: The user may be asked to answer a crucial question by "yes" or "no."
• Spelling: The user may be asked to spell a crucial word, such as a person name or a destination.
It is important to note that the levels of interaction/graceful degradation approach can be used not only in the attempt to get out of error loops but also in combination with system-initiated clarification meta-communication (cf. 3.6.2.2). So, to generalize, whenever the system is uncertain about the
user's meaning, graceful degradation may be considered. The DaimlerChrysler dialogue manager standardly accepts three repetitions of a failed user turn before applying the graceful degradation approach. This seems reasonable.
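The error-loop counter and the descending levels of interaction might be sketched as follows. The three-repetition threshold follows the figure just quoted, while the particular sequence of levels and the prompts are illustrative assumptions.

    # Levels of interaction, from least to most constrained (assumed ordering).
    DEGRADATION_LEVELS = [
        "Which information do you need?",                                   # open, one-shot input
        "From where do you want to travel?",                                # focused question
        "Do you want to travel from Frankfurt? Please answer yes or no.",   # yes/no question
        "Please spell the name of your departure city.",                    # spelling
    ]

    MAX_REPETITIONS = 3   # failed user turns tolerated before degrading further

    def next_prompt(level, failures):
        """Repeat the current prompt up to MAX_REPETITIONS times, then move one
        level down; if the last level also fails, fall back on a human operator.
        The caller resets the failure count whenever the level changes."""
        if failures < MAX_REPETITIONS:
            return level, DEGRADATION_LEVELS[level]
        if level + 1 < len(DEGRADATION_LEVELS):
            return level + 1, DEGRADATION_LEVELS[level + 1]
        return level, "I am sorry. I will transfer you to an operator."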
3.6.6 Feedback
System feedback to users is essential to successful dialogue management. In order to be clear about what system feedback involves, it is convenient to distinguish between two kinds of feedback, information feedback and process feedback.
3.6.6.1 Information feedback The user must have the opportunity to verify that the system has understood the user's input correctly. In general, the user should receive feedback on each piece of information which has been input to the system. The feedback need not be communicated through speech. Waxholm, for instance, provides a textual representation on the screen of what the system has recognized as well as of the system's output response. The important thing is that the user can perceive the feedback and verify whether what the system did was what the user intended the system to do by providing a certain input. So the system's feedback may consist in presenting a particular map on the screen, or a table packed with information of some kind or other, or in playing a certain voice mail which it believes that the user has asked for. In many cases, however, the feedback will be speech produced by the system. Imagine the following dialogue:

Dialogue 1
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "5.35 a.m."
U2: "Thank you. Goodbye."
S3: "Goodbye."
Superficially, this conversation looks OK. However, the problem is that the system mis-recognized "Hamburg" as "Hanover"! Dialogue 1, therefore, is a transaction failure. Moreover, the transaction failure is in this case a result of sloppy dialogue design. A better solution would be:
Dialogue 2
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "The first train from Frankfurt to Hanover on 3rd May 1999 leaves at 5.35 a.m."
U2: [Initiates repair meta-communication.]
The system feedback in the example above is relatively straightforward to design. One thing to consider is whether the system should repeat some or all of its core information to the user twice, in this case by saying "... 5.35 a.m., 5.35 a.m." However, the feedback strategy in Dialogue 2 cannot always be trusted. What may happen is the following transaction failure:
Dialogue 3
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "The first train from Frankfurt to Hanover on 3rd May 1998 leaves at 5.35 a.m."
U2: "Thank you."
S3: "Do you need any other information?"
In Dialogue 3, the user simply ignores the system's feedback and happily walks away with what is likely to be false information, just as in Dialogue 1. If the users turn out to be too prone to make the error illustrated in Dialogue 3, the more burdensome strategy shown in Dialogue 4 might be considered:
Dialogue 4
S1: "ADAP Travels, can I help you?"
U1: "When is the first train from Frankfurt to Hamburg tomorrow morning?"
S2: "You want to go from Frankfurt to Hanover tomorrow morning?"
U2: [Initiates repair meta-communication.]
Experience has shown that the strategy illustrated in Dialogue 4 is more robust than the one in Dialogue 3 [7]. The price to be paid for adopting the Dialogue 4 strategy is that the user and the system have to use more dialogue turns to solve the task. The Dialogue 4 strategy can be considered an early step in graceful degradation (see 3.6.5). The four dialogues above may suffice to illustrate the subtleties that dialogue designers have to face. The amount and nature of the feedback the system should give to the user also depends on factors such as the cost and risk involved in the user-system transaction. Feedback on important bank
transfers or costly journeys is obviously more critical than feedback on which email the system should be reading to the user next. Current opinion is probably that the dialogue manager developer should prefer the safer of the two most relevant feedback options. Even travel information, if the user gets it wrong, can have serious consequences for that user. For important transactions, an additional safeguard is to give the user a full summary of the agreed transaction at the end of the dialogue, preceded by a request that the user listen to it carefully. If this request is not there, the user, who has already ignored crucial feedback once, may do so again. The additional challenge for the dialogue designer in this case is, of course, to decide what the system should do if the user discovers the error only when listening to the summarizing feedback. One solution is that the system goes through the core information item by item, asking yes/no questions of the user until the error(s) have been found and corrected, followed by summarizing feedback once again.
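The feedback strategies discussed above can be contrasted in a small sketch: the Dialogue 2 style embeds the understood values in the answer, the Dialogue 4 style verifies them first, and a closing summary repeats all agreed items for important transactions. Slot names and wordings are assumptions.

    def implicit_feedback(slots, departure_time):
        """Dialogue 2 style: embed the understood values in the answer itself."""
        return (f"The first train from {slots['origin']} to {slots['destination']} "
                f"on {slots['date']} leaves at {departure_time}.")

    def explicit_feedback(slots):
        """Dialogue 4 style: verify the understood values before answering."""
        return f"You want to go from {slots['origin']} to {slots['destination']} {slots['date']}?"

    def closing_summary(slots):
        """For costly or risky transactions: ask the user to listen carefully,
        then repeat every agreed item so errors can still be caught."""
        items = ", ".join(f"{name}: {value}" for name, value in slots.items())
        return "Please listen carefully to the following summary. " + items

    # Hypothetical use:
    print(explicit_feedback({"origin": "Frankfurt", "destination": "Hanover",
                             "date": "tomorrow morning"}))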
3.6.6.2 Process feedback SLDS dialogue manager developers may also consider providing process feedback. Process feedback is meant to keep the user informed that the system is "still in the loop," i.e., that it has not gone down but is busy processing information. Otherwise, the user may, for instance, believe that the system has crashed and decide to hang up, wonder what is going on and start asking questions, or believe that the system is waiting to receive information and start inputting information which the system does not want to have. All of these user initiatives are, or can be, serious threats to the smooth proceeding of the dialogue. Process feedback in SLDSs is still at an early stage. It is quite possible for today's dialogue manager designers to come up with new, ingenious ways of providing the required feedback on what the system is up to when it does not speak to the user and is not waiting for the user to speak. The best process feedback need not be spoken words or phrases but, perhaps, grunts or uhm's, tones, melodies, or appropriate "earcons." Waxholm tells the user, for instance, "I am looking for boats to Sandhamn," thereby combining information feedback and process feedback. In addition, Waxholm uses its screen to tell the user to wait while the system is working.
3.6.7 Closing the Dialogue Depending on the task, the system's closing of the dialogue may be either a trivial matter, an unpleasant necessity, or a stage to gain increased efficiency of user-system interaction. Closing the dialogue by saying something like "Thank you. Good bye." is a trivial matter when the task has been solved and the user does not need to
continue the interaction with the system. Users often hang up without waiting for the system's farewell. In some cases, however, when the user has solved a task, the dialogue manager should be prepared for the possibility that the user may want to solve another task without interrupting the dialogue. This may warrant asking the user if the user wants to solve another task. Only if the user answers in the negative should the system close the dialogue. Closing the dialogue is a dire necessity when the system has spent its bag of tricks to overcome repeated error loops (see 3.6.5) and failed, or when the system hands over the dialogue to a human operator. In the former case, the system might ask the user to try again.
3.7 History, Users, Implementation

3.7.1 Histories
As soon as task complexity in terms of information volume exceeds one piece of information, the dialogue manager may have to keep track of the history of the interaction. Dialogue history is a term which covers a number of different types of dialogue records which share the function of incrementally building a dialogue context for the dialogue manager to use or put at the disposal of the language and speech layers (see 3.4.2, 3.5.6). Note that a dialogue history is not a log file of the interaction but a dedicated representation serving some dialogue management purpose. Note also that a dialogue history may be a record of some aspect of the entire (past) dialogue or it may be a record only of part of the dialogue, such as a record which only preserves the two most recent dialogue turns. In principle, a dialogue history may even be a record of several dialogues whether or not separated by hang-ups. This may be useful for building performance histories (see below) and might be useful for other purposes as well. It is useful to distinguish between several different types of dialogue history.
3.7.1.1 Task history Most applications need a task history, i.e., a record of which parts of the task have been completed so far. The task history enables the system to:
• focus its output to the user on the sub-tasks which remain to be completed
• avoid redundant interaction
• have a global focus (cf. 3.5.5)
• enable the user to selectively correct what the system has misunderstood without having to start all over with the task.
The task history does not have to preserve any other information about the preceding dialogue, such as how the user expressed certain things, or in which order the sub-tasks were resolved. If an output screen is available, the task history may be displayed to the user. If the user modifies some task parameter, such as altering the departure time, it becomes necessary to remove all dependent constraints from the task history.
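A minimal task history along these lines might look like the following sketch: a record of filled sub-task slots plus a dependency table, so that modifying a parameter also removes the constraints that depended on it. The slot names and dependencies are invented for illustration.

    # Assumed dependency structure: changing a slot invalidates its dependants.
    DEPENDENTS = {"departure_time": ["connection", "price"],
                  "destination":    ["departure_time", "connection", "price"]}

    class TaskHistory:
        """Records which parts of the task have been completed so far."""
        def __init__(self, required):
            self.required = list(required)
            self.filled = {}

        def fill(self, slot, value):
            # Modifying an already-filled slot removes all dependent constraints.
            if slot in self.filled:
                for dep in DEPENDENTS.get(slot, []):
                    self.filled.pop(dep, None)
            self.filled[slot] = value

        def remaining(self):
            """Sub-tasks the system should still focus its output on."""
            return [s for s in self.required if s not in self.filled]

    history = TaskHistory(["origin", "destination", "date", "departure_time"])
    history.fill("origin", "Frankfurt")
    print(history.remaining())   # ['destination', 'date', 'departure_time']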
3.7.1.2 Topic history A topic history is in principle more complex than a task history. It is a record of the topics that have come up so far during the dialogue and possibly in which order they have come up. Even low-complexity systems can benefit from partial or full topic histories, for instance for detecting when a miscommunication loop has occurred (cf. 3.6.5) which requires the system to change its dialogue strategy, or for allowing users to do input repair arbitrarily far back into the preceding dialogue. Another thing a topic history does is to build a context representation during dialogue, which can be much more detailed than the context built through the task history. In spoken translation systems, such as Verbmobil, the context representation provided by the topic history is necessary to constrain the system's translation task.
3.7.1.3 Linguistic history A linguistic history builds yet another kind of context for what is currently happening in the dialogue. The linguistic history preserves the actual linguistic utterances themselves (the surface language) and their order, and is used for advanced linguistic processing purposes (see 3.5.6). Preserving the linguistic history helps the system interpret certain expressions in the user's current input, such as coreferences. For instance, if the user says: "I cannot come for a meeting on the Monday," then the system may have to go back through one or more of the user's previous utterances to find out which date "the Monday" is. The Daimler-Chrysler dialogue manager and Verbmobil, for instance, use linguistic history for co-reference resolution. Compared to the task history and the topic history, a linguistic history is a relatively sophisticated thing to include into one's dialogue manager at present.
3.7.1.4 Performance history A performance history is rather different from any of the above histories. The system would build a performance history in order to keep track of, or spot relevant phenomena in, the users' behavior during dialogue. So a performance history is not about the task itself but about how the user handles the task in dialogue with the system. For instance, if the system has already had to resolve several miscommunication loops during dialogue with a particular user, it might be advisable to connect that user with a human operator rather than continue the agony.
One way or another, performance histories contribute to building models of users, whether during a single dialogue or during a series of dialogues with a particular user.
3.7.1.5 Summary Future systems solving high-complexity tasks are likely to include a task history, a topic history, and a linguistic history, as is the case in Verbmobil. For increased usability and adaptivity, they may need a performance history as well.
3.7.2 Novice and Expert Users, User Groups
The discussion above has focused on the central importance of the task to the dialogue manager developer. However, developers also have to take a close look at the intended users of the application as part of designing the dialogue manager. An important issue common to many different dialogue management tasks is the difference between novice and expert users. In most cases, this is of course a continuum rather than an either/or matter. Furthermore, it may sometimes be important to the dialogue manager developer that there are, in fact, two different distinctions between novice and expert users. In particular, someone may be an expert in the domain of the application but a novice in using the system itself. Depending on which of four user groups the system is to be developed for (system expert/domain expert, system expert/domain novice, system novice/domain expert, system novice/domain novice), the dialogue manager may have to be designed in different ways.
3.7.2.1 Domain and system experts If the target user group is domain and system experts only, the developer may be able to impose strict task performance order, a relatively large number of mandatory command keywords, etc., and support use of the system through written instructions, all of which makes designing the dialogue manager much easier.
3.7.2.2 System novices If the target group is walk-up-and-use users who can be expected to be novices in using the system, a much more user-tailored design is required.
3.7.2.3 Domain and system novices The need for elaborate, user-tailored design increases even further if the system novices are also domain novices, so that any domain technicality has either to be removed or explained at an appropriate point during dialogue. For instance, even though virtually every user comes close to being a domain expert in travel
timetable information, many users do not know what a "green departure" is and therefore have to be told.
3.7.2.4 Other user groups Depending on the task and the domain, the dialogue manager developer(s) may have to consider user groups other than novices and experts, such as the visually impaired, users speaking different languages, or users whose dialects or accents create particular problems of recognition and understanding. In the case of dialects or accents, performance history information might suggest that the dialogue manager makes use of the graceful degradation approach (cf. 3.6.5).
3.7.2.5 Mixing user groups The "downside" of doing elaborate dialogue design for walk-up-and-use users can be that (system) expert users rightly experience that their interaction with the system becomes less efficient than it might have been had the system included special shortcuts for expert interaction. Given the relative simplicity of current SLDSs, users may quickly become (system) experts, which means that the short-cut issue is a very real one for the dialogue manager developer to consider. The Danish Dialogue System, for instance, allows (system) expert users to bypass the system's introduction and avoid getting definitions of green departures and the like. Waxholm allows its users to achieve the same thing through its barge-in button. The acceptance of unrestricted user input in RailTel/ARISE means that experienced users are able to succinctly provide all the necessary information in one utterance. Novice users who may be less familiar with the system's exact information requirements may provide some of the information needed and be guided by the system in order to provide the remaining information.
3.7.3 Other Relevant User Properties
If the task to be solved by the dialogue manager is above a certain (low) level of complexity, the dialogue manager designer is likely to need real data from user interactions with a real or simulated system in order to get the design right at design-time. Important phenomena to look for in this data include:
3.7.3.1 Standard goals What are the users' standard goal(s) in the task domain? If the users tend to have specific standard goals they want to achieve in dialogue with the system, there is a problem if the system is only being designed to help achieve some, but not all, of these goals. Strict user input control (see 3.5.2) may be a solution--but do not count on it to work in all possible circumstances! Deep-seated user goals can be difficult or
impossible to control. Of course, another solution is to increase the task and domain coverage of the system.
3.7.3.2 User beliefs Do the users tend to demonstrate that they have specific beliefs about the task and the domain, which may create communication problems? It does not matter whether these user beliefs are true or false. If they tend to be significantly present, they must be addressed in the way the dialogue is being designed.
3.7.3.3 User preferences Do the users tend to have specific preferences which should be taken into account when designing the dialogue? These may be preferences with respect to, for instance, dialogue sub-task order.
3.7.3.4 Cognitive loads Will the dialogue, as designed, tend to impose any strong cognitive loads on users during task performance? If this is the case, the design may have to be changed in case the cognitive load makes the users behave in undesirable ways during dialogue. One way to increase users' cognitive load is to ask them to remember to use specific keywords in their interaction with the system; another is to use a feedback strategy which is not sufficiently heavy-handed, so that users need to concentrate harder than they are used to doing in order not to risk ignoring the feedback.
3.7.3.5 Response packages Other cognitive properties of users that the dialogue manager developer should be aware of include the response package phenomenon. For instance, users seem to store some pieces of information together, such as "from A to B." Asking them from where they want to go therefore tends to elicit the entire response package. If this is the case, then the dialogue manager should make sure that the user input prediction enables the speech and language layers to process the entire response package.

3.7.4 Implementation Issues
The issue of dialogue management can be addressed at several different levels of abstraction. In this chapter we have largely ignored low-level implementation issues such as programming languages, hardware platforms, software platforms, generic software architecture, database formats, query languages, data structures which can be used in the different system modules, etc. Generic software architectures for dialogue management are still at an early stage, and low-level implementation issues can be dealt with in different
ways with little to distinguish between them in terms of efficiency, adequacy, etc. Good development tools appear to be more relevant at this point. A survey of existing dialogue management tools is provided in [37].
3.7.4.1 Architecture and modularity There is no standard architecture for dialogue managers, their modularity, or the information flow between modules. Current dialogue manager architectures differ with respect to their degree of domain and task independence, among many other things. The functionality reviewed above may be implemented in any number of modules, the modularity may be completely domain and task dependent or relatively domain and task independent, and the dialogue manager may be directly communicating with virtually every other module in an SLDS or it may itself be a module which communicates only indirectly with other modules through a central processing module. As to the individual modules, Finite State Machines may be used for dialogue interaction modeling, and semantic frames may be used for task modeling. However, other approaches are possible and the ones just mentioned are among the simplest approaches.
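As a deliberately simple illustration of the two approaches just named, the sketch below pairs a semantic frame used as the task model with a finite state machine used as the interaction model; the states, slots, and transitions are invented for illustration.

    # Semantic frame used as the task model: slots to be filled by the user.
    frame = {"origin": None, "destination": None, "date": None}

    # Finite State Machine used as the interaction model: each state names the
    # slot it asks about and the state to move to once that slot has been filled.
    FSM = {
        "ask_origin":      ("origin",      "ask_destination"),
        "ask_destination": ("destination", "ask_date"),
        "ask_date":        ("date",        "done"),
    }

    def step(state, recognized_value):
        """Fill the slot the current state is asking about and advance the FSM."""
        slot, next_state = FSM[state]
        frame[slot] = recognized_value
        return next_state

    state = "ask_origin"
    for value in ["Frankfurt", "Hamburg", "3rd May"]:
        state = step(state, value)
    print(state, frame)   # done {'origin': 'Frankfurt', ...}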
3.7.4.2 Main task of the dialogue manager If there is a central task which characterizes the dialogue manager as a manager, it is the task of deciding how to produce appropriate output to the user in view of the dialogue context and the user's most recent input as received from the speech and language layers. Basically, what the dialogue manager does in order to interpret user input and produce appropriate output to the user is to:
• use the knowledge of the current dialogue context and local and global focus of attention it may possess to:
  - map from the semantically significant units in the user's most recent input (if any), as conveyed by the speech and language layers, onto the sub-task(s) (if any) addressed by the user
  - analyze the user's specific sub-task contribution(s) (if any)
• use the user's sub-task contribution to:
  - execute a series of preparatory actions (consistency checking, input verification, input completion, history checking, database retrieval, etc.), usually leading to:
  - the generation of output to the user, either by the dialogue manager itself or through output language and speech layers.
The dialogue management activities just described were discussed in sections 3.5.5-3.6.7 and 3.7.2-3.7.3 above. The analysis of the user's
specific sub-task contribution is sometimes called "dialogue parsing" and may involve constraints from most of the elements in the speech input, language input, context, and control layers in Fig. 1. In order to execute one or more actions that will eventually lead to the generation of output to the user, the dialogue manager may use, for example, an AI dynamic planning approach as in Verbmobil, a Finite State Machine for dialogue parsing as in Verbmobil, an Augmented Transition Network as in Waxholm, or, rather similarly, decide to follow a particular branch in a node-and-arc dialogue graph as in the Danish Dialogue System. As the dialogue manager generates its output to the user, it must also:
• change or update its representation of the current dialogue context; and
• generate whatever constraint-based support it may provide to the speech and language layers.
These dialogue management activities were described in sections 3.4.2, 3.5.2, 3.5.4, and 3.7.1 above. At a high level of abstraction, what the dialogue manager has to do thus is to apply sets of decision-action rules, possibly complemented by statistical techniques, to get from (context + user input) to (preparatory actions + output generation + context revision + speech and language layer support). For simple tasks, this may reduce to the execution of a transformation from, e.g., (user input keywords from the speech layer) to (minimum preparatory actions + dialogue manager-based output generation including repair meta-communication) without the use of (input or output) language layers, context representations, and speech and language layer support. Different dialogue managers might be represented in terms of which increased-complexity properties they add to this simple model of a dialogue manager. Waxholm, for instance, adds semantic input parsing; a statistical topic spotter ranging over input keywords; out-of-domain input spotting; two-level dialogue segmentation into topics and their individual sub-structures; preparatory actions, such as dialogue parsing including consultation of the topic history, and database operations including temporal inferencing; user-initiated repair meta-communication; a three-phased dialogue structure of introduction, main phase, and closing; an output speech layer; advanced multimodal output; and topic history updating.
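At that level of abstraction, a single decision-action cycle could be sketched as a function from (context + user input) to (preparatory actions + output generation + context revision + speech and language layer support). Every function called below is a placeholder standing for the components discussed above, not an existing API.

    def dialogue_manager_step(context, user_input):
        """One decision-action cycle: map the input onto sub-tasks, run the
        preparatory actions, generate output, revise the context, and hand
        predictions back to the speech and language layers."""
        subtasks = map_to_subtasks(user_input, context)       # uses focus and context
        actions = prepare(subtasks, context)                  # consistency check, DB retrieval, ...
        output = generate_output(actions, context)
        new_context = revise_context(context, subtasks, output)
        predictions = predict_next_input(new_context)         # constraint-based layer support
        return output, new_context, predictions

    # Minimal placeholder implementations so the step above can actually run.
    def map_to_subtasks(user_input, context):
        return [w for w in user_input.split() if w in ("origin", "destination")]

    def prepare(subtasks, context):
        return {"lookup": subtasks}

    def generate_output(actions, context):
        return "Understood." if actions["lookup"] else "From where do you want to travel?"

    def revise_context(context, subtasks, output):
        return {**context, "addressed": subtasks, "last_output": output}

    def predict_next_input(context):
        return ["city names", "dates"]

    # Hypothetical use:
    print(dialogue_manager_step({"focus": "route"}, "origin Frankfurt"))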
3.7.4.3 Order of output to the user As for the generation of output to the user, a plausible default priority ordering could be:
1. if system-initiated repair or clarification meta-communication is needed, then the system should take the initiative and produce it as a matter of priority
2. even if 1 is not the case, the user may have initiated meta-communication. If so, the system should respond to it
3. if neither 1 nor 2 is the case, the system should respond to any contribution to domain communication that the user may have made; and
4. then the system should take the domain initiative.
In other words, in communication with the user, meta-communication has priority over domain communication; system-initiated meta-communication has priority over user-initiated meta-communication; user domain contributions have priority over the system's taking the next step in domain communication. Note that feedback from the system may be involved at all levels (cf. 3.6.6). Note also that the above default priority ordering 1-4 is not an ordering of the system states involved. As far as efficient processing by the dialogue manager is concerned, the most efficient ordering seems to be to start from the default assumption that the user has made a contribution to the domain communication. Only if this is not the case should 1, 2, and 4 above be considered.
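The default priority ordering 1-4, together with the remark on efficient processing, could be encoded roughly as follows; the state flags are assumptions.

    def choose_next_move(state):
        """Apply the default output priorities: system meta-communication first,
        then a response to user-initiated meta-communication, then a response to
        the user's domain contribution, and finally a system domain initiative."""
        if state.get("system_repair_or_clarification_needed"):
            return "produce_system_meta_communication"
        if state.get("user_initiated_meta_communication"):
            return "respond_to_user_meta_communication"
        if state.get("user_domain_contribution"):
            return "respond_to_domain_contribution"
        return "take_domain_initiative"

    # For efficiency, actual processing may still start from the default
    # assumption that the input is a domain contribution, checking the other
    # cases only when that assumption fails.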
3.7.4.4 Task and domain independence The fact that dialogue management, as considered above, is task-oriented does not preclude the development of (relatively) task-independent and domain-independent dialogue managers. Task and domain independence is always independence in some respect or other, and it is important to specify that respect (or those respects) in order to state a precise claim. Even then, the domain or task independence is likely to be limited or relative. For instance, a dialogue manager may be task independent with respect to some, possibly large, class of information retrieval tasks but may not be easily adapted to all kinds of information retrieval tasks, or to negotiation tasks. Dialogue managers with a modular architecture and domain and task independence are highly desirable, for several reasons (cf. 3.3.3). For instance, such dialogue managers may integrate a selection of the dialogue management techniques described above while keeping the task model description and the dialogue model description as separate modules. These dialogue managers are likely to work for all tasks and domains for which this particular combination of dialogue management techniques is appropriate. The Daimler-Chrysler dialogue manager and the RailTel/ARISE dialogue manager are cases in point. Much more could be done, however, to build increasingly general dialogue managers. To mention just a few examples, it would be extremely useful to have access to a generalized meta-communication dialogue manager component, or to a domain-independent typology of dialogue acts.
4. Conclusion
Spoken language dialogue systems represent the peak of achievement in speech technologies in the 20th century and appear set to form the basis for the increasingly natural interactive systems to follow in the coming decades. This chapter has presented a first general model of the complex tasks performed by dialogue managers in state-of-the-art spoken language dialogue systems. The model is a generalization of the theory of dialogue management in [3] and aims to support best practice in spoken language dialogue systems development and evaluation. To provide adequate context, dialogue management has been situated in the context of the processing performed by the spoken language dialogue system as a whole. So far, the dialogue management model presented here has been used to systematically generate a full set of criteria for dialogue manager evaluation. Preliminary results are presented in [38].
ACKNOWLEDGMENTS
The work was carried out in the EU Esprit Long-Term Concerted Action DISC, Grant No. 24823, on Spoken Language Dialogue Systems and Components: Best practice in development and evaluation [www.disc2.dk]. The support is gratefully acknowledged. We would also like to thank the DISC partners Jan van Kuppevelt and Uli Heid who also analyzed some of the exemplar dialogue managers (see 3.1). This generated heated theoretical discussions and greatly improved our understanding of the intricacies of natural language processing in SLDSs.
REFERENCES
[1] Fatehchand, R. (1960). Machine recognition of spoken words. Advances in Computers, 1, 193-229.
[2] Sharman, R. (1999). Commercial viability will drive speech research. Elsnews, 8(1), 5.
[3] Bernsen, N. O., Dybkjær, H. and Dybkjær, L. (1998). Designing Interactive Speech Systems: From First Ideas to User Testing. Springer Verlag, Berlin.
[4] Bossemeyer, R. W. and Schwab, E. C. (1991). Automated alternate billing services at Ameritech: speech recognition and the human interface. Speech Technology Magazine, 5(3), 24-30.
[5] Aust, H., Oerder, M., Seide, F. and Steinbiss, V. (1995). The Philips automatic train timetable information system. Speech Communication, 17, 249-262.
[6] Peng, J.-C. and Vital, F. (1996). Der sprechende Fahrplan. Output 10.
[7] Sturm, J., den Os, E. and Boves, L. (1999). Issues in spoken dialogue systems: experiences with the Dutch ARISE system. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 1-4.
[8] DARPA: Speech and Natural Language. Proceedings of a Workshop. (1989). Morgan Kaufmann, San Mateo, CA.
[9] DARPA: Speech and Natural Language. Proceedings of a Workshop held at Hidden Valley, Pennsylvania. (1990). Morgan Kaufmann, San Mateo, CA.
[10] DARPA: Speech and Natural Language. Proceedings of a Workshop. (1991). Morgan Kaufmann, San Mateo, CA.
[11] DARPA: Proceedings of the Speech and Natural Language Workshop. (1992). Morgan Kaufmann, San Mateo, CA.
[12] Iwanska, L. (1995). Summary of the IJCAI-95 Workshop on Context in Natural Language Processing, Montreal, Canada.
[13] Gasterland, T., Godfrey, P. and Minker, J. (1992). An overview of cooperative answering. Journal of Intelligent Information Systems, 1, 123-157.
[14] Grosz, B. J. and Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12(3), 175-204.
[15] Heid, U., van Kuppevelt, J., Chase, L., Paroubek, P. and Lamel, L. (1998). Working paper on natural language understanding and generation current practice. DISC Deliverable D1.4.
[16] Baggia, P., Gerbino, E., Giachin, E. and Rullent, C. (1994). Spontaneous speech phenomena in naive-user interactions. Proceedings of TWLT8, 8th Twente Workshop on Speech and Language Engineering, Enschede, The Netherlands, pp. 37-45.
[17] Waibel, A. (1996). Interactive translation of conversational speech. IEEE Computer, 29(7), 41-48.
[18] del Galdo, E. M. and Nielsen, J. (1996). International User Interfaces. Wiley, New York.
[19] Thomson, D. L. and Wisowaty, J. L. (1999). User confusion in natural language services. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 189-196.
[20] Wyard, P. J. and Churcher, G. E. (1999). The MUeSLI multimodal 3D retail system. Proceedings of ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Kloster Irsee, Germany, pp. 17-20.
[21] Heisterkamp, P. and McGlashan, S. (1996). Units of dialogue management: an example. Proceedings of ICSLP96, Philadelphia, pp. 200-203.
[22] Lamel, L., Bennacef, S., Bonneau-Maynard, H., Rosset, S. and Gauvain, J. L. (1995). Recent developments in spoken language systems for information retrieval. Proceedings of the ESCA Workshop on Spoken Dialogue Systems, Vigsø, Denmark, pp. 17-20.
[23] den Os, E., Boves, L., Lamel, L. and Baggia, P. (1999). Overview of the ARISE project. Proceedings of Eurospeech, Budapest, pp. 1527-1530.
[24] Bub, T. and Schwinn, J. (1996). Verbmobil: the evolution of a complex large speech-to-speech translation system. DFKI GmbH Kaiserslautern. Proceedings of ICSLP96, Philadelphia, pp. 2371-2374.
[25] Alexandersson, J., Reithinger, N. and Maier, E. (1997). Insights into the dialogue processing of Verbmobil. Proceedings of the Fifth Conference on Applied Natural Language Processing, ANLP 97, Washington, DC, pp. 33-40.
[26] Bertenstam, J., Blomberg, M., Carlson, R. et al. (1995). The Waxholm system--a progress report. Proceedings of ESCA Workshop on Spoken Dialogue Systems, Vigsø, pp. 81-84.
[27] Carlson, R. (1996). The dialog component in the Waxholm system. Proceedings of the Twente Workshop on Language Technology (TWLT11): Dialogue Management in Natural Language Systems, University of Twente, the Netherlands, pp. 209-218.
[28] Failenschmid, K., Williams, D., Dybkjær, L. and Bernsen, N. O. (1999). Draft proposal on best practice methods and procedures in human factors. DISC Deliverable D3.6.
[29] Bernsen, N. O. (1995). Why are analogue graphics and natural language both needed in HCI?, in Interactive Systems: Design, Specification, and Verification. Focus on Computer Graphics, ed. F. Paterno, Springer Verlag, Berlin, pp. 235-251.
[30] Bernsen, N. O. (1997). Towards a tool for predicting speech functionality. Speech Communication, 23, 181-210.
[31] Bernsen, N. O. and Luz, S. (1999). SMALTO: speech functionality advisory tool. DISC Deliverable D2.9.
[32] Fraser, N. M., Salmon, B. and Thomas, T. (1996). Call routing by name recognition: field trial results for the Operetta(TM) system. IVTTA96, New Jersey.
[33] Zoltan-Ford, E. (1991). How to get people to say and type what computers can understand. International Journal of Man-Machine Studies, 34, 527-547.
[34] Amalberti, R., Carbonell, N. and Falzon, P. (1993). User representations of computer systems in human-computer speech interaction. International Journal of Man-Machine Studies, 38, 547-566.
[35] Dybkjær, L. (1999). CODIAL, a tool in support of cooperative dialogue design. DISC Deliverable D2.8.
[36] Heisterkamp, P. (1993). Ambiguity and uncertainty in spoken dialogue, in Proceedings of Eurospeech93, Berlin, pp. 1657-1660.
[37] Luz, S. (1999). State-of-the-art survey of dialogue management tools. DISC Deliverable D2.7a.
[38] Bernsen, N. O. and Dybkjær, L. (2000). Evaluation of spoken language dialogue systems, in Automatic Spoken Dialogue Systems, ed. S. Luperfoy, MIT Press, Cambridge, MA.
Embedded Microprocessors: Evolution, Trends, and Challenges

MANFRED SCHLETT
Hitachi Europe
Dornacherstr. 3
85622 Feldkirchen
Germany
Abstract
Embedded into nearly all electrical goods, micro-controllers and microprocessors have become almost commodity products in the microelectronics industry. However, unlike the desktop computer processor market, the embedded sector features many different processor architectures and vendors. The embedded system market itself is very fragmented, with no standardized embedded platform similar to PC motherboards currently available. Besides the traditional embedded control and desktop computer segments, many new classes of processors have been derived to meet the market needs of this embedded world. Due to still impressive technology advances, makers of embedded microprocessor systems are facing new challenges in the design and implementation of new devices. This chapter describes the major changes and trends occurring in the microprocessor area through the years, emphasizing embedded microprocessors, and gives an outlook on "what might happen".
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
2. The 32-bit Embedded Marketplace . . . . . . . . . . . . . . . . . . . 332
3. General Microprocessor and Technology Evolution . . . . . . . . . . 337
4. Basic Processor Classification . . . . . . . . . . . . . . . . . . . . . . 342
   4.1 System Level Approach . . . . . . . . . . . . . . . . . . . . . . . 342
   4.2 Embedded Controller Systems . . . . . . . . . . . . . . . . . . . 344
   4.3 Embedded Processor Systems . . . . . . . . . . . . . . . . . . . . 346
   4.4 Computer Processor Systems . . . . . . . . . . . . . . . . . . . . 346
   4.5 System Approach Conclusion . . . . . . . . . . . . . . . . . . . . 347
5. Processor Architectures . . . . . . . . . . . . . . . . . . . . . . . . . 348
   5.1 Architecture Definition . . . . . . . . . . . . . . . . . . . . . . . 348
   5.2 Core Implementation Methodologies . . . . . . . . . . . . . . . . 351
   5.3 RISC Architecture Evolution . . . . . . . . . . . . . . . . . . . . 357
   5.4 Core Function Extensions . . . . . . . . . . . . . . . . . . . . . . 358
   5.5 Future Performance Gains . . . . . . . . . . . . . . . . . . . . . 361
6. Embedded Processors and Systems . . . . . . . . . . . . . . . . . . . 363
   6.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
   6.2 Differentiating Factors and Implementation Trends . . . . . . . . 365
   6.3 System Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 369
   6.4 Scalable Systems and Classes of Embedded Processors . . . . . . 370
   6.5 Embedded Software and Standardization Efforts . . . . . . . . . 372
The Integration Challenge . . . . . . . . . . . . . . . . . . . . . . . . . . 374
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 376
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377

1. Introduction
One of the current buzzwords in the microelectronics arena is the embedded system. After the memory price collapse in the late 1990s and the dominance of Intel in the microprocessor arena, the embedded world still has no dominant vendor and promises high growth rates. It is a market full of new developments and trends but, because there is no dominant vendor at the moment, analysts see a lot of opportunities there. Also, because the embedded market is not as unified as the PC and general computer market, a wide variety of different approaches are currently to be found. Hot topics in this area are embedded microprocessors, upcoming standard operating systems, and the system-on-chip integration challenge. Devices with very high performance but extremely low cost and low power consumption show us the way things will go. At the center of each embedded system sits an embedded microprocessor offering very high computational power. Traditionally, the market was divided into embedded control and desktop sectors, but microprocessor vendors have created a lot of new classes of processors to match the requirements of today's applications. In the past five decades, there have been many new trends in the computer and microelectronics community. Looking at articles about microelectronics published 50 years ago, we can see that on one hand a great deal has changed, but on the other hand we still use binary logic and switching elements. Today of course, we use more transistors than in the past. We still design microcomputers based on the von Neumann principles, and we still use computers to execute arithmetic algorithms. What has really changed is the extended use of microelectronics: nearly all of today's products are implemented using logic ICs or some form of computer. In the early 1950s, integrated microelectronics were used mainly in "calculation machines" or telecommunications equipment. Now, owing to the incredibly fast-growing computer and microelectronics industry and the immense improvements in enabling technology, the design challenge of today can no longer be compared with the challenges electronic system designers faced 50 years ago. This chapter describes the most important aspects of the changes we have witnessed and provides an overview of current microprocessor design challenges with a focus on the rapidly growing embedded world. Starting
with Intel's 4004 device in the 1970s, microprocessors have evolved to become one of the most important driving factors of the industry. Controllers are embedded, for example, into phones, refrigerators, ovens, TV sets, or printers, and are therefore used in nearly all electrical goods. When discussing microcontrollers and microprocessors, we tend to think of the Intel x86 architecture and its competitors--the IBM/Motorola PowerPC, Digital Alpha, Sun Sparc, Hewlett-Packard PA-RISC, or MIPS Technologies MIPS architectures [1-7]. Designed primarily for the desktop computer market, these processors have dominated the scene, with the x86 being the clear winner. During the past two decades, these desktop architectures have been drastically further developed and architects are striving to deliver even more computational power to the desktop [8]. But in concentrating on the desktop, we may be missing the next step in microprocessor design: embedded CPUs. As David Patterson argued, Intel specializes in designing microprocessors for the desktop PC, which in five years may no longer be the most important type of computer. Its successor may be a personal mobile computer that integrates the portable computer with a cellular phone, digital camera, and video game player. Such applications require low-cost, energy-efficient microprocessors, and Intel is far from a leader in that area. [9] The evolution of the embedded microprocessor is the focus of this chapter. The chapter starts with a discussion of the driving forces of the embedded system market. What is happening in that market, and which vendors and processor architectures are market leaders? Before we look more closely at current microprocessor trends, one section illustrates the history of the microprocessor and its evolution during the past 25 years. It is always useful to look at the history first, before trying to estimate future trends. The next section defines the worlds of embedded and computer systems, providing basic definitions of the microprocessor itself and the differences between embedded control, embedded processing, and computer systems. This will allow a better understanding of the individual worlds and the needs driving new embedded concepts. The following section is dedicated to instruction set architectures (ISA): their definition, evolution, implementation, and future. On the basis of these discussions, a more detailed look is then taken at embedded processors and systems. Can embedded systems be classified the way PCs are? Finally, the last section introduces the integration challenge, which offers further cost reduction, higher performance, and new integration capabilities for developing advanced systems. This chapter cannot mention all the new trends in the embedded system market, but it provides at least a basic overview of the past, current, and future situation and evolution.
2. The 32-bit Embedded Marketplace
A lot of excellent research and development work has never been exploited because it was done too early or too late or was not applicable to market needs. But as it is almost impossible to cover all trends in microprocessor design or to mention all new developments, we should look more closely at market trends driving new embedded processor developments. What is happening in the 32-bit embedded domain? Compared to the workstation and PC computer market, the 32-bit embedded processor market is currently extremely fragmented. Several different processor architectures from more than 100 vendors are competing with each other to be used in new embedded applications (see, e.g., [10]). This fragmentation is perhaps the most important difference from the standard computer market. Hardware and software companies active in this embedded field face a situation which differs substantially from the PC market. For an embedded software vendor it is very difficult and resource intensive to support all architectures and devices on the market. The consequence is that vendors normally focus on a single specific architecture, but even so support is very resource intensive because there is no standard embedded platform comparable to a PC motherboard, with fixed resources and a defined system architecture. This situation is caused by the fact that every embedded application has a completely different system-level architecture as a result of different requirements. This explains the huge number of architectures and devices available on the embedded market. Looking at microprocessors, the main players in today's 32-bit embedded market are the MIPS, ARM, SuperH, PowerPC, 68000, and Sparc architectures [11-13]. In fact, most of these architectures have been developed for the desktop computer market. The MIPS architecture from MIPS Technology, Inc. as well as Sun Microelectronics' Sparc were used initially in high-end workstations before being re-used in the embedded domain. The same is true for Advanced RISC Machines' ARM architecture, used in the early so-called Acorn RISC PC. The volume business of the PowerPC--developed jointly by IBM and Motorola--is still the desktop computer market. Although it has been on the market for quite a few years, Motorola's 68000 architecture is still one of the best-selling 32-bit architectures in the embedded domain, which is even more impressive given that the evolution of the 68000 has now ended. The above-mentioned architectures differ a lot in the way they are marketed. ARM processors, for example, are licensed as a core to a large number of semiconductor vendors offering Application Specific ICs (ASICs) or standard devices based on the ARM core. The most successful ARM core is the ARM7TDMI, the so-called Thumb ARM7 extension [14].
Whereas the ARM cores have been licensed quite widely, the MIPS architecture follows a similar approach but has been licensed only to a limited number of global semiconductor players such as NEC, Toshiba, LSI Logic, IDT, Philips, and Siemens. Hitachi's SuperH architecture has so far been manufactured primarily by Hitachi itself, but recently the SuperH architecture was licensed and the next-generation SH-5 architecture developed jointly by Hitachi and ST Microelectronics. All of the above architectures are leaders in specific market areas. Now, which applications and markets are actually driving embedded processor developments? Some examples:
• Video game consoles: In the 1990s, the video game market was one of
the most vital. The main players in that area were the SuperH and MIPS architectures. Driven by ever-improving graphics capabilities, these processors and systems can now nearly compete with high-end graphics workstations. Turley [10] describes this market as the main driver of high-volume business in the 32-bit embedded RISC arena. Because of new announcements and alliances, we will also see other architectures and vendors quite active in this field in the future.
• Cellphones: This market shows incredible growth rates and has become
a very important driving factor of the industry. Whereas cellphones previously used 8- and 16-bit microcontrollers together with a dedicated digital signal processing (DSP) processor executing mainly the audio compression functionality, today's cellphones are switching to 32-bit embedded controllers, one of the leading architectures in this field being the ARM7TDMI.
• PC companions: The growing mobility in daily business life requires a
new class of personal computer equipment. Computers are getting smaller and smaller and could be described as mobile equipment, but a complete PC requires too many peripherals and is not convenient as a personal organizer for accessing data quickly. New products, so-called PC companions, which can store personal information and data plus a subset of normal PC data, have been introduced to the market. For example, hand-held or palm-size PCs with a simple PC data synchronization mechanism can help make information available right in our hands. The leading architectures in that market are again MIPS and SuperH, but ARM, PowerPC, and 68000 devices are also used.
• Set-top boxes: Digital set-top boxes are changing the way we use TV sets. Video-on-demand, internet access through the TV network, and interactive movies require advanced boxes between conventional TV sets and the network. As entertainment becomes a more and more important factor, set-top boxes are now driving several developments
in the semiconductor industry. A typical block diagram of a digital set-top box includes many different functional blocks, making it typical of the ASIC market, where numerous architectures and cores are integrated into complex devices.
• Internet appliances: The internet, defined as a network enabling the fast exchange of information, could be interpreted as a revolution similar to the computer revolution, which started 50 years ago. The pure physical network with routers and switches initiated the development of new networking products such as Ethernet cards, routers, or servers. But the internet is much more. Services associated with the internet, such as electronic commerce, allow easy and fast access to information, personal electronic communication, or other services, and offer a huge potential for new applications. For example, ensuring secure access to internet banking or trading over the internet requires new access procedures such as fingerprint recognition or cryptographic hardware and software tools. Other examples are products providing easy and simple access to these internet services, such as combinations of phones and internet terminals, telephony over the internet, or mobile equipment.
• Digital consumer: The digital consumer market, covering products such as photography, audio, or video equipment, has for many years been a huge market requiring embedded controllers and processors, and it will remain a major market driver in the future. New applications such as mobile MP3 players (MP3 is a compression/decompression algorithm allowing the storage of music tracks and audio on relatively low-capacity storage media) have been developed and introduced, and offer excellent market potential. Another example is the Digital Versatile Disc (DVD) player. One challenge of this market is to connect all these new devices with PCs, but it will still take considerable effort to establish a simple and standard data exchange mechanism between cameras, players, and PCs.
• In-car information systems: Partly defined as digital consumer products, car information and entertainment systems are becoming increasingly popular. Requiring the performance of PCs but under very special price and feature conditions, such as extremely high reliability, this market offers excellent opportunities for high-end embedded processors. The market leader in this emerging area is the SuperH architecture.
• Printers, networking, and office equipment: Following the huge potential of internet use, networking applications such as links, switches, and routers are all driving factors of the embedded processor market. In
conjunction with printers, fax machines, and other office equipment, this market is still full of opportunities. In fact, printers and networking were the first foothold of 32-bit embedded microprocessors [10]. Motorola's Coldfire and Intel's i960 devices [15] are the household names of this market.
• Industrial control: The 32-bit industrial control market segment has been clearly dominated for years by Motorola's 68000 architecture. In the industrial control segment there is no single extremely high-volume product driving new developments, as in the video game console market, but there is a huge number of individual products. As the 68000 family's evolution has come to an end, new players and vendors are entering that market. Typical applications are motor control, industrial automation, etc.
Basically, we can identify two categories of products driving the embedded controller and processor market:
• new emerging products which have not yet been introduced to the market
• convergent products combining two existing products into a new, more advanced and combined product.
For example, in the mobile computing area so-called smartphones and personal communicators are being developed. These products are combinations of a PC companion and a cellphone, as indicated in Fig. 1. The first approach in designing such a system is simply to combine two individual systems via a simple interface mechanism. The next-generation approach is normally to unify both by using only one processor with peripherals, integrating both systems into a single one.
FIG. 1. Future combined PC/PC hand-held/cellphone growth market.
In this context, it is also worth discussing the Net Computer (NC) initiative, which illustrates that even global players can face difficulties introducing new technology. In the early 1990s, a great deal of effort was put into replacing the conventional PC by the NC. The NC was initially conceived as an extremely lean computer without a hard disk, running application programs mainly via the internet or servers: in fact, the concept was very close to the client-server concept of the mainframe era, with a main computer serving many "dumb" clients. The Java concept was very much intended for that purpose. In any event, whether because of the dramatic price reduction of computers or for other reasons, the NC did not replace the PC. Another market started to become important in the early 1990s. The embedded core market became visible on the horizon, and the first players were ARM and MIPS. As embedded processors were increasingly used in high-volume applications, it became necessary to integrate the basic microprocessor into an ASIC, mainly to reduce overall costs further. At the end of the 1990s, it was not only a race between different semiconductor vendors, but also a race between the most popular architectures. Most of the global players in that market entered this business by licensing their processor core technology to other players in the field (for example, SPARC, x86, SuperH, PowerPC), but new processor cores had also been developed to meet the needs of that market; for example, the Argonaut RISC Core (ARC) and hyperstone's 32-bit RISC/DSP engine had been introduced [16]. In addition to these general trends, most of these cores have been developed further to meet market trends, the most visible trend being the integration of multimedia or DSP capabilities into the chip. This trend occurred at the same time in the computer processor and embedded processor markets. For example, Intel introduced the x86's MMX extension, MIPS followed with MDMX, SPARC with VIS, and Alpha with MVI, while among the embedded processors there were the SuperH SH-DSP approach, ARM with the Piccolo concept, hyperstone with the unified RISC/DSP specification, and finally Infineon with the Tricore development (see, e.g., [17]). Anticipating the multimedia decade, new players developed completely new devices intended, for example, for use as multimedia PC peripherals, featuring several modem data streams, graphics acceleration, voice and image (de-)compression, and many more integrated capabilities. Unfortunately, the market for these devices has not taken off yet. Products including Chromatic Research's MPACT or Philips' Trimedia never became household names in the computer or embedded market. Alongside all these developments and market trends, the 32-bit embedded market is probably the market with the highest growth rate at the moment,
but as yet there is no dominant market leader comparable to the x86 in the computer and PC market. The big questions in the embedded domain are whether standardization will occur as it did in the PC market, and whether a single processor architecture and vendor will come to dominate this market. Before discussing this subject further, let us have a look at the evolution of microprocessors and technology generally during the past five decades.
3. General Microprocessor and Technology Evolution
To predict future developments and trends, it makes sense to look at the past and see how it was possible to start with very simple logic and then, a few years later, to integrate several million transistors into a single piece of silicon. At the center of a microelectronic system resides a programmable microcontroller or microprocessor. Such a device is basically defined as one executing consecutively a set of arithmetic and other logical operations on a defined set of data. The programmer of such a system can define the consecutive order in which these operations are executed by the microprocessor. Another definition is given on Intel's website (www.intel.com):
A microprocessor is an integrated circuit built on a tiny piece of silicon. It contains thousands, or even millions, of transistors, which are interconnected via superfine traces of aluminium. The transistors work together to store and manipulate data so that the microprocessor can perform a wide variety of useful functions. The particular functions a microprocessor performs are dictated by software.
For further details see, e.g., [18]. Looking at systems controlled by microprocessors from a very general point of view, there is no difference between a modern computer and the microelectronic control system of a washing machine. If we look at the details, these systems differ, of course, but the principal concept is the same. So where does this concept come from? First of all, modern computing could be interpreted as a further development of the calculating machines invented to perform basic mathematical operations such as addition, multiplication, division, or finding prime numbers. Blaise Pascal, for example, invented in 1643 an adding machine called the Pascaline, the first mechanical adding machine. Another famous example is Charles Babbage's Difference Engine, dating from 1832. All these machines were invented to accelerate basic arithmetical calculations. One of the most important steps leading to
the computer of today was George Boole's work on a "system for symbolic and logical reasoning," which we still use today when designing a 1 GHz processor. Of course, these early computers were not really freely programmable, and they were too big and too slow to be integrated into a washing machine. But the computer technology evolution of the past five decades has made this possible: the same system controlling a washing machine could be reused to run a pocket calculator or a cash-register system. The most important steps toward the microelectronic system of today happened around 1950. These steps were the von Neumann machine, the invention of the transistor, the first logic integrated circuits (ICs), and the first commercial computer. The huge majority of processors and microelectronic systems on the market still follow the concept of the "von Neumann machine" introduced by John von Neumann in 1946 [19, 20]. Many people active in the computer field believe that this term gives too much credit to von Neumann and does not adequately reflect the work of the engineers involved in the development and specification phase. Eckert and Mauchly at the Moore School of the University of Pennsylvania built the world's first electronic general-purpose computer, called ENIAC. In 1944, von Neumann was attracted to that project, which was funded by the US Army. ENIAC was the first programmable computer, which clearly distinguished it from earlier machines. The group wanted to improve the concept further, so von Neumann helped to develop the idea of storing programs as numbers and wrote a memo proposing a stored-program computer called EDVAC. The names of the engineers were omitted from that memo, and so it resulted in the common term "von Neumann computer." In 1946, Wilkes from Cambridge University visited the Moore School; when he returned to Cambridge he decided to start the EDSAC project. The result was a prototype called Mark-I, which might be called the first operational stored-program computer. Previously, programming was done manually by plugging up cables and setting switches, with data provided on punched cards. The ENIAC and EDVAC projects developed the idea of storing programs, not just data, as numbers, and made the concept concrete. Around the same time, Aiken designed an electromechanical computer called Mark-I at Harvard. The subsequently developed Mark-III and Mark-IV computers had separate memories for instructions and data; the term "Harvard architecture" is still used to describe architectures following that approach. In the von Neumann concept, the memory used for program code and data is unified. Quite often, processor white papers emphasize the Harvard architecture, but in fact the difference between the von Neumann and Harvard architectures is not as great as is often implied. As
instructions and data have to be separated at a certain stage anyway, it is more a question of when they are separated. Several more computer pioneers deserve credit for their contributions. Atanasoff, for example, demonstrated the use of binary arithmetic, and Zuse in Germany made other important developments during the late 1930s and early 1940s. Following these scientific developments, the first commercial computer, called UNIVAC I, was introduced in 1951. Actually, UNIVAC's predecessor, called BINAC, had also been developed by Eckert and Mauchly. UNIVAC I was sold for $250 000 and, with 48 units built, it was the first successful commercial computer. The first IBM computer (the 701) followed in 1952. At that time the microprocessor of today had not been invented; other technologies such as vacuum tubes were used to implement the basic logic equations necessary to control the system and to execute the logical operations. However, the basic concept of modern computers and computing had already been established. For further details and information see, e.g., [21]. The next important development was the invention of the transistor in 1947. The transistor soon displaced the vacuum tube as the basic switching element in digital designs, the first logic ICs using transistor technology being fabricated in 1959 at Fairchild and Texas Instruments. But it took another 10 years until the world saw the first microprocessor. In 1971 the Intel Corporation introduced the first microprocessor, the 4004; see, e.g., [22]. The 4004 was the starting point for a race toward higher processor speeds and higher integration. The 4004 integrated approximately 2300 transistors, measured 3.0 mm x 4.0 mm, and typically executed an instruction in 10.8 μs. The 4004 was a 4-bit CPU used in the Busicom calculator, a 14-digit, floating- and fixed-point, printing calculator featuring memory and an optional square-root function. The calculator used one 4004 CPU, two 4002 RAM chips, four 4001 ROM chips, and three 4003 shift-register chips. The 4004 had 16 instruction types and contained 16 general-purpose 4-bit registers, one 4-bit accumulator, a 4-level, 12-bit push-down address stack containing the program counter and three return addresses for subroutine nesting, an instruction register, a decoder, and control logic. Finally, some bus and timing logic was required to establish the entire CPU. During the past 30 years, the microprocessor and the enabling process technology have been developed further to an incredible level. The development of microprocessors still seems to follow Moore's law; for further details see, e.g., [23]. Moore's law posits that microprocessor performance, defined by the number of transistors on a chip, will double
every 18 months (see Fig. 2). By the year 2005, further technology advances will allow at least 10 of today's microprocessors to fit onto a single chip; at least, this is the projection of the 1998 update of the International Technology Roadmap for Semiconductors, reported in [24]. For high-performance microprocessors, this consortium projects technology parameters as shown in Table I. This means that Moore's law will remain valid for another period of dramatic technological improvement. Intel's 4004 was the first commercially available microprocessor, but just a year later Intel introduced the 8008 8-bit processor, followed by numerous other 8-bit processors such as Intel's 8080, Motorola's MC6800, and the Fairchild F8 [25]. These microprocessors were not used as CPUs in computers, but primarily in embedded applications. At that time (the early 1970s), microprocessors were seen as a means of accelerating arithmetic calculations, which explains why, in the early days of microcomputers, the focus of research and development was on improving the raw calculation speed and the arithmetical binary methodology. This difference best illustrates the most important change: today, we focus on what a computer does instead of how a computer works. Desktop computers such as the Apple II in 1976 or, 5 years later, the IBM Personal Computer (PC) introduced this change.
FIG. 2. Moore's law: transistor counts of Intel microprocessors from the 8080 to the Pentium III plotted against year of introduction (1971-2006).
TABLE I
THE VLSI CHIP IN THE YEAR 2005 AS PROJECTED BY THE 1998 UPDATE OF THE INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS; SEE, E.G., [24]
Minimum feature size of process technology: 0.1 μm
Total number of transistors: 200 million
Number of logic transistors: 40 million
Chip size: 520 mm²
Clock frequency: 2.0-3.5 GHz
Number of I/O connectors: 4000
Number of wiring levels: 7-8
Supply voltage: 0.9-1.2 V
Supply current: ~160 A
Power dissipation: ~160 W
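As a rough arithmetic illustration of the doubling rule quoted above (not part of the original text), the following C sketch projects a transistor count forward from the roughly 2300 transistors of the 4004 in 1971, doubling every 18 months; the printed figures are order-of-magnitude projections only, not measured data.

    #include <stdio.h>
    #include <math.h>

    /* Project a transistor count forward under Moore's law:
     * the count doubles every 18 months (1.5 years).        */
    static double moore_projection(double start_count, int start_year, int year)
    {
        double periods = (year - start_year) / 1.5;  /* elapsed 18-month periods */
        return start_count * pow(2.0, periods);
    }

    int main(void)
    {
        const double start = 2300.0;   /* Intel 4004, 1971: ~2300 transistors */
        for (int year = 1971; year <= 2005; year += 5)
            printf("%d: ~%.0f transistors\n", year, moore_projection(start, 1971, year));
        return 0;
    }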
In the early 1980s, the workstation market was established, and for at least the next 15 years the PC and workstation markets drove microprocessor developments; see also [26]. This paradigm shift may be the most important one in the history of computer and microelectronic systems. If Ken Olsen (CEO of Digital Equipment Corporation, 1977) had foreseen this shift, he probably would not have stated that "there is no reason for any individual to have a computer in his home."
FIG. 3. Mainstream trends of processor and logic IC implementation during the past five decades: the level of integration rises over time from basic transistor technology, logic functions, and arithmetic functions, through functional logic block developments and ISA optimization (caches, superscalarity, etc.), to ASIC integration and optimization and finally system-on-chip, i.e. complete system-level integration.
Looking at the history of microprocessors and semiconductor technology from the point of view of complexity, it is possible to identify five steps in the evolution of IC logic technology, as illustrated in Fig. 3. The development started with simple logic equations and has now reached system-level silicon integration. Every step has a special focus. For example, during the implementation of arithmetic functions, scientific research focused very much on how best to implement arithmetic equations with semiconductor technology. Today, this chapter of science is almost closed: optimized CMOS implementations of a multiplier or an adder can be found in standard texts [27]. Of course the individual steps overlap each other, and even in the very early days complete system-level integration was occasionally done. But, looking at the evolution from a mainstream point of view, the focus of each decade is reflected in the number of scientific research projects and commercial products introduced into the market.
4. Basic Processor Classification
After looking at the basic market trends in the 32-bit embedded world and the basic technology evolution during the past five decades, this section introduces a basic processor and system classification.
4.1 System Level Approach
Embedded controllers and processors are driven by completely different parameters than the processors of PCs, workstations, or supercomputers [28, 29]. But in the early days, both worlds were pretty much the same, and there are still some fields where the approaches are very similar, such as industrial PCs or some industrial control applications. To understand the differences in detail, it is necessary to define the individual worlds. First of all, a traditional embedded system can be defined as a logic system embedded into a larger system and executing a predefined set of functions. Usually, these functions are executed when a predefined situation occurs or are started by a simple user interface (e.g., realized by simple buttons). An embedded system is in most cases invisible; typical examples are washing machines, air conditioning systems, car brake systems, etc. The other world is the computer system, defined as a freely programmable system with a focus on the user interface. The system's behavior is determined by downloaded software. Typical applications are office automation, general data manipulation, or specific numerical scientific algorithms. Examples are PCs, workstations, supercomputers, or office machines.
An embedded system does not necessarily include a microcontroller or microprocessor, but, as mentioned above, this application area became one of the most important for the microelectronics industry. In this area, we find a differentiation into microcontrollers and microprocessors. A microcontroller is normally associated with controlling a system, whereas a microprocessor's focus is processing data as fast as possible; controlling a system requires different approaches than processing data. Microcontrollers are mainly 4-, 8-, and 16-bit devices, and microprocessors 32- and 64-bit devices. Thanks to changes we are about to describe, this distinction is no longer valid except for 64-bit devices. But things change quickly in the semiconductor business, so even that may no longer be true in a few years. In the early days, there was no difference between microcontrollers and microprocessors. Zilog's well-known Z80 [30] or Motorola's first 68000 device were used in computers as well as in embedded systems. This was possible because at that time a single CPU was not sufficient to implement the entire system; additional supporting chips were necessary, but these could be replaced by others to comply with the system specification. With increasing integration capabilities, and thus reduced transistor costs, devices became more specialized. Today, we differentiate mainstream CPUs into the following classes:
• embedded controllers
• embedded processors
• computer processors.
This classification is not intended to cover all aspects of logic IC development. There are many devices available for dedicated specific applications, such as supercomputers or dedicated office machines, which are not covered by this approach. In principle, a computer or an embedded system consists of some basic components such as a processing core, peripheral functions, supporting system modules, and the system bus architecture, as illustrated in Fig. 4. This basic system is called the micro-system. Besides the micro-system, an additional context is necessary to define the entire system; this context includes visual display units, keyboards, loudspeakers, analogue measuring equipment, etc. The processing core includes the instruction execution unit; the peripheral functions include interfaces for serial or parallel communication, analogue-to-digital converters, display controllers, etc.; and the supporting modules include bus interfaces and controllers, all kinds of memories, additional interfaces, etc. Of course, systems are often much more complex than the system shown in Fig. 4, but the basic configuration is the same.
FIG. 4. General system level approach: a processing/CPU module connected via the system bus to core-close support modules, system support modules, and peripheral modules (PM), with the surrounding context (display, connectors, etc.).
The general system level approach does not visualize the differences between the various systems. The separation into embedded systems on the one hand and desktop and computer systems on the other is introduced by the peripheral functions that define the input and output of data and the communication with other systems. As mentioned above, we can classify the realizations into three main categories defined by the applications: embedded control, embedded processing, and computer systems. Following this approach, it is possible to identify the three types of microcontrollers and microprocessors illustrated in Fig. 5. The following sections introduce typical system configurations of these three types.
4.2 Embedded Controller Systems
Khan [31] called microcontrollers the "workhorses of the electronic era." These embedded microcontrollers are self-contained ICs designed to perform specific functions by themselves. Typical state-of-the-art controllers include ROM, RAM, and a set of specific peripherals executing special functions. Today, such an embedded control system is quite often a single-chip system, with one microcontroller integrating the memory and peripherals that control the application. Current examples are motor control, airbags, microwave ovens, washing machines, etc. The main features of these microcontrollers are low cost and a vast amount of control functionality; calculation performance does not have the highest priority. A typical system configuration is shown in Fig. 6.
FIG. 5. Basic processor classification.
FIG. 6. Typical self-contained embedded control system: a microcontroller core with on-chip RAM/ROM and on-chip peripheral functions and I/O, interfacing directly with the surrounding context.
The instruction set of a typical microcontroller core includes instructions supporting control functionality, which normally means that it allows fast single-bit manipulation, such as mask-bit instructions. Such a system is not intended to be reprogrammed or updated by the end user; the functionality, once set, will not be modified further. The typical performance requirement of such a system is between 1 and 10 million instructions per second (MIPS). For further details about typical microcontrollers see, e.g., [32].
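To make the control-oriented flavour of such instruction sets concrete, here is a small illustrative C fragment (invented for this purpose; the register layout and bit assignments are hypothetical, not taken from any particular microcontroller manual). It sets, clears, and tests individual bits of a control register, exactly the kind of operation that single-bit manipulation and mask instructions execute in one step; a plain variable stands in for what would be a memory-mapped register on real hardware.

    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for a memory-mapped 8-bit control register; on real hardware
     * this would be a fixed address such as (*(volatile uint8_t *)0x4000u). */
    static volatile uint8_t ctrl_reg;

    #define MOTOR_ON   (1u << 0)   /* bit 0: motor on/off  (hypothetical) */
    #define HEATER_ON  (1u << 1)   /* bit 1: heater on/off (hypothetical) */
    #define DOOR_LOCK  (1u << 4)   /* bit 4: door lock     (hypothetical) */

    int main(void)
    {
        ctrl_reg |= DOOR_LOCK | MOTOR_ON;   /* set bits: lock door, start motor */
        ctrl_reg &= (uint8_t)~HEATER_ON;    /* clear bit: heater off            */

        /* test a single bit */
        printf("heater on? %s\n", (ctrl_reg & HEATER_ON) ? "yes" : "no");
        return 0;
    }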
4.3 Embedded Processor Systems
This class includes devices which still belong to the embedded world but which focus on processing data instead of controlling an entire system. Typical systems include a limited number of additional peripheral chips, for example an additional embedded microcontroller controlling a dedicated I/O function, but the main system functionality is still performed directly by the embedded processor (Fig. 7). Nevertheless, the subsystem might be quite complex: additional support chips could run an entire graphics accelerator or even a video decoder that could not be handled by the embedded microprocessor itself. Examples of extended I/O capabilities are video or graphics controllers. To a certain degree, such a system can be reprogrammed or updated in the field, but, compared to the embedded control system described above, a more open approach is necessary; program download by the end user may be possible. Typically, such systems offer a performance between 50 and 500 MIPS. Examples of these systems are video game consoles, PC companions, car information or navigation systems, etc.
4.4 Computer Processor Systems
The third category comprises so-called computer processor systems. Here, control functionality is carried out by dedicated modules and chips rather than by the main CPU itself; PC mainboards are a very good example of this approach. The main CPU's most important role is the execution of general application programs such as games, financial or office software, databases, etc. Figure 8 shows a conceptual block diagram of a standard PC motherboard from the 1998-99 era [33]. A substantial part of the entire system is covered by support chips (mainly the North and South bridges). These support chips handle the communication with additional components such as the display, hard disk, sound cards, graphics accelerators, mouse, keyboard, etc. This illustrates that the most important differentiating factors of computer processors are the execution speed of the above-mentioned general programs and the possibility of extending and upgrading the system by the end user. Compared to embedded controllers and processors, the PC motherboard has to be standard and predefined: software developed on another or an earlier system must be executed without any rewriting or porting work. This requires upward compatibility of systems and processors, and strict adherence to the system standard. This is perhaps the most obvious difference from the embedded system world; in classical embedded systems no such standard exists. The performance of such systems is no longer measured in MIPS; several benchmark programs have been developed to measure the overall system performance.
FIG. 7. Typical embedded processor system configuration: an embedded microprocessor connected to memory and to a support chip (ASIC, companion chip, etc.), each with its own interface to the surrounding context.
FIG. 8. Conceptual PC motherboard block diagram [33]: microprocessor with backside and frontside caches, a North bridge connecting main memory, the advanced graphics port, and the PCI bus with its card adapters, and a South bridge (legacy bus bridge, card bus bridge, super I/O).
4.5 System Approach Conclusion
The system level approach to embedded control, embedded processor, and computer processor systems does not reveal any detailed differences
between the processor architectures, but it does disclose differences in the target systems. The main difference between these systems is the increasingly extended bus structure. Embedded control applications do not feature complicated external bus structures; in embedded processor systems, an external bus system is introduced, as the embedded processor does not include all the peripherals required to run the entire system. In computer processor systems, several additional bus structures are necessary to perform the required functionality, which leads to a kind of distributed processing requiring sophisticated methods to avoid bottlenecks. Typically, a computer processor system includes several embedded control systems; the super I/O part of the South bridge illustrated in Fig. 8 is a typical embedded control system. The different system approaches lead to different implementations of the basic processor core. In Section 5, basic implementation methodologies are discussed to illustrate these differences.
5. Processor Architectures
In this section, the basics of microprocessor architectures are introduced and discussed. We discuss how a processor architecture works, how it can be implemented, and how the processing speed can be accelerated.
5.1 Architecture Definition
According to Federico Faggin [34], one of the pioneers of the microprocessor, it is possible to divide microprocessor progress into three phases, each lasting approximately 25 years: during the first phase, semiconductor technology was the dominant factor of progress; during the next phase, both technology and architecture will play a major role; and during the last phase, architecture will be the dominant factor. This indicates that the choice of architecture itself is becoming a crucial factor in the future worlds of semiconductors and microprocessors. This is especially true for the fragmented embedded system world with its various system requirements. As described earlier, a microprocessor executes a set of arithmetic or logical operations on a defined set of data, together with data transfer operations. This leads immediately to two subjects defining an architecture: the instruction set and the data format. The instruction set and the data format define the resources
needed to execute an operation in a certain time slot. The time needed to complete an instruction, or the time needed to execute a set of instructions, provides a means to compare different architectures. Because every architecture has its own instruction set, a comparison of instruction timing has to be tied to a certain task, for example executing a special arithmetic operation such as a 10-dimensional vector multiplication. A comparison based solely on "how many instructions can be executed per second" is useless; the entire arithmetic or logical operation or program has to be defined, measured, and compared. The processor's clock determines when a transition from one logical state to the other is performed, and the time from one clock tick to the next is referred to as the clock cycle. The clock speed or frequency alone does not indicate the overall speed of an architecture implementation; it has to be combined with the information being processed. Basically, there are architectures executing an entire instruction between two clock ticks, whereas others execute only a small part of the entire instruction, for example only decoding it, with the basic arithmetic operation performed during another clock cycle. The reason for doing this is described below. The instruction set of a typical microprocessor architecture includes the following instruction types:
• Move instructions: used to move data from one memory location to another.
• Arithmetic instructions: executing arithmetic operations such as add, subtract, multiply, divide, or shift.
• Logical instructions: executing logical Boolean bit manipulations of one or more operands, such as OR, XOR, NOT.
• Program flow control instructions: controlling the program flow, such as branches, or exception processing interrupting the normal program flow.
• Processor state control instructions: changing the state of the processor, such as reset control.
The control and program flow instructions ensure proper handling of interruptions of the normal instruction stream. Such an interruption could be caused, for example, by a signal from an external control device requiring special treatment, i.e., an interruption of the normal instruction sequence. Every processor state is defined or controlled by a set of data stored in the control registers; this data ensures a defined transition from one state to the other (Fig. 9). The instructions are encoded in a binary format.
FIG. 9. Basic instruction stages of two consecutive instructions over time: F, fetch; D, decode; E, execute.
When the control unit of the architecture gets an instruction to execute, it interprets that binary number and changes the processor's state accordingly. Usually, an instruction is executed in three steps: the instruction is first fetched, i.e., transferred from a memory location to the control unit; next, the instruction is decoded; finally, the instruction is executed. All instructions operate on data with a predefined format. An architecture can include several different data formats, for example 16- or 32-bit integer and floating-point or fixed-point numbers. All these numbers have a binary representation but are interpreted in different ways. If only the binary representation is available, it is not possible to decide whether it is a floating-point or an integer number; only the interpretation by the execution units and the instructions makes the distinction clear. The size of the integer data format and the bus structure connecting the different units decide what kind of n-bit architecture it is; on the market we find mainly architectures for n = 4, 8, 16, 32, or 64. In conjunction with this data bus transporting the binary values of instructions and data, an additional address bus is necessary to identify the memory locations used to store this data. These features define the basics of an architecture at a very high level. Going into more detail, data paths and memory locations for the data have to be defined, as well as the processing states. Instructions are performed by changing the state of the processor at defined times in a completely defined way; this is done until the instruction is completed, and all instructions are executed in a sequentially defined order. Of course, as illustrated below, several mechanisms have been developed to accelerate the processing of an instruction stream. This can lead to an automatic reorganization of the instruction stream or to parallel execution, but the logical result of the program is of course not affected. To understand the mechanisms of a processor it is important to describe the program flow and how software is executed. The software is separated into tasks, which are collections of instructions executing certain functionality. The main task is normally used to organize the entire system and the overall program flow; sub-tasks are started and controlled within the main task. A task could also be a collection of no-operation (NOP) instructions representing an idle task; this is necessary, for example, if the system is waiting and has nothing else to do. Tasks can be interrupted by
so-called interrupts or by other tasks (task switch). Interrupts can be interpreted in a very general way, including exceptions caused by instructions or by external devices requesting processing power from the processor. Such an event causes an interruption of the normal program flow and is handled either by software or by hardware; the mechanism for handling it can differ a lot between architectures. Every task has its own context, defined by resources such as operand registers or memory locations, and also by control registers. If a running task is interrupted, either by an interrupt or by another, more important task, the context of the old task has to be stored, and when the old task is restarted its context has to be restored. Controlling tasks and interrupts is done by the operating system (OS), which makes the OS a crucial part of every system; this is true for PC systems as well as embedded systems.
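The fetch-decode-execute cycle and the binary encoding of instructions described above can be illustrated with a toy interpreter. The C sketch below is an invented example: its 4-bit opcode/4-bit operand encoding and its three operations (LOAD, ADD, HALT) do not correspond to any real instruction set; it merely shows the control unit's loop of fetching an encoded instruction, decoding the opcode and operand fields, and executing the operation on an accumulator.

    #include <stdint.h>
    #include <stdio.h>

    /* Invented toy encoding: upper 4 bits = opcode, lower 4 bits = operand. */
    enum { OP_LOAD = 0x1, OP_ADD = 0x2, OP_HALT = 0xF };

    int main(void)
    {
        const uint8_t program[] = { 0x15, 0x23, 0x22, 0xF0 }; /* LOAD 5; ADD 3; ADD 2; HALT */
        unsigned pc  = 0;   /* program counter */
        int      acc = 0;   /* accumulator     */

        for (;;) {
            uint8_t insn   = program[pc++];   /* fetch: read instruction, advance PC */
            uint8_t opcode = insn >> 4;       /* decode: split opcode ...            */
            uint8_t arg    = insn & 0x0Fu;    /* ... and operand fields              */

            if (opcode == OP_HALT)            /* execute: change the processor state */
                break;
            else if (opcode == OP_LOAD)
                acc = arg;
            else if (opcode == OP_ADD)
                acc += arg;
        }
        printf("accumulator = %d\n", acc);    /* prints 10 */
        return 0;
    }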
5.2 Core Implementation Methodologies
If we look at mainstream processors, we find two main realization philosophies: the Reduced Instruction Set Computer (RISC) and the Complex Instruction Set Computer (CISC). The main difference between the two is that RISC processors are based on simple (thus "reduced") instructions operating on a defined set of general-purpose registers, whereas CISC processors feature an instruction set executing more complex operations. In the early 1980s, many publications discussed the pros and cons of both approaches. Although the RISC approach allows much higher performance, CISC architectures currently dominate the market, with the x86 architecture dominating the desktop computer segment and the 68000 the 32-bit embedded segment. However, nearly all new architectures now being developed are based on RISC principles. As many new concepts have been introduced to the market during the past 25 years, it is now difficult to give clear definitions of CISC and RISC: many features evolved so long ago that the initial ideas are now unclear. Perhaps a more useful approach is to list some of the basic features of the first implementations. IBM's System/360 architecture is a typical example of the CISC approach. Because the underlying technology was changing rapidly, it was very useful to develop new implementations of an architecture while keeping the instruction set compatible; the x86 architecture is probably the best-known representative of that philosophy. Software has to run on different implementations without any further work, and this requirement led to the concept of microprogramming, introduced in 1951 by Wilkes [35]. These microprograms were a collection of basic machine instructions which were
invisible to the application software programmer and which included all the information necessary to define the processor's states and its transitions. As the programmer sees only this programming interface, the underlying hardware realization could differ. This gave vendors the opportunity to realize the processor in different versions and thus to create a family concept. The difference between the various implementations was basically the execution speed of the instructions: the more expensive the hardware, the faster the program was executed, but the logical result was the same. With the advance of technology and the development of very efficient compilers, however, microprogramming was no longer the most effective approach. The introduction of the RISC philosophy has sometimes been described as a paradigm change, but looking at the details it was more a transition and a further development of the basic concepts. The RISC approach can be seen as a transfer of microprograms to pure software, which could be developed and maintained much more easily. This change from microprograms to simple instruction execution engines opened the door to several new trends and architectural advances. Because many designers called their architectures RISC it is difficult to define exactly what the term means, but typical features of RISC architectures are:
• Simple instructions: By using strictly simple instructions, much higher clock speeds became possible. The execution time of an instruction depends directly on the number of logic gates that have to switch before completion.
• Load/store architecture: All arithmetic and logical instructions operate on fast switching registers; to move data between external storage media and these registers, additional move operations are necessary.
• Instruction pipelining: Each step in the execution of an instruction, such as fetch or decode, is encapsulated and can thus be overlapped with the next instruction; for example, while one instruction is being decoded, the next instruction can be fetched.
• Fixed instruction length: This was introduced to unify the time needed to decode an instruction.
• Very high data throughput.
One of the most typical characteristics of RISC architectures is the classic five-stage pipeline illustrated in Fig. 10a. Pipelining is a simple mechanism to parallelize and thus accelerate instruction execution. As this can be done in different ways, the literature varies. The control unit of a processor encapsulates certain stages of an instruction, for example into five stages: Fetch (F), Decode (D), Operand Read (R), Logic Execution (E), and Result Write Back (W).
FIG. 10. (a) Basic five-stage RISC pipeline; (b) two-way superscalar pipeline. Consecutive instructions are overlapped in time across the fetch (F), decode (D), operand read (R), execute (E), and write-back (W) stages; in the superscalar case two instructions are issued per cycle.
Every stage is processed by a certain unit, for example the fetch unit, the decode unit, etc. The execution of an instruction requires the processing of all five stages, but when the fetch unit is finished with the first instruction, the next instruction can be fetched in the next clock cycle; in that same cycle, the first instruction is decoded. The same holds for all the stages. This means that (ideally) an instruction can be fetched at every clock cycle, and one instruction can be completed at every clock cycle. Every instruction requires five clock cycles to be fully completed, but the user does not see these five clock cycles. Of course, when a program sequence is started it takes five cycles until the user sees the result of the first instruction, but after these cycles an instruction is completed at every cycle. Sometimes, special instructions using dedicated units, or load and store instructions, have different pipeline structures. Thus, for different instructions the pipelining looks different and includes memory accesses that may require more than one clock cycle. This leads to one of the most important challenges in the definition of an architecture: avoiding and managing so-called pipeline breaks, where the execution of an instruction has to be stopped because a previous instruction changes the processor state. This can happen, for example, if an external memory access occurs that requires a non-deterministic number of clock cycles. When an external memory access is necessary, a refresh of the external DRAM may be in progress, and the access has to wait until the refresh is completed. As the architecture does not know how many cycles that requires, further processing of the instruction execution has to be stopped. Depending on resource conflicts with the following instructions, the next instruction might be processed further or might also have to be stopped.
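As a purely illustrative aid (not from the original text), the short C program below prints which instruction occupies each of the five stages F, D, R, E, W in every clock cycle of an ideal, stall-free pipeline. It shows the behaviour described above: after a five-cycle fill phase, one instruction completes per cycle; pipeline breaks and memory stalls are deliberately ignored in this sketch.

    #include <stdio.h>

    #define STAGES        5   /* F, D, R, E, W                      */
    #define INSTRUCTIONS  6   /* number of instructions to schedule */

    int main(void)
    {
        const char *stage_name[STAGES] = { "F", "D", "R", "E", "W" };

        /* In an ideal pipeline, instruction i occupies stage s in cycle i + s. */
        for (int cycle = 0; cycle < INSTRUCTIONS + STAGES - 1; cycle++) {
            printf("cycle %2d:", cycle + 1);
            for (int s = 0; s < STAGES; s++) {
                int i = cycle - s;                    /* instruction currently in stage s */
                if (i >= 0 && i < INSTRUCTIONS)
                    printf("  I%d:%s", i + 1, stage_name[s]);
            }
            printf("\n");
        }
        return 0;
    }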
As mentioned above, the development of optimizing compilers was crucial for the success of RISC architectures. In the very early days of RISC, there were two main approaches:
• The Stanford RISC approach kept the architecture extremely simple, as the compiler had to take care of resource interlocking. There was no hardware support for resolving hardware conflicts; the compiler had to detect possible conflicts in advance and insert NOP instructions. Resource conflicts can occur because of the pipelining concept: when an instruction has to access a certain register, that register may still be in use by one of the preceding instructions. Without the parallelism introduced by pipelining, this would not occur. The archetype of this approach is the MIPS architecture, whose idea was to be truly reduced and simple; but this approach required more complex or "smart" compilers.
• The Berkeley RISC approach offers additional hardware support in the event of resource conflicts. An example of this approach is the Sparc architecture. Interlock mechanisms and context-saving mechanisms were implemented to support the programmer and to simplify compiler development. Of course, this approach leads to a more complex hardware design.
The load/store principle is also typical of RISC architectures. To avoid bottlenecks when accessing data, operands are stored locally in a register file (see Fig. 11). All arithmetic and logical instructions operate on these registers; thus, instructions are necessary to load operands from memory into these registers and vice versa. These load and store operations are independent of the other instructions. This approach has two major benefits:
• the pipeline stages can be kept uniform
• if a result is reused soon, it can be kept in the local register file and no (slow) external memory access is necessary.
CISC architectures normally have only a few local registers. These architectures are normally stack-based, putting operands onto a stack that stores the values of the operands in certain memory locations. Depending on the program that has to be executed, both approaches have advantages and disadvantages. The load/store principle causes additional instructions if a result is not immediately processed further, and the register file belongs to the category described above which has to be saved when an interrupt or task switch occurs. The CISC approach has disadvantages when operands are often reused and immediately processed further.
FIG. 11. Register bank as an intermediate fast operand storage medium: the functional units operate on a general-purpose register bank (alongside system and control registers), which in turn exchanges data with main memory.
Using the load/store approach allows higher clock frequencies, thus leading to better performance. To resolve the context-saving issue, several RISC architectures introduced mechanisms to avoid extra cycles for saving and restoring the context. This can be done by introducing additional register banks and simple switch mechanisms. Of course, as resources are limited, it can still happen that the context has to be stored on an external stack. Many different strategies, such as moving register windows or shadow registers, have been implemented to accelerate the interrupt response time or subroutine calls and returns. Which strategy is best normally depends on the application, and it is difficult to identify one that is generally superior. As mentioned earlier, most of the embedded 32-bit architectures were not developed with the explicit goal of being solely an embedded processor. This is especially true for the 68000 series, the x86 family, the Sparc architecture, and the MIPS architecture. This leads to the interesting question of how embedded architectures and desktop computer architectures differ. In fact, there are only a few requirements that determine whether an architecture is useful for embedded applications, driven by the demands of embedded applications such as cost, power consumption, and performance. Basically, increasing the calculation performance as much
as possible mainly drove desktop computer applications, whereas embedded processor architectures require a cost-efficient implementation, the integration of peripherals, and fast processing of interrupts. If we look at the processor core, the main difference is the handling of interrupts and exception processing. Because embedded processors are driven by the need to control a system rather than to accelerate it, the fast processing of events becomes a key feature. This directly affects the implementation of the register set introduced earlier. If an interrupt indicating a change in the normal program flow occurs, the context has to be saved and subsequently re-established. On the other hand, offering as many registers as possible for temporarily storing operands helps to increase the arithmetic performance of an architecture. This leads to a trade-off between the number of registers available to a program task and the time needed to save the context. In architectures designed primarily for the embedded market, a typical number of general-purpose registers is 16, representing (together with the control registers) the context of a task. To make a fast context switch possible, many architectures integrate shadow registers, allowing an extremely fast response to interrupt requests by a simple switch mechanism. Architectures designed primarily for the desktop or workstation domain feature more registers, allowing nested subroutine calls, but this slows down the task-switching time. For example, several architectures integrate a much higher number of general-purpose registers but feature a windowing mechanism: a specific partial block of registers can be addressed from a single routine, and by moving this window to the next block and overlapping the register windows, a fast parameter transfer to the next routine and a fast jump to that routine can be realized. The main difference between desktop and embedded architectures is thus the focus of the implementation: desktop architectures focus on fast subroutine jumps, whereas embedded architectures focus mainly on fast interrupt processing. In fact, interrupt processing requires slightly different parameter transfers. In the case of an interrupt, the processor jumps to the address associated with the interrupt and stores the context; in the case of a subroutine call, additional parameters have to be passed from the calling routine to the new routine. Looking at the various architecture implementations, different flavors can be identified, ranging from single register banks to multiple or windowing register banks. There is no ideal implementation: every implementation is a trade-off between task context saving time and fast exception processing. The implementation of a general-purpose register bank for intermediate data storage is based on the assumption that operands are reused throughout a program task; otherwise, an intermediate memory would not make sense. For DSP applications this assumption basically does not hold, and typical DSP processors do not
feature a general-purpose register bank; the functional units access data directly from the main memory, normally fast on-chip memory. For further reading, [36] is recommended.
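To make the trade-off between context-saving time and register count more concrete, here is a small illustrative C model (invented for this chapter's discussion, not a description of any of the architectures named above). It contrasts saving a 16-entry register file to a memory stack, which costs one store and one load per register, with switching to a shadow register bank on interrupt entry, which costs only a bank-index change.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    #define NREGS 16

    typedef struct { uint32_t r[NREGS]; } reg_bank_t;

    static reg_bank_t bank[2];       /* bank 0: normal task, bank 1: shadow bank */
    static int        active = 0;    /* index of the currently active bank       */
    static uint32_t   stack_save[NREGS];

    /* Conventional approach: copy all registers to memory on interrupt entry
     * (16 stores) and copy them back on return (16 loads).                   */
    static void save_context_to_stack(void)     { memcpy(stack_save, bank[active].r, sizeof stack_save); }
    static void restore_context_from_stack(void){ memcpy(bank[active].r, stack_save, sizeof stack_save); }

    /* Shadow-bank approach: entering the interrupt just switches the bank
     * index, so no memory traffic is needed at all.                        */
    static void enter_interrupt(void) { active = 1; }
    static void leave_interrupt(void) { active = 0; }

    int main(void)
    {
        bank[0].r[3] = 42;                 /* some task state in the normal bank   */
        enter_interrupt();                 /* handler now works on the shadow bank */
        bank[1].r[3] = 7;                  /* handler state does not disturb it    */
        leave_interrupt();
        printf("task register r3 = %u\n", (unsigned)bank[0].r[3]);   /* still 42 */

        save_context_to_stack();           /* the slower, memory-based alternative */
        restore_context_from_stack();
        return 0;
    }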
5.3 RISC Architecture Evolution
The architectures adopting the RISC approach drove, for example, the cache discussion, superscalar instruction execution, out-of-order instruction execution, branch prediction, and very long instruction word (VLIW) processing. The introduction of caches was necessary, and is still very important today, because the clock frequency of the microprocessor core itself is much higher than the speed of external memory. Fast on-chip memory was necessary to avoid data bottlenecks between external and internal memory and the registers. A cache can be defined as an intelligent RAM storing the instructions and/or data needed to ensure a fast program flow. The idea behind cache memories is to reduce the number of external memory accesses by storing frequently used data and instructions in a memory block close to the microprocessor core. There are two main approaches:
• unified cache architectures, offering a reasonable speed penalty at low cost
• separated cache structures, offering higher speed but requiring more hardware resources.
Unified caches use the same memory for instructions and data, whereas separated caches have two different blocks, one for instructions and one for data, which makes additional buses and additional memory cells necessary. The cache memory itself is defined basically by the refill strategy and the organization of its information elements (associativity). A useful introduction to cache design is given in [37]. A cache memory is not necessarily the best strategy if an extremely deterministic program flow is required; this is the case, for example, for DSP architectures [38]. In general, if the program flow is deterministic, a simple on-chip RAM structure can be more useful; if a general-purpose routine has to be executed, a cache structure is often the better solution, as a prediction of the program flow is not possible. To further improve the execution speed, several concepts for the parallel execution of instructions have been introduced. Besides the basic pipelining concept, further parallelism has been added by superscalar instruction issue, which means starting more than one instruction in parallel [39]. Figure 10b illustrates the basic pipelining and instruction issue timing of a five-stage, two-way superscalar pipelined architecture.
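As an illustration of how a cache decides between hit and miss, the following C sketch models a direct-mapped cache lookup; the line size, number of lines, and refill policy are invented for illustration and do not describe any specific processor discussed here. The address is split into offset, index, and tag fields, and a hit is declared when the tag stored in the selected line matches.

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define LINE_SIZE  16u    /* bytes per cache line (invented)  */
    #define NUM_LINES  64u    /* number of lines: a 1 KB cache    */

    typedef struct {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_SIZE];   /* line contents (refill not modelled) */
    } cache_line_t;

    static cache_line_t cache[NUM_LINES];

    /* Returns true on a hit; on a miss the line is (conceptually) refilled. */
    static bool cache_lookup(uint32_t addr)
    {
        uint32_t index = (addr / LINE_SIZE) % NUM_LINES;  /* selects the line      */
        uint32_t tag   = addr / (LINE_SIZE * NUM_LINES);  /* identifies the block  */

        if (cache[index].valid && cache[index].tag == tag)
            return true;                                  /* hit                   */

        cache[index].valid = true;                        /* miss: mark refilled   */
        cache[index].tag   = tag;                         /* (data fetch omitted)  */
        return false;
    }

    int main(void)
    {
        printf("first access:  %s\n", cache_lookup(0x1234) ? "hit" : "miss");
        printf("second access: %s\n", cache_lookup(0x1234) ? "hit" : "miss");
        return 0;
    }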
Other approaches are VLIW and single-instruction-multiple-data (SIMD) implementations. Basically, VLIW and SIMD are much simpler to realize but require additional compiler optimizations for efficient use. Superscalar approaches require a lot of additional hardware because, for example, register conflicts can occur during the execution of an instruction, and a floating-point instruction may produce an overflow exception that cannot be predicted in advance. Deeply pipelined architectures such as Digital's Alpha architecture (now Compaq) are affected by this situation, and a lot of additional hardware is required in the decode unit. Other implementations, such as Hewlett-Packard's PA-RISC VLIW approach, require highly optimized compilers to use the available resources efficiently. VLIW and SIMD basically combine a defined number of operations into a single long instruction, which requires a reorganization of the code at compilation time. These approaches are currently being revived by, for example, multimedia and DSP processors such as Texas Instruments' C60 architecture or Chromatic Research's MPACT chip [40,41].

To use all these architecture extensions efficiently, it is necessary to combine them in the right way. Cache size, superscalarity, branch prediction, and other resources have to be used in a balanced way to end up with a useful architecture. The raw implementation of as many features as possible does not normally lead to fast, powerful, or cost-efficient architectures.
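The point about compiler support can be illustrated with a small C fragment: in the unrolled loop below, the four multiply-add operations are independent of each other, so a VLIW or SIMD compiler can bundle them into one long instruction or one packed operation, whereas a superscalar machine has to rediscover this independence in hardware at run time. The code is a generic sketch, not tied to any of the architectures mentioned above.

/* The trip count is assumed to be a multiple of four for simplicity. */
void scale_add(float *y, const float *a, const float *b, float k, int n)
{
    for (int i = 0; i < n; i += 4) {
        /* four independent operations: candidates for one VLIW bundle */
        y[i]     = a[i]     * k + b[i];
        y[i + 1] = a[i + 1] * k + b[i + 1];
        y[i + 2] = a[i + 2] * k + b[i + 2];
        y[i + 3] = a[i + 3] * k + b[i + 3];
    }
}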
5.4
Core Function Extensions
Another important but basically very simple method of increasing performance for a dedicated application is to include hardware accelerating a special functionality required in that application. This leads to ASICs and application-specific standard products (ASSPs). A very popular example is the integration of a floating-point unit (FPU); a more recent example of this kind of architecture extension is the integration of DSP or multimedia functionality. In these cases it is not just a further integration: the processor architecture and core have been modified and extended. The integration of an FPU is very straightforward; it is achieved by adding a special register bank for the operands and a set of additional instructions that comply with the IEEE 754 binary arithmetic standard, the most popular set of floating-point instructions. For desktop processors this integration is standard, but in the embedded domain only very high-end or specialized processors feature an FPU. Basically, only the standard operations for multiplication, division, addition, and subtraction have to be
implemented for floating-point support. In the case of DSP and multimedia, the integration is far more complex and no standard is applicable. In general, DSP functionality means providing instructions that accelerate algorithms known from the DSP area, such as digital (finite and infinite) impulse response filters, fast Fourier transforms (FFT), Viterbi decoders, and other mathematical transformations. As well as in pure DSP applications, these basic algorithms are used to implement higher-level functions such as audio and video algorithms, speech coders, or modem functionality; see, e.g., [42]. One of the most important instructions in that area is the multiply-accumulate instruction, but integrating that functionality into a general microprocessor does not automatically produce a device capable of running DSP algorithms. Normal embedded systems require the execution of a large general-purpose program with a typically small amount of data, whereas DSP algorithms typically run a small program operating on a huge set of data. This circumstance has led to a completely different way of implementing a DSP processor compared with general-purpose processors (see Fig. 12). The block diagrams in Fig. 12 also illustrate that a combined general-purpose/DSP architecture requires a modified or extended instruction and data processing structure. DSP algorithms require a deterministic program flow that is normally not achievable when integrating the big caches used for general-purpose programs. This shows that the simple integration of a multiply-accumulate instruction into a general-purpose instruction set architecture (ISA) is not sufficient for the proper execution of DSP algorithms. To run both types of programs properly, a combined CPU/DSP ISA needs to support:
• general-purpose control programs with a non-deterministic/random execution flow and a typically small amount of data
• DSP algorithms with a very deterministic program flow, executing the same small piece of code very often on a huge amount of data.
The pure integration of an execution unit in parallel to the standard ALU of the CPU is illustrated in Fig. 13. This concept amounts to the simple integration of a multiply-accumulate unit as discussed above, but it is not sufficient for real DSP performance. More advanced approaches offer additional data paths and control instructions for the DSP execution units. An example of such an advanced integration of CPU and DSP is Hitachi's SH-DSP architecture, featuring additional memory, DSP registers, instruction and data paths, and control functionality (Fig. 14). Other examples are hyperstone's E1-32 RISC/DSP approach, ARM's Piccolo, and Infineon's Tricore.
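The importance of the multiply-accumulate instruction and of parallel data memories can be seen from the inner loop of a finite impulse response (FIR) filter, one of the algorithms mentioned above. The sketch below uses 16-bit fixed-point (Q15) data; on a DSP, each loop iteration maps onto a single MAC instruction with simultaneous operand fetches, while a plain general-purpose core needs separate loads, a multiply, and an add. Function and variable names are illustrative.

#include <stdint.h>

/* One output sample of an N-tap FIR filter in Q15 fixed-point arithmetic. */
int16_t fir_sample(const int16_t *samples, const int16_t *coeffs, int taps)
{
    int32_t acc = 0;                                      /* wide accumulator */
    for (int i = 0; i < taps; i++) {
        acc += (int32_t)samples[i] * (int32_t)coeffs[i];  /* multiply-accumulate */
    }
    return (int16_t)(acc >> 15);                          /* rescale Q30 back to Q15 */
}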
FIG. 12. (a) Block diagram of the instruction and data processing of a general-purpose microprocessor; (b) block diagram of the instruction and data processing of a DSP processor.

FIG. 13. Simple integration of additional DSP instruction execution units into a CPU.
FIG. 14. Combined CPU/DSP concept of Hitachi's SH-DSP architecture.

The most common data format of microprocessors is the 32-bit integer format, with the register content normally interpreted as a signed or unsigned 32-bit integer word. In DSP applications, however, the most commonly used format is 16-bit fixed-point, and in general multimedia applications even 8-bit numbers (e.g. for graphics) are relevant. Always extending such data to 32 bits means wasting resources, so to use resources efficiently and to make parallel execution possible, sub-word processing has been introduced; see, e.g., [43]. Sub-word processing allows a very flexible interpretation of the contents of a register; for example, the content of a 32-bit register can be interpreted as four 8-bit pixels or two 16-bit fixed-point numbers. The further processing of the content depends only on the instruction used. An instruction operating on two 32-bit registers can thus, for example, operate on 8 pixels in parallel, or on four 16-bit fixed-point values, or on two 32-bit integer values. The designer of such multimedia extensions is very flexible and can, in principle, implement exactly the functions needed to accelerate the final target multimedia algorithm. This concept is actually very resource-efficient from a hardware point of view: separating a 32-bit multiplier array into subsections can be implemented quite easily, and this is even more true for adders. In the case of adders it is just a matter of stopping (or not stopping) the carry mechanism, depending on the data format of the currently executed instruction. The combination of general-purpose CPUs and DSP processors is not driven by performance considerations; the motivation behind these developments is a further cost reduction and a simplified system design.
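Sub-word processing can even be imitated in plain C, which also shows why the hardware cost is low: all that is needed is to keep the carries from crossing the byte boundaries. The following sketch adds four packed 8-bit values held in one 32-bit word; it is a generic illustration, not the instruction set of any of the processors discussed here.

#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into 32-bit words, without letting
 * a carry propagate from one byte into the next (wrap-around per lane). */
uint32_t add_packed_u8(uint32_t a, uint32_t b)
{
    uint32_t low7 = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu); /* add the low 7 bits of each byte */
    return low7 ^ ((a ^ b) & 0x80808080u);                 /* fold in the top bits carry-free */
}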
5.5
Future Performance Gains
Another aspect of possible future trends concerns the basic processor core architecture and the improvement of overall speed. This includes further pushing the achievable clock speed of the core to increase the number of instructions executed per second, and further parallelism to increase the number of instructions executed per clock cycle. Both strategies
require different approaches and lead to a trade-off: the more instructions are executed in parallel, the more complex and time-consuming the control logic becomes. This requires more processing and switching activity per clock cycle and thus makes it difficult to increase the clock frequency.

Overall speed can also be improved by running entire programs in parallel. This can be achieved with several processors, but that approach becomes very inefficient as soon as costs are important or the programs have to exchange information or share the same resources. To implement this more efficiently, multithreading processor concepts have been created, enabling several programs and/or threads to run in parallel without the need to store and restore the context all the time. The earliest implementations allowed the execution of only a single thread at a time; next-generation concepts, the so-called simultaneous multithreading architectures, also allow different threads to access resources at the same time, i.e., within the same clock cycle [44]. An interesting approach is also to combine dataflow models with multithreading architectures [45].

An architectural approach that dramatically increases superscalar instruction execution parallelism is the superspeculative architecture [46]. The idea is to speculate aggressively past true data dependencies and harvest additional parallelism in places where none was believed to exist. As a result, the instruction execution pipeline stages introduced in Fig. 10b are extensively extended: many more stages are introduced and the pipeline scheme becomes more irregular.
FIG. 15. Graphical representation of the basic concept behind the EPIC approach of the IA-64 [48].
Closer to commercial products are approaches that combine compiler optimizations with architecture features and revive the VLIW and SIMD concepts. A forerunner in this field is the IA-64 architecture, jointly developed by Intel and Hewlett-Packard [47] (Fig. 15). The basic idea is that the compiler's scheduling scope is larger than that of the hardware scheduler, which sees only a small part of the program. To support the compiler, the instruction-level parallelism (ILP) is made explicitly visible in the machine code. The architecture is further extended by predication and control speculation, which help the compiler expose and express ILP; see also [48]. A further component of this strategy is to extend the available resources for parallel instruction execution quite substantially.
6.
Embedded Processors and Systems
6.1
Implementation
In Section 5, we introduced the basic architecture and core of microprocessors. The differences between embedded and desktop processors are not huge from that perspective, although the concrete physical implementation shows many differences. The entire processor is defined by the physical implementation of the instruction set architecture described in Section 5 plus the peripherals integrated into the silicon device. Typical peripherals are general-purpose I/O, analogue-to-digital converters, digital-to-analogue converters, LCD controllers, interrupt controllers, DMA controllers, memory interface controllers, serial and parallel interfaces, caches, and memory-management units.

The basic architecture as introduced above is normally encapsulated as a so-called processor core for further integration into an entire physical device. A basic core extends the above definition by additional interfaces between the core and the special-function peripherals of the final processor, for example memory interfaces, as shown in Fig. 16. Figure 16 also shows additional buses outside the basic processor core: the external bus interface and the I/O pins. These buses connect additional modules to the core. These peripheral functions are at present the factors that most differentiate processors from different vendors and between processor classes, and they have become more and more system specific. The system description introduced in Section 4 leads to different implementations for the representatives of the processor classes.
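From the software side, such on-chip peripherals are usually reached through memory-mapped registers. The following sketch toggles a general-purpose I/O pin; the base address, register offsets, and bit positions are invented for illustration and would be taken from the data sheet of a concrete device.

#include <stdint.h>

#define GPIO_BASE       0x40001000u                             /* hypothetical */
#define GPIO_DIRECTION  (*(volatile uint32_t *)(GPIO_BASE + 0x0u))
#define GPIO_OUTPUT     (*(volatile uint32_t *)(GPIO_BASE + 0x4u))

#define LED_PIN         (1u << 3)

void led_init(void)
{
    GPIO_DIRECTION |= LED_PIN;   /* configure pin 3 as an output */
}

void led_toggle(void)
{
    GPIO_OUTPUT ^= LED_PIN;      /* drive the external pin without any glue logic */
}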
FIG. 16. Block diagram of the SH7709 embedded processor.
The embedded processor outlined in Fig. 16 differs substantially from computer processors; Tables II and III give an overview of the differences.

TABLE II
COMPARISON OF EMBEDDED CONTROLLERS/PROCESSORS AND COMPUTER PROCESSORS

Feature                        Embedded controller and processor   Computer processor
Device costs                   Low                                 Medium to high
Interrupt behavior             Fast                                Normal
Data throughput                Low to medium                       High
Performance                    Low to medium                       Medium to high
High-level OS support          Not essential                       Essential
Driving external peripherals   Essential                           Not necessary
Data format                    Integer                             Integer, floating-point
Serial and parallel I/O        Essential                           Not necessary
Power consumption              Low to medium                       High
System costs                   Low                                 High

TABLE III
A COMPUTER PROCESSOR AND AN EMBEDDED PROCESSOR COMPARED: A SPECIFIC EXAMPLE [31]

Feature                        Intel 486        IDT R36100 (MIPS)
Internal bus width             32-bit           32-bit
Cache memory                   8 kB unified     4 kB instr./8 kB data
Prefetch queue                 Yes              No
Floating-point unit            Yes              No
Memory management unit         Yes              No
Communication controller       No               Yes
DMA controller                 No               Yes
Timers                         No               Yes
Memory controller              No               Yes
P1284 parallel port            No               Yes
I/O ports                      No               Yes

Probably one of the biggest differences between embedded and desktop microprocessors is the fact that an embedded processor has to be adapted to a wide range of applications, whereas a PC or workstation processor is optimized for a single application. From a time-to-market point of view, this leads to a far more modular approach, allowing a much quicker adaptation to market trends and needs. In a highly optimized desktop processor such as the Intel Pentium, Sun's Ultrasparc, or Compaq's Alpha architecture, the peripheral modules are optimized for very high data throughput rather than modular flexibility. This very customized approach prohibits an easy adaptation to other applications, as all modules are linked and combined, in contrast to the modular approach of embedded processors. In addition, the lifetime of a desktop processor is much shorter than that of an embedded processor; the best example of this difference is Motorola's 68000 family, which is still being used in embedded applications, although in desktop computers it was replaced years ago by the PowerPC architecture.
6.2
Differentiating Factors and Implementation Trends
The evaluation of embedded microprocessors can be done in different ways. The first way is to take a closer look at the features of the embedded microprocessor itself. Based on the considerations about embedded systems above, the following list is suitable:
• device and system costs
• level of integration (peripherals, memory)
• arithmetic/calculation performance
• system control capabilities
• power consumption
• core availability for further integration
• scalability and modularity
• software and system support.
Most of the current trends in embedded microprocessor design focus on the improvement of one or more of the above features, but of course there is a trade-off between the individual factors. For example, simply increasing the clock speed to raise performance automatically leads, if the other parameters remain the same, to a linear increase in power consumption. Similarly, using additional hardware to accelerate the calculation speed of a software algorithm will lead to higher costs. The purpose of this list is to demonstrate that raw performance is only one feature of an embedded microprocessor. It shows that the pure core functionality is becoming less and less important, which perhaps explains the relatively stable number of 32-bit instruction set architectures on the market. Today, it is very expensive to establish a new architecture, with the development of the architecture itself accounting for only a minor part of the overall costs associated with its successful introduction.

Design trends in the embedded processor market differ substantially from trends in the computer or desktop market. The trends introduced in the previous section reflect the current situation in the computer segment. Of course, several of these trends can and will also influence the design of future embedded processors, but because of cost and power consumption restrictions, their influence will be much smaller than on the design of computer processors. For example, superscalarity was introduced in the computer segment in the early 1990s, but in the embedded segment superscalar microprocessors have only just been introduced, and are clearly oriented towards the high-end embedded market. This has to be taken into account when transferring Moore's law into the embedded field. The following trends can be seen in today's embedded processor design:
• higher code density to reduce system costs
• low power consumption to make devices suitable for mobile applications and to increase battery-operated running time
• system cost optimization by integrating specific application-oriented peripherals such as LCD controllers
• higher system performance for specific market needs such as video game consoles or multimedia performance
• integration of memory into the device
• availability as a core for high-volume ASIC or ASSP design.
Increasing the code density results in lower costs for system memory. There are two main ways to improve code density:
• to optimize the instruction code for density right from the beginning, for example by using a reduced instruction length
• to compress the instruction code externally and to decompress it through an on-chip decompression unit.
The reduction of the overall code size is mainly driven by resource-intensive software such as high-level operating systems, or by cellphones, which still require expensive SRAM memory devices.

With the rising popularity of mobile equipment, the need for devices with very low power consumption becomes obvious. The challenge of low-power design is mainly to achieve a reasonable performance with minimum power consumption; for a more extended introduction to that subject see, e.g., [49]. This has led to the popular MIPS/Watt performance ratio instead of measuring the pure MIPS number. Unfortunately, several processor vendors have started to market the processor core's MIPS/Watt ratio. This approach does not make much sense, as the driving capabilities of the processor are essential in embedded systems. Comparing embedded architectures and standard components on the market, Intel's StrongARM devices (developed and first marketed by Digital Equipment Corporation) took the lead in that area. Optimized for a very specific fabrication technology, these devices outperformed their competitors in MIPS/Watt for years. Other vendors followed with devices offering the same MIPS/Watt ratios. Basically, it is difficult to compare devices by the simple MIPS/Watt benchmark because the instruction sets differ. However, today's embedded microprocessors include several mechanisms to reduce the overall power consumption while offering even higher clock frequencies. Every module of a modern embedded microprocessor can be put into a module standby mode, the processor core itself can go into standby mode, and even a complete hardware standby mode is possible. To enable different power-on sequences after starting a standby sequence, additional standby modes are implemented. In this context, the design of a proper clock tree becomes crucial. Another important way of improving power consumption is a further reduction of the power supply voltage level and a further technology shrink. Today's most advanced embedded microprocessors require only a 1.5 V power supply. The current target of the microprocessor vendors is to achieve 1000 MIPS/Watt; in the next few years, a further improvement to 2000-3000 MIPS/Watt will be achieved. This scenario shows that the VLSI chip introduced in Table II will not be the typical solution for a
future embedded system. The main concern when looking at the data projected for the year 2005 is power consumption. To make efficient use of the leading-edge technology of the year 2005, it will be necessary to find improved methods of saving power. It is very likely that we will see further decreased core voltages, using technology developed specifically for the embedded world with a focus on mobile computing.

Another important challenge in embedded microprocessor design is the efficient integration of further components. For example, at the moment we can see a trend to applications that make intensive use of the graphical user interface (GUI), and thus the integration of display controllers becomes attractive. Besides the integration of additional logic modules, the integration of embedded DRAM, or memory in general, becomes important. Integrating memory is important for two reasons:
• it enables the implementation of single-chip systems
• it reduces dependency on the rapidly changing DRAM business, which is driven by the PC industry.
Unfortunately, the lifecycles of PC components differ substantially from the lifecycles of embedded system applications. To reduce that dependency, the integration of several megabytes of memory would be necessary. An early example of an embedded microprocessor offering a huge amount of embedded DRAM is Mitsubishi's M32R/D 32-bit processor [50]. So far, the integration of DRAM into a standard embedded microprocessor is not mainstream; die-size is still expensive. But embedding memory into a device is already standard for ASICs.

This introduces the next important design challenge: the availability as a processor core for further integration into an ASIC or ASSP. ASIC design differs a great deal from the design of a standard component. When designing an ASIC, a very short turnaround time is required. The focus is not on very high performance and a highly optimized design; the design time itself is crucial. Fast and easy adaptation of a core (if possible a fully synthesizable core) is fundamental for ASIC design. The entire design flow has to be open right from the start. This is still different when designing a highly optimized standard embedded microprocessor, but it is very likely that all newly developed embedded cores will be considered right from the start as ASIC cores.

Last but not least, in the embedded domain a trend to higher-performance devices can also be seen. Some markets, such as video game consoles, require an ever-increasing processor performance. This will result in an adaptation of new trends from the computer processor segment, but the trends described earlier will be reviewed before being adapted to the embedded world.
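The module standby mechanism described in this section can be pictured as a handful of register writes before the core enters its sleep state. The register name, address, and bit assignments below are hypothetical; they only illustrate the principle of stopping the clocks of unused modules while keeping a wake-up source alive.

#include <stdint.h>

#define MODULE_STANDBY_REG  (*(volatile uint32_t *)0x400F0000u)  /* hypothetical */
#define STBY_SERIAL         (1u << 0)
#define STBY_LCD            (1u << 1)
#define STBY_TIMER          (1u << 2)

extern void cpu_sleep(void);   /* wrapper around the core's sleep/standby instruction */

void enter_low_power_idle(void)
{
    /* stop the clocks of peripherals that are not needed while idle */
    MODULE_STANDBY_REG |= STBY_SERIAL | STBY_LCD;

    /* keep the timer running so it can generate the wake-up interrupt */
    MODULE_STANDBY_REG &= ~STBY_TIMER;

    cpu_sleep();               /* core standby until the next interrupt */

    /* restart the stopped peripherals after wake-up */
    MODULE_STANDBY_REG &= ~(STBY_SERIAL | STBY_LCD);
}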
6.3 System Approach

In terms of current market requirements, it is extremely important for a semiconductor vendor to offer complete solutions. The ability to build an entire system based on a certain processor becomes a crucial factor, and in that area one key feature of standard embedded processors is the glueless connection of memory and other peripherals: the embedded microprocessor can drive external memory and peripheral devices without any further logic. To further improve the position of a standard embedded processor, additional devices are developed to support the entire system function. The level of integration depends only on cost considerations and technical feasibility. Turning a complete system into silicon requires physically implementing several versions in order to end up with a scalable system; this means that a user can implement a system in different flavors without putting a huge effort into the design. In summary, the level of integration is driven by the forecast volume of the target market, which means it can make sense to develop a two- or three-chip solution instead of integrating the entire system into a single device. This approach is well known from the PC area, where the basic CPU is combined with a chip set supporting the necessary interface and bridge technology to end up with a complete PC motherboard. When implementing the system with different chips, it is possible to build a high-end system by simply adding a faster or more advanced CPU; then, if the volume is high enough, the design of a single device including all chips can offer tremendous cost advantages.

One of the most popular examples of this system approach comes from the PC companion area, where a simple configuration has been used to implement a basic functionality (Fig. 17).

FIG. 17. PC companion system block diagram.

This functionality has been integrated into a few chips; then, by replacing some of the support chips with more advanced support chips, a complete set of low-end to high-end systems can be created while keeping the design effort small. Other examples of such system solutions are car information systems or set-top boxes. Additional support chips are designed to support the main processor if there is a need for resource-intensive functionality such as video decompression. Quite often, processor vendors introduce ASSPs for a dedicated application; normally these ASSPs include specific peripherals, allowing a straightforward and simple system design for a very dedicated application. In the case of the PC companion example, developing an ASSP would mean integrating the companion chip and the microprocessor into a single-chip device, allowing a simple and straightforward system design, but normally such ASSPs are not practical for other applications. Another disadvantage is that system designers lose flexibility and scalability.

The PC companion example also indicates that a certain standardization could be achieved in the embedded domain. A unified standard embedded platform seems to be impossible because of the huge number of different applications, but every market segment has certain requirements. Standardization in the embedded domain is not only driven by hardware manufacturers; it is also a software issue, and, as indicated already, standard embedded software is not yet available as too many platforms are on the market. Hence, vendors of embedded software focus on specific platforms. Standardization is thus heavily driven by software vendors such as OS vendors, and in order to extend the use of a specific software package it is necessary to unify the hardware platforms up to a certain level, although of course this is only possible for a dedicated market segment. A robot control engine needs different I/O functions than a PC companion, but at least within each market segment a standard embedded platform seems to be possible. This standardization effort is discussed further in the next sections.
6.4
Scalable Systems and Classes of Embedded Processors
Embedded controllers and processors have to match the target application requirements as closely as possible, but what does that mean? For example, to establish a hand-held PC it is necessary to offer a solution for running a monitor user interface, a keyboard, and a modem. In addition, because it is a mobile application, the power consumption should be as low as possible while still offering a reasonable overall performance. As the PC companion market is growing rapidly, manufacturers are expected also to offer palmtop PCs, sub-notebooks, and maybe even communicators integrating a hand-held PC and a cellphone. These companion devices
require different CPUs in terms of costs, performance, and functionality, and therefore it is not possible to use a single microprocessor for all these applications. Semiconductor vendors react to this requirement by offering families of upwardly compatible devices, integrating different features and peripherals. This approach differs considerably from the family concept introduced in the computer sector. The processors of computer systems have to be strictly upward compatible from a software and system perspective. In the case of embedded processors, the upward compatibility is not as strict as in the computer sector: when switching from one embedded processor to another, it is sufficient to provide basic core instruction set compatibility. Because of cost optimizations, it is normally not necessary to maintain system upward compatibility. In terms of the PC companion block diagram (Fig. 17), it is necessary to modify the peripherals, and this results in the need to redesign several system components and parts of the software.

Scalability and the requirements of embedded applications have fostered new classes of embedded microprocessors. The product range of today's embedded microprocessor and microcontroller families includes a set of instruction-set-compatible devices (Fig. 18). These devices cover a performance range of 10-400 MIPS and will soon reach the 1000 MIPS barrier.

FIG. 18. Today's classes of embedded microcontrollers and microprocessors [51]: low-end embedded controllers; advanced embedded controllers with additional peripherals and multipliers; embedded controllers with DSP functionality (the controller classes typically integrate RAM and ROM); embedded processors with hardware multipliers; high-performance embedded processors with DSP functionality; and very high-performance embedded processors with additional multimedia capabilities (the processor classes typically integrate caches and RAM).

The main players in this area are SuperH, MIPS, ARM, and PowerPC derivatives. Most of the vendors of devices based on these architectures now offer standard derivatives belonging to the classes shown in Fig. 18. To maintain upward compatibility, each vendor normally focuses on a single architecture. Compatibility here means providing instruction set upward compatibility, not system upward compatibility, and excludes mainly the I/O functionality; traditional compatibility as introduced in the computer sector in the early 1970s included even system compatibility, and thus maintained I/O compatibility. The instruction set upward compatibility is achieved by reusing the basic embedded processor core, as described above. To make a family concept possible, vendors of embedded processors use the concept illustrated in Fig. 19: the basic embedded processor core is re-used and at most extended by introducing new instructions, so that the instruction set remains a superset of the initially defined set. To make scalability possible, the core support modules, such as caches, can be modified, for example by increasing or decreasing the amount of cache memory. To enable the dedicated use of the processor in special applications, the I/O block has to be modified and adapted to the specific target application.

FIG. 19. Embedded processor concept for upward compatibility.
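In software terms, the family concept means that the application code written for the common core can be shared, while only the peripheral layer is exchanged per derivative. The sketch below shows this with build-time selection; the macro and driver names are invented for illustration.

/* Peripheral drivers supplied per family member (hypothetical names). */
extern void lcd_print_hex(unsigned code);       /* derivative with LCD controller */
extern void parallel_port_write(unsigned code); /* low-end derivative, plain I/O  */

/* The core-level application logic is identical on every family member,
 * because all derivatives execute the same base instruction set. */
void display_status(unsigned code)
{
#if defined(DERIVATIVE_WITH_LCD)
    lcd_print_hex(code);
#else
    parallel_port_write(code);
#endif
}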
6.5
Embedded Software and Standardization Efforts
As discussed above, the market for embedded software is more fragmented than the PC market. Whereas the desktop is ruled by a few operating systems, the embedded operating system market is still a collection of several platforms supporting many different applications with varying requirements. The biggest competitor for standard embedded operating systems is still in-house developed software. However, it is becoming more and more expensive for companies to support all new classes of embedded processors on their own. A situation comparable to the PC industry might occur in the future, but the situation in the embedded
arena is still a long way from being decided. Several OS vendors have introduced new and efficient tool suites for embedded applications, focusing on the major architectures. Microsoft, for example, introduced the embedded OS Windows CE (Fig. 20). First introduced for hand-held PCs, this OS is now striving to succeed in the general embedded domain. Other vendors of standard embedded OSs include Wind River Systems, Integrated Systems, Accelerated Technologies, and Microware.

FIG. 20. Overview of the main Windows CE components.

Sun's Java initiative could also be interpreted as an effort to standardize embedded software: by allowing software development to be platform independent, software could be re-used even if the hardware platform were completely changed. The embedded domain is still some distance from being standardized in a way similar to the PC motherboard market, but hand-held PCs might show the way ahead. The basic hand-held PC platform running Windows CE has been standardized to allow users of hand-held PCs to download software from different vendors from the internet and to run it on different hand-held PC platforms. As different hand-held PCs might integrate different embedded microprocessors, it is necessary to download the specific version of the software, but the vendor can simply re-compile the software and provide the specific version for another hand-held PC platform. This is possible because Microsoft standardized the basic hand-held PC platform which, referring to the previous section, means using the same I/O functionality; this enables programmers simply to use the relevant application programming interface (API). By using such a standard API and configuration, software could become widely re-usable across platforms and even applications. As this would drastically reduce the costs of new developments, this standardization seems to be possible, especially because today's embedded processors offer a much better performance/cost ratio than before; this cost advantage can compensate for the overhead coming from the standardization. The OS discussion is also driven by the availability of standard libraries supporting graphics, multimedia, or other special parts of the entire application.

To summarize, the design of an entire embedded system not only requires a silicon device; it is also necessary to run an operating system together with application software. To minimize costs, a much higher re-usability of software and hardware components will be necessary in the future. An embedded system is no longer defined solely by the basic microprocessor; instead, the system approach and the system solution have become increasingly important. This also includes embedded operating systems and libraries, and it makes standard embedded platforms, similar to those in the PC industry, possible.
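A standard API of the kind discussed above can be as simple as a table of function pointers that the board support package of each hand-held device fills in; the application is then re-compiled per CPU but never touches device registers directly. The interface below is invented for illustration and is not the actual Windows CE API.

#include <stddef.h>

typedef struct platform_api {
    int  (*serial_write)(const char *buf, size_t len);
    int  (*lcd_draw_text)(int x, int y, const char *text);
    long (*ticks_ms)(void);
} platform_api_t;

/* Provided by the platform/board support package of the concrete device. */
extern const platform_api_t *platform;

void report_ready(void)
{
    platform->lcd_draw_text(0, 0, "ready");
    platform->serial_write("ready\r\n", 7);
}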
7. The Integration Challenge

Due to the tremendous advances in process technology, many more logic functions can be integrated into a single chip than in the past. What started with simple logic equations or gate arrays in the early days has developed to the point where an application-specific functionality can be integrated into a quite cheap device. Thus, in the mid 1980s, ASICs became very popular. It became possible to design a system with basically four blocks: microprocessor, memory, ASIC, and the context described above. The next step was to integrate the central microprocessor and the ASIC into a single chip. The microprocessor cores available for such further integration offered different levels of hardware optimization capabilities. Initially, so-called hardcores, which could not be changed, were used; today we see softcores, which can be redesigned, offering greater possibilities of optimizing the design and thus of reducing the size and cost of the final ASIC. The next step in further integration is the system-on-a-chip (SOC) approach, sometimes also called system-on-silicon. The difference between all these developments is the level of integration. It is possible to interpret SOC devices as very complex ASICs; this definition fits the technical integration levels LSI, VLSI, and ULSI described in Table IV. Figure 21 illustrates the transition from a PCB layout to the far more complex SOC, which integrates more and more of the entire PCB functionality into a single-chip device.
TABLE IV
ASIC DEFINITION BY TECHNOLOGY AND COMPLEXITY

Device                   Process   Complexity/integration
ASIC                     LSI       Simple logic equations up to interface controllers and dedicated functional blocks; up to several hundred thousand gates
Complex ASIC             VLSI      Microprocessor core integration, additional hardware acceleration units; up to several million gates
Very complex ASIC/SOC    ULSI      Microprocessor core, logic, memory, analogue functions, single-device system solution; more than 10 million gates plus analogue system functions
FIG. 21. Transition from PCB to SOC layout [52].
An example of a typical SOC is the camera-on-a-chip device, which also integrates the CCD sensor technology and thus allows parts of the context to be integrated; see, e.g., [53]. Another example is the scanner-on-a-chip system introduced by National Semiconductor, integrating an analogue front-end, sensor clock generation, motor control, data buffering, a parallel port interface, and a processor core, all in a dime-sized area; see, e.g., [52]. SOCs typically integrate intellectual property (IP) from different vendors; for example, the DSP and the microprocessor core could be sourced from different suppliers. This situation, together with the sheer number of transistors to be integrated and the need for a new approach to testing and verification, will be among the most important challenges in SOC design; see, e.g., [54]. The result is likely to be a very modular approach to the design of SOC chips, with every module having a test access mechanism compliant with the IEEE 1149.1 standard. A new standard for embedded core test, IEEE P1500, is also under development. Unfortunately, the testing
complexity index introduced by the Semiconductor Industry Association's International Roadmap for Semiconductors predicts a significant, non-linear increase in the number of transistors per pin in future SOC designs. This will make the test process significantly harder: clearly, the volume of test data needed for a certain level of fault coverage grows with the transistor count of the SOC under test.

Besides testing and debugging, the overall design process is still not a simple matter of pressing a button. As the yield of working devices on a wafer still depends on the die-size of the device, it is crucial to keep the total size of the chip as small as possible. This is especially true for the very cost-sensitive embedded world. After the specification of an architecture, the core and the peripherals of a microprocessor are designed using logic equations describing the functional behavior. The description is normally done with languages such as Verilog or VHDL. These languages allow the use of electronic design automation (EDA) tools, which simplify the design process drastically, but there is still a lot to be done manually in order to optimize a design. In the area of place and route of logic blocks there are still many open questions, and several steps in the process of designing a microprocessor have to be done manually, as automatic synthesis is still expensive in terms of die-size and thus costs.

When designing a SOC with logic from different vendors, it is necessary to standardize the interfaces and the on-chip buses; without such standardization it would be necessary to redesign the interface blocks each time. From a commercial point of view, such a standard is absolutely necessary to allow the reuse of logic blocks from different vendors, but in the fragmented embedded world it will be very difficult to unify the requests of the individual users, as bus systems and interface controllers differ in speed, bandwidth, and general capabilities. To summarize, both the design automation process, which must handle the increased complexity, and manufacturing test access need to be improved further to make SOCs standard components of embedded systems and to allow their widespread utilization.
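The dependence of yield on die size can be illustrated with the simple first-order Poisson defect model, yield = exp(-A x D), where A is the die area and D the defect density; the numbers used below are assumptions for illustration only and are not taken from this chapter.

#include <math.h>
#include <stdio.h>

/* First-order Poisson yield model: yield = exp(-area * defect_density). */
double poisson_yield(double die_area_cm2, double defects_per_cm2)
{
    return exp(-die_area_cm2 * defects_per_cm2);
}

int main(void)
{
    const double d0 = 0.5;   /* assumed defects per square centimetre */
    printf("0.5 cm^2 die: %.0f%% yield\n", 100.0 * poisson_yield(0.5, d0)); /* about 78% */
    printf("1.0 cm^2 die: %.0f%% yield\n", 100.0 * poisson_yield(1.0, d0)); /* about 61% */
    return 0;
}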
8.
Conclusion
We have seen in this chapter that the design of microelectronic systems has changed dramatically during the past 50 years. Starting with small discrete components in the 1950s, today we integrate entire systems on a
single piece of silicon. Nevertheless, the basic principles are still the same: the computer of today is still based on principles we used 50 years ago, but the design challenges have changed dramatically. Then, the focus was very much on the basic enabling technology and on improving the basic functionality; today, the system idea is the relevant factor. A single basic functionality and its improvement is not as relevant now as it was in the past; the optimization of the entire system and the well-balanced interaction of all components are the main subjects of today's microelectronic system design.

Looking at different market sectors, such as PCs or embedded systems, we can see several similar developments. The embedded system market is, however, driven by completely different system requirements: costs, power consumption, integration, and performance are quite different from those of conventional computer systems. In terms of technological advances, principles derived 50 years ago are still valid; the main difference when designing an embedded system is the system integration challenge. Today, the integration of peripheral blocks is as important as designing a leading-edge embedded microprocessor core. The computer segment has a dominant vendor and processor architecture, but with all the different application requirements and possible system solutions, it is hard to believe that there will be a single 32-bit embedded architecture used in, say, 80% of all 32-bit embedded system applications. We can, however, see in the embedded arena a route to further standardization, and because of the tremendous technological improvements, the overhead of this standardization might even vanish in the cost-driven embedded world. At least, this trend offers the possibility of completely changing the embedded system world, which has so far been very fragmented.
REFERENCES
[1] Brey, B. (1999). 8086/88, 80186/80188, 80286, 80386, 80486, Pentium, Pentium Pro, Pentium II Processor: Architecture and Programming, 5th edn, Prentice Hall, Englewood Cliffs, NJ.
[2] Weiss, S. and Smith, J. (1994). POWER and PowerPC: Principles, Architecture, Implementation, Morgan Kaufmann, San Mateo, CA.
[3] Siles, R. (ed.) (1998). Alpha Architecture Reference Manual, 3rd edn, Digital Press.
[4] Bhandarhar, D. (1996). Alpha Implementations and Architecture: Complete Reference and Guide, Digital Press.
[5] Weaver, D. and Germond, T. (1994). Sparc Architecture Manual, Version 9, Prentice Hall, Englewood Cliffs, NJ.
[6] Kane, G. (1995). PA-RISC 2.0 Architecture, Prentice Hall, Englewood Cliffs, NJ.
[7] Kane, G. and Heinrich, J. (1991). MIPS RISC Architecture, 2nd edn, Prentice Hall, Englewood Cliffs, NJ.
[8] Wilson, J. et al. (1998). Challenges and trends in processor design. Computer, January, 39-50.
[9] Patterson, D. (1998). Vulnerable Intel. New York Times, 9 June.
[10] Turley, J. (1997). Embedded Microprocessors for Embedded Applications, Micro Design Resources, Seminar Microprocessor Forum, San Jose.
[11] Jagger, D. (ed.) (1997). ARM Architecture Reference Manual, Prentice Hall, Englewood Cliffs, NJ.
[12] Hasegawa, A. et al. (1997). SH3: High code density, low power. IEEE Micro 16(6), 11-19.
[13] Miller, M. (1992). The 68000 Microprocessor Family: Architecture, Programming, and Application, Merrill Publishing Co.
[14] Segars, S. et al. (1995). Embedded control problems, Thumb, and the ARM7TDMI. IEEE Micro 15(5), 22-30.
[15] Intel Corporation (1995). i960 Processors and Related Products.
[16] Dolle, M. and Schlett, M. (1995). A cost-effective RISC/DSP microprocessor for embedded systems. IEEE Micro 15(5), 32-40.
[17] Diefendorff, K. and Dubey, P. (1997). How multimedia workloads will change processor design. Computer 30(9), 43-45.
[18] Crisp, J. (1998). Introduction to Microprocessors, Newnes, London.
[19] v. Neumann, J. (1944). The first draft of a report on the EDVAC, Moore School of Electrical Engineering, University of Pennsylvania, June 30 (republished in the IEEE Annals of the History of Computing, 1993).
[20] Burks, A., Goldstine, H. and v. Neumann, J. (1964). Preliminary discussion of the logical design of an electronic computing instrument, report prepared for U.S. Army Ordnance Dept. (reprinted in Datamation 8(9), 24-31, September 1962 (Part I) and in Datamation 8(10), 36-41, October 1962 (Part II)).
[21] Hennessy, J. and Patterson, D. (1996). Computer Architecture: A Quantitative Approach, 2nd edn, Morgan Kaufmann, San Mateo, CA.
[22] Faggin, F. (1972). The MCS-4: an LSI microcomputer system, Proceedings of the IEEE Region Six Conference, IEEE.
[23] National Technology Roadmap for Semiconductors (1997). Semiconductor Industry Association, San Jose, CA.
[24] Chappell, B. (1999). The fine art of IC design. IEEE Spectrum 36(7), 30-34.
[25] Malone, M. (1995). The Microprocessor: A Biography, Springer Verlag, Berlin.
[26] Tredennick, N. (1996). Microprocessor based computers. Computer 29(10), 27-37.
[27] Weste, N. and Eshragian, K. (1985). Principles of CMOS VLSI Design, Addison-Wesley, Reading, MA.
[28] Ball, S. (1996). Embedded Microprocessor Systems: Real World Design, Butterworth, Oxford.
[29] Heath, S. (1997). Embedded Systems Design, Butterworth-Heinemann, Oxford.
[30] Goanhar, R. (1992). The Z80 Microprocessor: Architecture, Interfacing, Programming, and Design, 2nd edn, Merrill Publishing Co.
[31] Khan, A. (1996). Workhorses of the electronic era. IEEE Spectrum 33(10), 36-42.
[32] Predko, M. (1998). Handbook of Microcontrollers, McGraw Hill, New York.
[33] Shriver, B. and Smith, B. (1998). The Anatomy of a High-Performance Microprocessor: A Systems Perspective, IEEE Computer Society Press, Los Alamitos, CA.
[34] Faggin, F. (1996). The microprocessor. IEEE Micro 16(5), 7-9.
[35] Wilkes, M. (1951). The best way to design an automatic calculating machine, Manchester University Computer Inaugural Conference.
[36] Heath, S. (1995). Microprocessor Architectures: RISC, CISC, and DSP, 2nd edn, Butterworth-Heinemann, Oxford.
[37] Smith, A. (1997). Cache memory design: an evolving art. IEEE Spectrum, Vol. 34, pp. 40-44.
[38] Lapsley, P. et al. (1996). DSP Processor Fundamentals, Berkeley Design Technology, Fremont, CA.
[39] Johnson, M. and Johnson, W. (1991). Superscalar Microprocessor Design, Prentice Hall, Englewood Cliffs, NJ.
[40] Weiss, R. (1999). Advanced DSPs to deliver GigaMIPS for SBCs. RTC Europe, April, 53-70.
[41] Purcell, S. (1998). The impact of Mpact 2. IEEE Signal Processing Magazine 15(2), 102-107.
[42] Bellanger, M. (1989). Digital Processing of Signals, Wiley, Somerset, NJ.
[43] Conte, T. et al. (1997). Challenges to combining general-purpose and multimedia processors. Computer 30(12), 33-37.
[44] Eggers, S. et al. (1997). Simultaneous multithreading: a platform for next generation processors. IEEE Micro 17(5), 12-19.
[45] Kavi, K., Lee, B., and Hurson, A. (1998). Multithreaded systems. Advances in Computers 46, 288-328.
[46] Lipasti, M. and Shen, J. (1997). Superspeculative microarchitecture for beyond AD 2000. Computer 30(9), 59-66.
[47] Gwennap, L. (1997). Intel, HP make EPIC disclosure. Microprocessor Report, 27 October, 1, 6-9.
[48] Dulong, C. (1998). The IA-64 at work. Computer 31(7), 24-32.
[49] Forman, G. and Zahorjan, J. (1994). The challenges of mobile computing. Computer 27(4), 38-47.
[50] Nanomura, Y. et al. (1997). M32R/D: integrating DRAM and microprocessor. IEEE Micro 17(6), 40-48.
[51] Schlett, M. (1998). Trends in embedded microprocessor design. Computer 31(8), 44-49.
[52] Martin, B. (1999). Electronic design automation. IEEE Spectrum 36(1), 57-61.
[53] Fossum, E. (1998). Digital camera system on a chip. IEEE Micro 18(3), 8-15.
[54] Zorian, Y. (1999). Testing the monster chip. IEEE Spectrum 36(7), 54-60.
Author Index Numbers in italics indicate the pages on which complete references are given.
A Abramowitz, M., 112, 154 Abramson, N., 55, 88 Agerwala, T., 34, 87 Akamine, S., 167, 187 Akl, S., 197, 263 Alexandersson, J., 280, 326 Allis, V., 260, 266 Amalberti, R., 293, 327 Amano, S., 175, 188 Amdahl, G., 29, 87 Anantharaman, T., 200, 230, 233, 263, 265 Appel, A., 250, 251,266 Arakawa, A., 106, 116, 127, 129, 130, 154, 155 Arnold, D.J., 166, 187 Arnold, P., 71, 89 Aspray, W., 5, 7, 10, 16, 21, 23, 24, 25, 26, 28, 30, 85, 87 Atkin, L., 196, 200, 263 Aust, H., 269, 325 Austrian, G., 5, 85
B
Bach, M., 40, 76, 88 Backus, J., 60, 88 Baer, F., 112, 118, 121, 144, 154, 156 Baggia, P., 276, 280, 326 Baker, T., 61, 89 Ball, S., 342, 378 Bar Hillel, Y., 162, 163, 187 Baran, P., 46, 47, 88 Barnard, C., 75, 90 Barrett, D., 80, 90 Barry, J., 34, 87 Bass, L., 74, 90 Bates, J.R., 135, 136, 155, 156 Baum, E., 202, 263
Baxter, J., 207, 212, 264 Beal, D., 199, 207, 250, 263, 264, 266 Bell, G., 32, 87 Bellanger, M., 359, 379 Ben-Ari, D., 174, 188 Benbasat, I., 76, 90 Bengtsson, L., 94, 153 Bennacef, S., 280, 326 Bergin, T., 59, 88 Bergthorsson, P., 139, 156 Berlekamp, E., 218, 264 Berliner, H., 201,211,230, 263, 264, 265 Berners-Lee, T., 15, 86 Bernsen, N.O., 268, 274, 280, 281,282, 325, 326, 327
Berry, D.M., 174, 188 Bertenstam, J., 280, 326 Bhandarhar, D., 331,377 Billings, D., 207, 209, 243, 245, 248, 264, 266 Bird, P., 16, 86 Black, D., 43, 88 Blomberg, M., 280, 326 Boehm, B., 69, 89 Boehm, C., 61, 89 Boggs, D., 54, 55, 88 Boitet, C., 166, 172, 187, 188 Bonneau-Maynard, H., 280, 326 Booch, B., 71, 89 Bossemeyer, R.W., 269, 325 Bourbeau, L., 175, 188 Bourke, W., 122, 155 Boves, L., 269, 280, 315, 325, 326 Bowen, J., 71, 89 Brey, B., 331,377 Brooks, F., 37, 87 Brown, A., 73, 90 Brown, P., 177, 188 Brudno, A., 193, 262 Bub, T., 280, 326 Buizza, R., 150, 157
381
382
AUTHOR INDEX
Burks, A., 338, 378 Buro, M., 199, 237, 238, 239, 240, 241,263, 265, 266
Buxton, J., 61, 88
Cadow, H., 37, 87 Cameron, J., 70, 89 Caminer, D., 16, 86 Campbell, M., 198, 200, 230, 233, 263, 265 Campbell-Kelly, M., 5, 7, 10, 16, 21, 23, 24, 25, 26, 28, 30, 85 Carbonell, J.G., 167, 187 Carbonell, N., 293, 327 Carlson, R., 280, 326 Caro, M., 243, 266 Case, R., 30, 87 Cerf, V., 50, 56, 57, 88 Ceruzzi, P., 9, 16, 18, 19, 20, 23, 24, 25, 26, 27, 28, 31, 42, 86 Champy, J., 14, 86 Chappell, B., 340, 378 Charney, J., 92, 144, 153, 156 Chase, L., 275, 326 Chaudhri, A., 65, 89 Choi, J., 80, 90 Chomsky, N., 164, 187 Churcher, G.E., 277, 326 Clark, D., 50, 56, 57, 88 Clarke, g., 80, 90 Clements, P., 74, 90 Clingen, C., 38, 87 Cocke, J., 34, 87 Codd, E., 64, 89 Cohn, S.E., 142, 156 Comer, D., 57, 79, 88 Comrie, L., 21, 86 Condon, J., 230, 265 Constantine, L., 70, 89 Conte, T., 361,379 Corbato, F., 38, 87 Cortada, J., 5, 16, 85, 86 Cote, J., 135, 136, 156 Cotton, W.R., 145, 156 Courant, R., 97, 154 Coutier, P., 142, 156 Craigen, D., 71, 89 Cressman, G.P., 139, 141, 156
Crisp, J., 337, 378 Crocker, D., 15, 86 Crosby, P., 14, 86 Culberson, J., 225, 265 Cullen, M.J.P., 124, 125, 155 Cullingford, R.E., 167, 187
D
Dailey, D., 206, 264 Daley, R., 38, 87, 142, 156 Davenport, T., 14, 86 Deaven, D.G., 146, 156 de Bruin, A., 196, 198, 201,263 del Galdo, E.M., 276, 326 den Os, E., 269, 280, 315, 325, 326 Dertouzos, M., 82, 90 Dexter, A., 76, 90 Diebold, J., 9, 86 Diefendorff, K., 34, 35, 87, 336, 378 Dijkstra, E., 61, 88 Dolle, M., 336, 378 Donninger, C., 199, 263 D66s, B., 139, 156 Drury, D., 76, 90 Dubey, P., 336, 378 Dulong, C., 362, 363, 379 Dybkj~er, H., 268, 274, 280, 281,325 Dybkja~r, L., 268,274, 280, 281, 310,325, 326, 327
E
Ebeling, C., 200, 230, 263, 265 Eckert, J.P., 22, 25, 86 Eckert, W., 20, 86 Eggers, S., 362, 379 Ein-Dor, P., 11, 86 Eliasen, E., 120, 155 Epstein, E.S., 149, 157 Eshragian, K., 342, 378 Eskes, O., 223, 265
F
Faggin, F., 339, 348, 378 Failenschmid, K., 281,326
AUTHOR INDEX
Falzon, P., 293, 327 Farwell, D., 167, 188 Fatehchand, R., 268, 325 Feigenbaum, E., 23, 86 Feldman, J., 23, 86 Feldmann, R., 200, 202, 263 Felten, E., 230, 265 Findler, N., 243, 266 Fix, G.J., 124, 155 Fjortoft, R., 92, 153 Flynn, M., 30, 87 Forman, G., 367, 379 Fossum, E., 375, 379 Fox-Rabinovitz, M.S., 147, 157 Frank, I., 219, 265 Fraser, N.M., 287, 327 Friedrichs, K.O., 97, 154 Frnkranz, J., 207, 264
Gabor, A., 14, 86 Galerkin, B., 110, 154 Gandin, L., 141,156 Garlan, D., 74, 90 Gasser, R., 203, 260, 264 Gasterland, T., 273, 326 Gates, B., 82, 90 Gaunt, J.A., 118, 154 Gauvain, J.L., 280, 326 Gavrilov, D., 146, 156 Gerbino, E., 276, 326 Gerhart, S., 71, 89 Germond, T., 331,377 Gershman, A.G., 167, 187 Giachin, E., 276, 326 Gibson, R., 59, 88 Gifford, D., 9, 29, 86 Ginsberg, M., 219, 220, 221,264, 265 Glass, R., 60, 71, 88, 89 Goanhar, R., 343, 378 Godfrey, P., 273, 326 Goguen, J., 71, 89 Goldberg, A., 32, 87 Goldman, N., 166, 187 Goldstein, R., 76, 90 Goldstine, H., 23, 87, 338, 378 Goodman, K., 167, 188 Gordon, S., 250, 266
383
Gorry, G., 75, 90 Gosling, J., 62, 89 Gotlieb, C., 9, 86 Gower, A., 230, 265 Gray, J., 63, 89 Grosch, H., 11, 86 Grosz, B.J., 274, 326 Guterl, F., 33, 87 Gwennap, L., 363, 379
H
Haeckel, S., 80, 90 Hafner, K., 45, 46, 48, 50, 51, 57, 88 Hall, M., 34, 87 Haltiner, G.J., 104, 154 Hammer, M., 14, 86 Hasegawa, A., 332, 378 Hatton, L., 71, 89 Heath, S., 342, 357, 378 Heid, U., 275, 326 Heikes, R.P., 128, 129, 130, 131,155 Heinrich, J., 331,377 Heisterkamp, P., 280, 313, 326, 327 Henderson, J., 14, 79, 86 Hennessy, J., 34, 87, 339, 378 Hevner, A., 69, 84, 89 Hiltzik, M., 12, 32, 51, 53, 54, 86 Hinchey, M., 71, 89 Hirakawa, H., 175, 188 Hoane, J., 200, 233, 263 Hoffman, R.N., 149, 157 Hovermale, J.H., 144, 156 Hsu, F., 200, 202, 230, 232, 233,263, 264, 265
Hurson, A., 362, 379 Hutchins, J., 169, 171, 188 Hyatt, R., 230, 265
Intel Corporation, 335, 378 Isabelle, P., 175, 188 Iskandarani, M.. 147, 157 Iwanska, L., 273, 326
384
AUTHOR INDEX
Jacobson, G., 250, 251,266 Jacobson, I., 71, 89 Jacopini, G., 61, 89 Jagger, D., 332, 378 Janjic, Z.I., 146, 156 Jelinek, F., 178, 188 Johnson, L., 27, 87 Johnson, M., 357, 379 Johnson, W., 357, 379 Jouppi, N., 34, 87 Junghanns, A., 202, 226, 261,264, 266
Kahn, R., 56, 88 Kaji, H., 167, 188 Kalnay, E., 95, 149, 150, 153, 157 Kane, G., 331,377 Karels, M., 40, 88 Kasahara, A., 112, 154 Kasparov, G., 236, 265 Kast, F., 75, 90 Kavi, K., 362, 379 Kay, M., 167, 174, 187, 188 Kazman, R., 74, 90 Keim, G., 261,266 Khan, A., 344, 365, 378 King, G.W., 168, 188 King, M., 173, 188 Knight, B., 225, 265 Knuth, D., 194, 195, 262 Kobliska, G., 214, 264 Korf, R., 196, 260, 263, 266 Kouchakdjian, A., 71, 89 Krishnamurti, T.N., 151, 157 Krol, M., 190, 262 Krylov, V.I., 120, 155 Kumano, A., 175, 188 Kuo, Y.-H., 145, 156
Lafferty, D., 191,262 Lamb, V.R., 106, 154 Lamel, L., 275, 280, 326
Landsbergen, J., 166, 187 Lapsley, P., 357, 378 Lee, B., 362, 379 Lee, K.-F., 236, 265 Leffier, S., 40, 88 Leiner, B., 50, 56, 57, 88 Leith, C., 144, 156 Leith, C.E., 149, 157 Levy, D., 250, 266 Lewy, H., 97, 154 Licklider, J., 46, 88 Linger, R., 69, 71, 84, 89 Lipasti, M., 362, 379 Littman, M., 261,266 Longuet-Higgins, M.S., 112, 154 Loomis, M., 65, 89 Lord, S.J., 95, 153 Lorenz, E.N., 101, 116, 137, 148, 154 Lorenz, U., 202, 263 Lovtsky, 169, 188 Lu, P., 225, 265 Luftman, J., 14, 86 Luqi, 71, 89 Luz, S., 282, 322, 327 Lyon, M., 45, 46, 48, 50, 51, 57, 88
M
McAllester, D., 201, 263 McCarthy, J., 163, 187, 231, 265 McCartney, S., 22, 86 McConnell, C., 201, 263 McDonald, A., 135, 155 McFadyen, J., 79, 90 McGlashan, S., 280, 326 Machenhauer, B., 110, 120, 144, 154, 155, 156
McKusick, M., 40, 88 McPherson, R.D., 95, 153 Mahajan, S., 236, 265 Maier, E., 280, 326 Malone, M., 340, 378 Manning, E., 78, 90 Marsland, T., 196, 198, 200, 263 Martin, B., 375, 379 Martin, J., 70, 89 Mass, C.F., 145, 156 Masuda, Y., 130, 131, 155 Melby, A.K., 173, 188
Mercer, R.L., 178, 188 Merilees, P.E., 122, 155 Merwin-Daggett, M., 38, 87 Mesinger, F., 144, 146, 156 Metcalfe, R., 54, 55, 88 Mielke, P.W., 145, 156 Miller, M., 332, 378 Mills, H., 61, 88 Minker, J., 273, 326 Mintz, Y., 127, 129, 130, 155 Mitchell, H., 146, 156 Moore, J., 84, 90 Moore, R., 194, 195, 262 Müller, M., 260, 266 Muraki, K., 167, 187 Myers, B., 67, 68, 89 Myers, G., 70, 89 Mysliwietz, P., 202, 263
N
Nagao, K., 168, 188 Nagao, M., 176, 188 Nanomura, Y., 368, 379 National Technology Roadmap for Semiconductors, 339, 378 Nau, D., 219, 265 Naur, P., 61, 88 Nelson, H., 230, 265 Newborn, M., 197, 263 Nickovic, S., 146, 156 Nielsen, J., 276, 326 Nirenburg, S., 167, 187, 188 Nogami, H., 175, 188 Nolan, R., 75, 80, 90 Nowatzyk, A., 230, 265
O
Oerder, M., 269, 325 Ohnishi, H., 130, 131, 155 Okumura, A., 167, 187 O'Neill, J., 45, 46, 49, 51, 88 Organick, E., 38, 87 Orr, K., 70, 89 Orszag, S.A., 120, 154 Otto, S., 230, 265
P
Padegs, A., 30, 87 Palmer, J., 27, 87 Palmer, T.N., 150, 157 Panofsky, H.A., 139, 156 Papp, D., 245, 248, 266 Paroubek, P., 275, 326 Parto, A., 205, 212, 264 Patel-Schneider, P.F., 176, 188 Patterson, D., 34, 87, 331, 339, 378 Paulk, M., 69, 89 Pearl, J., 198, 263 Peebles, R., 78, 90 Pena, L., 207, 209, 245, 264, 266 Peng, J.-C., 269, 325 Pericliev, V., 174, 188 Peters, T., 80, 90 Peterson, J., 40, 88 Pfleeger, S., 71, 89 Phillips, N.A., 92, 103, 104, 153, 154 Pijls, W., 196, 198, 201, 263 Pinker, S., 82, 90 Plaat, A., 196, 197, 198, 201, 263 Platzman, G.W., 116, 118, 119, 154 Pleszkoch, M., 69, 84, 89 Predko, M., 345, 378 Pugh, E., 25, 27, 87 Purcell, S., 358, 379
Quarterman, J., 40, 88 Quine, W.V., 162, 187
Rackoff, N., 77, 90 Radin, G., 34, 87 Ralston, A., 71, 89 Randall, D.A., 128, 129, 130, 131, 155 Randell, B., 61, 88 Raskin, V., 167, 187 Rasmussen, E., 120, 155 Reiger, C., 166, 187 Reinefeld, A., 198, 263 Reithinger, N., 280, 326 Richardson, L.F., 92, 153
Riesbeck, C.K., 166, 187 Rimon, M., 174, 188 Ringler, T.D., 128, 130, 131, 155 Ritchie, D., 39, 87, 88 Rivest, R., 202, 264 Robert, A.J., 112, 127, 129, 130, 131, 132, 154, 155
Rockart, J., 78, 90 Rodgers, W., 7, 85 Rojas, R., 24, 87 Rosenbloom, P., 236, 265 Rosenzweig, J., 75, 90 Rosset, S., 280, 326 Rottmann, V., 202, 263 Royce, W., 69, 89 Rullent, C., 276, 326 Rumbaugh, J., 71, 89 Russell, S., 202, 264
Sadler, L., 166, 187 Sadourny, R.A., 127, 129, 130, 155 Salmon, B., 287, 327 Saltzer, J., 38, 87 Salus, P., 41, 88 Samuel, A., 190, 205, 227, 261, 262, 265 Sawyer, J.S., 132, 155 Schaeffer, J., 196, 197, 198, 200, 201, 202, 203, 204, 207, 209, 225, 226, 227, 245, 248, 261, 263, 264, 265, 266 Schank, R.C., 166, 187 Scherr, A., 78, 90 Scherzer, L., 203, 264 Scherzer, T., 203, 264 Schlett, M., 336, 371, 378, 379 Schwab, E.C., 269, 325 Schwinn, J., 280, 326 Scott Morton, M., 75, 90 Seaman, N.L., 145, 156 Segaller, S., 34, 87 Segars, S., 332, 378 Seide, F., 269, 325 Shannon, C., 190, 262 Shapiro, S., 250, 266 Sharman, R., 268, 325 Shaw, M., 74, 90 Shazeer, N., 261, 266 Shen, J., 362, 379
Sheppard, B., 209, 250, 251, 252, 253, 264, 266 Sherer, S., 71, 89 Shriver, B., 346, 347, 378 Shuman, F.G., 94, 144, 153, 156 Sidner, C.L., 274, 326 Silberman, J., 112, 118, 154 Silberschatz, A., 40, 43, 88 Siles, R., 331, 377 Slate, D., 196, 200, 203, 263, 264 Slocum, J., 173, 188 Smith, A., 357, 378 Smith, B., 346, 347, 378 Smith, H., 250, 266 Smith, J., 331, 377 Smith, S., 219, 265 Smith, W., 202, 263 Spector, A., 9, 29, 86 Stahl, D., 80, 90 Stallings, W., 36, 39, 43, 44, 48, 55, 56, 87, 88 Staniforth, A., 126, 135, 136, 146, 155, 156 Stegun, I.A., 112, 154 Steinbiss, V., 269, 325 Stenchikov, G.L., 147, 157 Stevens, W., 70, 89 Stonebraker, M., 43, 65, 88, 89 Strachey, C., 224, 265 Strang, G., 124, 155 Stroustrup, B., 61, 89 Sturm, J., 269, 315, 325 Suarez, M.J., 147, 157 Sutton, R., 205, 212, 264 Swade, D., 5, 85 Szafron, D., 207, 209, 225, 245, 248, 264, 265, 266
Takacs, E.L., 147, 157 Talagrand, O., 142, 156 Tanenbaum, A., 17, 32, 36, 37, 40, 41, 42, 43, 86, 88
Tarr, P., 80, 90 Taylor, M., 147, 157 Temperton, C., 124, 155 Tesauro, G., 212, 214, 218, 264 Thomas, T., 287, 327 Thompson, G., 145, 156 Thompson, K., 39, 87, 88, 191, 202, 203, 230, 262, 264, 265
Thomson, D.L., 277, 326 Throop, T., 219, 265 Thuburn, J., 128, 130, 131, 155 Tinsley, M., 224, 265 Tjaden, D., 203, 264 Toth, Z., 150, 157 Trammel, C., 69, 84, 89 Tredennick, N., 341, 378 Treloar, N., 225, 265 Tribbia, J.J., 144, 147, 156, 157 Tridgell, A., 207, 212, 264 Truscott, T., 224, 265 Tucker, A., 167, 187 Turing, A., 190, 262 Turley, J., 332, 333, 335, 378
U
Uchida, H., 167, 187 Ullman, J., 43, 88 Ullrich, W., 77, 90
V
van Kuppevelt, J., 275, 326 Vauquois, B., 166, 187 Venkatraman, N., 14, 79, 86 Vital, F., 269, 325 von Neumann, J., 23, 87, 92, 153, 338, 378 Vyssotsky, V., 38, 87
W
Waibel, A., 276, 326 Wallnau, K., 73, 90 Warner, T.W., 145, 156 Watson, R., 38, 87 Weaver, D., 331, 377 Weaver, L., 207, 212, 264 Wefald, E., 202, 264 Weigle, W.F., 116, 154 Weiss, R., 358, 379 Weiss, S., 331, 377 Weizer, N., 36, 87 Welander, P., 98, 132, 154 Weste, N., 342, 378 Wexelblat, R., 59, 88 Whinston, A., 80, 90 Whittaker, M., 221, 265 Wiin-Nielsen, A., 132, 155 Wilkes, M., 351, 378 Wilks, Y., 167, 188 Wilks, Y.A., 177, 188 Williams, D., 281, 326 Williams, F., 5, 85 Williams, R.T., 104, 154 Williamson, D., 131, 155 Williamson, D.L., 127, 131, 155 Wilson, J., 331, 377 Winegrad, D., 22, 86 Wing, J., 71, 89 Wise, A., 80, 90 Wiseman, C., 77, 90 Wisowaty, J.L., 277, 326 Witkam, A.P.M., 166, 187 Wolff, M., 30, 87 Wyard, P.J., 277, 326
Yellin, F., 62, 89 Yourdon, E., 70, 89
Zachman, J., 73, 90 Zadeh, N., 214, 264 Zahorjan, J., 367, 379 Zoltan-Ford, E., 293, 327 Zorian, Y., 375, 379
Subject Index
ABC, 23-4 Absolute vorticity, 106 Ace-Ace, 246 Action Selector, 246 Adjustment terms, 132 Advanced Network Services (ANS), 57 Advanced Research Projects Agency (ARPA), 45-6 ALGOL-68, 61 Algorithmic programming, 58-62 Alignment era, 14 ALOHANET, 55 ALPAC, 171, 172 Alpha-beta search algorithm, 191, 193-5, 202, 209, 210-11 Altair, 13 Alto computer, 32, 53 Analysis point, 140 ANSI, 84 A-O compiler, 59 Apollo Computer, 33 Apple Computer, 13, 32, 33, 42 Apple Macintosh, 33, 42 Apple Power PC, 35 Application programming interface (API), 277, 373-4 Application specific standard products (ASSP), 358 ARM, 336 ARPA, 48, 52 ARPANET, 11, 13-14, 32, 48-51, 55-8 Artificial intelligence, 163, 166-7, 190 ASIC, 336, 358, 368, 374, 375 Assembly language, 36 ASSP, 368, 370 Asynchronous Transfer Mode (ATM), 44 ATLAS-II, 167 ATR, 183 AT&T, 47-8 Augmented Transition Network, 323 Automation era, 7-9, 36
B
B* algorithm, 201 Backgammon, 206, 211-18 Background error, 141 Background field, 140 Barotropic fluid, 104 Barotropic vorticity equation (BVE), 105-6, 114, 116, 119, 124, 126, 130, 132, 133, 135 Batch processing, 36-7 Batch sequential architecture, 76 BBN, 49, 50 Besk machine, 94 Bilingual lexicons, 165 BINAC, 339 Bingo Blocking, 252 Boundary value problem, 122 BPIP, 202 Bravo text processor, 54 Bred modes, 150 Breeding concept, 150 Bridge, 208, 218-24 Bridge Baron program, 219 BUNCH, 8 Busicom calculator, 339 Business automation, 18-24 Business computing eras, 4-16 see also specific eras overview, 2-3 software technologies, 72 technology components, 4 Business data processing, 18 Business process reengineering (BPR), 14, 81 Business programming languages, 59-60 Business strategy-information system strategy alignment, 81
Business system architecture, 3, 73-81, 85 Business tools, 81-2
C programming language, 61-2 C++, 61-2, 71 Cache memories, 357 Calculation era, 5-7, 35 Candide system, 178, 182 Capability Maturity Model (CMM), 69-70 Card Programmed Calculator, 20 Carnegie Mellon University, 183 Cash registers, 7 Cell phones, 333 Census of 1890, 5 CETA, 166, 167 CFL stability criterion, 97, 104, 131, 132 Checkers (draughts), 224-30 Chess, 230-6 Chinook, 224-30 Cilkchess, 206 CISC technology, 34 Cleanroom methods, 71 Client-server architecture, 78 Climatology, 149 CMOS, 342 COBOL, 36, 60-2, 70, 76 Colorado State University, 145 Commercial off-the-shelf software (COTS), 84-5 Communication, 43-58 architectures, 79 spoken language dialogue systems (SLDSs), 303-17 Communications inflexion point, 82-3 Compaq Alpha, 365 Compatible Time-Sharing System, 51 Competitive potential, 79 Compiler technology, 34 Complex Instruction Set Computer (CISC), 351-7 Component-based architectures, 79-80, 80 Component-based development (CBD) methods, 71-3 Component-based software development, 84-5
Computation-communication inflexion point, 12 Computation platform, 16-43 Computer-aided design (CAD), 66-7 Computer-aided manufacturing (CAM), 66-7 Computer interfaces, 65-6 Computer networking. See Network(ing) Computer processor systems, 246-7 Computer programming languages, 59 Computerized file management, 62-3 Computing cycle, 122 Conceptual schema, 64 Control Data Corporation (CDC), 49 Conversational Monitoring System (CMS), 39 Cooperative system, 75 CORBA, 73, 80 Coriolis parameter, 105, 115 COTS components, 80 CPM, 67 CP/M-86 operating system, 41 Critical success factor (CSF) method, 77-8
Daimler-Chrysler dialogue manager, 280, 298, 303, 312, 314, 318, 324 Danish Dialogue System, 280, 287, 290, 292-3, 295, 298, 300, 307, 309, 312, 320, 323 Data analysis, 99 Data Base Task Group (DBTG), 63 Data-centric paradigm, 70 Data communications, 11, 13 Data initialization, 99 Data management, future directions, 65 Data mining, 83-4 Data-oriented methods, 70 Data-oriented system development, 70 Data processing, stages of growth, 75-6 Data warehousing, 83 Database systems, 62-5 Datagrams, 56 DCOM, 73, 80 Decentralization era, 12-14, 30, 39 Decision support systems, 83 Deep Blue, 191, 232, 234-6 Default reasoning, 164 Defense calculator, 25
Desktop computers, 32-3, 39-43, 58, 340 Digital consumer market, 334 Digital Equipment Corporation (DEC), 31, 32 Digital Signal Processing (DSP), 333, 358, 359, 360 Digital versatile disc (DVD) player, 334 Distributed architectures, 78 Distributed business systems, 78 Distributed client-server architectures, 77-9 Distributed networks, 47 DLT, 166 Docutran, 182 DOD-STD-2167A, 69 Domain communication, spoken language dialogue systems (SLDSs), 303-5 DRAM, 353, 368 Drawing programs, 67 Drum machines, 26 3DVAR, 142 4DVAR, 142
Earth's radius, 115 Earth's rotation, 112-13 Earth's vorticity, 105 Eckert-Mauchly Computer Corporation (EMCC), 24 ECMWF. See European Center for Medium-Range Weather Forecasts (ECMWF) EDSAC, 338 EDVAC, 7, 21-3, 338 Eigenvalue problem, 111 Electronic commerce, 83 Electronic Control Company, 7 Electronic design automation (EDA), 376 Electronic mail, 51 Embedded controllers, 244-5 Embedded controllers/processors vs. computer processors, 364-5 Embedded microcontrollers, 370-2 Embedded microprocessor marketplace, 332-7 architectures, 332-3 main players, 332 product categories, 335-7 Embedded microprocessors, 246, 329-79 architectures, 248-63 classes, 370-2
classification, 242-8 core function extensions, 358-61 core implementation methodologies, 351-7 differentiating factors, 365-8 evaluation, 365-6 evolution, 337-42 future performance gains, 361-3 implementation, 363-4 implementation trends, 365-8 integration challenge, 374-6 logic functions, 374-6 overview, 330-1 scalable systems, 370-2 system approach, 369-70 system level approach, 242-4 upwards compatibility, 372 Embedded software and standardization efforts, 372-4 ENIAC, 7, 8, 20-3, 338 Ensemble prediction techniques, 148-51 Enstrophy, 106 Enterprise JavaBeans, 73, 80 Enterprise Resource Planning (ERP), 80, 85 EPIC, 362-3 Error covariance matrices, 142 Error equation, 110 Eta model, 146 Ethernet, 11, 54-6 European Center for Medium-Range Weather Forecasts (ECMWF), 94-5, 122, 136, 139, 150 Evaluation function, 192, 205, 207 Event-driven architectures, 80 Expansion coefficients, 110, 111, 114, 115, 119 Expansion functions, 110, 112, 113 Expected value (EV), 246-7 Expert systems, 83 External subschema, 63
Fairchild Semiconductor, 11 File systems, 62-5 File transfer protocol (FTP), 50 Finite difference discretization, 147 Finite difference equations, 104, 111 Finite difference method, 102 Finite difference model, 146 Finite difference techniques, 106
Finite element method, 107, 123-7 Finite State Machine, 323 Formal development methods, 71 FORTRAN, 26, 36, 60, 70 FORTRAN-IV, 61 Fourier series, 103, 112, 114 FPU, 358 FTP, 56 Fujitsu VPP 700, 95
Galerkin approximation, 110 Galerkin methods, 107, 125, 126 Games, 189-266 advances, 191-261 alternative approaches, 201-2 application-dependent knowledge, 197 caching information, 195-6 endgame database, 203 history heuristic, 197 killer heuristic, 197 knowledge advances, 203-7 move ordering, 196-7 null-move searches, 199 perspectives, 210-11 progress in developing, 191 quiescence search, 200 research tool, 190 search advances, 192-203 search depth, 199-200 search methods, 210 search window, 197-8 simulation-based approaches, 207-10 singular extensions, 200 transposition table, 196, 197, 203 see also specific games GAT, 181 Gauss-Lobatto quadrature, 147 Gaussian quadrature, 120 Gaussian weights, 120 General Systems Theory, 75 Generic Opponent Modeling (GOM), 248 Georgetown GAT system, 170 Geostrophic forecast system, 96 Gesture recognition, 68 GETA, 167, 173 GIB, 219-24 Gibbs phenomenon, 119
Go, 260 Graphical user interface (GUI), 53-4, 61, 64, 368 Graphics, 67 Gravity waves, 132 Grid box, 128 Grid elements, 128 Grid point models, 145 Grid stretching, 101 Gridpoint models, 103 Grosch's law, 11
H
Hand Evaluator, 245 Hardware, 16-43 classic generations, 17 Harvard Mark I, 21, 59 Hat functions, 124 Hermite polynomials, 112 Hewlett-Packard, 33 HICATS, 181 HICATS/JE, 167 Hitech chess program, 201 Hollerith card, 5 Honeywell DDP 516 minicomputers, 49 Hough functions, 112 HTML, 67 Human-computer interaction (HCI), 65-8 Hydrostatic approximation, 145 Hydrostatic forecast system, 96 Hypertext, 67
I
IA-64, 362-3 IBM, 5, 7, 8, 13, 18 IBM 1401 computer system, 27-8, 36, 38, 66 IBM 1403 chain printer, 27, 37, 53, 66 IBM 601 series, 20 IBM 650, 26 IBM 700 series, 8, 25, 28-9 IBM 702 data processing machine, 25 IBM 7090, 28-9, 36 IBM 7094, 36 IBM automatic sequence controlled calculator, 21 IBM disk technology, 27
IBM PC, 13, 33, 41, 340 IBM PC/AT, 33, 42 IBM PC/XT, 33, 42 IBM PS/2, 33, 42 IBM System/360, 10, 27, 29-30, 351 IBM System/360-370, 9-12 IBSYS, 36 Icosahedral grid, 129, 131 Icosahedron, 128 IEEE 754 binary arithmetic standard, 358 IEEE 1149.1 standard, 375 IEEE P1500 standard, 375 Immediate Scoring, 252 In-car information systems, 334 Incremental development, 69 Indexed Sequential Access Mechanism (ISAM), 63 Industrial control market, 335 Information engineering, 70 Information growth, 83-4 Information retrieval, machine translation (MT), 185 Information systems, in-car, 334 Information technology (IT) strategy, 79, 80 Innovation era, 9-12 Instruction level parallelism (ILP), 363 Instruction set architecture (ISA), 359 Integrated circuits, 10, 30-2 Integration era, 9-12 Intel 8080 microprocessor, 13 Intel Pentium, 365 Intelligent agents, 83 Interaction coefficient, 118, 119, 122, 123, 126 Interactive computing, 11 Interface Message Processors (IMPs), 49-50 International technology roadmap for semiconductors, 340, 341 Internet, 15-16, 62, 67, 80 services and appliances, 334 Internet Othello Server (IOS), 237 Internetworking, 56-7 I/O operations, 37 IRC, 244, 248 ISO-9000, 70
Jacobian, 106, 126 JAUNTIER, 250
Java, 71 Job Control Language (JCL), 36 JOUNCES, 250
KBMT-89, 167 KnightCap, 207 Knowledge discovery, 83 Knowledge growth, 83-4 Knowledge management, 83 Knowledge management support systems, 83 Knowledge sources for machine translation (MT), 168-9 Kronecker delta, 113
Lagged Average Forecast (LAF) technique, 149-50 Lagrangian integration scheme, 132 LANs, 13, 14, 30, 44, 45, 55-6, 58 Laplacian, 111, 113, 115, 120, 125, 130 Large-scale integration (LSI), 10, 53 Laser printer, 52 Layered architectures, 78 Legendre cardinal functions, 147 Legendre functions, 114 Legendre polynomials, 112, 113, 120, 121 Legendre spectral transform method, 148 LEO computer, 16 Linear basis functions, 126 LINUX, 84 Local area networking. See LANs Logistello, 237-42 Loki, 244-50
M
Machine translation (MT), 159-88 acceptability threshold, 180-1 adaptivity, 184 AI-oriented, 166-7 artificial languages, 173-4 bilingual corpora, 182 choices and arguments for and against paradigms, 173-80
Machine translation (MT) (continued) computation problem, 163-5 cross-linguistic differences, 176 current situation, 185-6 direct systems, 165 domain-specialized, 182 evolution, 169-73 example-based, 168, 184 first generation, 172 formalist approach, 177 fourth generation, 173 future technology, 186-7 "generations", 172-3 immaturity, 185-6 impossible?, 161-3 information retrieval, 185 interlingua, 165-6, 176-8 as natural language, 177-8 knowledge sources for, 168-9 knowledge-based (KBMT), 166, 167, 175-6 linguistics-inspired theories, 164 main paradigms, 165-9 major concepts, topics and processes, 170 meaning-based, 177 multi-engine, 183-4 multilingual text processing, 184-5 natural language, 173-4 new optimism of, 186-7 novel applications, 182-5 overview, 160-1 partial automation, 181 practical systems, 184 public attitudes toward, 171 real world, 180-5 research, 171, 180 restricting the ambiguity of source text, 181-2 rule-based systems, 168 second generation, 173 spoken language, 182-3 statistical, 168, 178-80, 182 third generation, 173 transfer, 178 transfer systems, 165-6 treatment of meaning, 174-5 Magnetic disk secondary storage, 26-7 Magnetic Drum Computer (MDC), 26 MAILBOX, 51 Mainframe computer architectures, 75-6
Management of information systems (MIS), 73-4 Manual business processes, 75 MARGIE, 166 Massively parallel processor (MPP), 122 Master File-Transaction File applications, 76 Maven, 251-4 Mechanical office, 5 Medium-scale integration (MSI), 10, 53 Memory technology, 34 MERIT, 57 Meta-communication, spoken language dialogue systems (SLDSs), 306-11 METAL system, 173, 176 Micro Instrumentation Telemetry Systems (MITS), 32 Microsoft, 32, 33, 42, 43, 67 Military research, 17-18 MILNET, 57 MIMD, 123 Miniaturization, 82 Minicomputer, 30-2 Minimal window, 198 Minimax, 192-4 MIPS, 34-5, 336, 354, 355 MIPS/Watt performance ratio, 367 MM5 model, 145 Model variability, 102 Monte Carlo simulation, 192 Monte Carlo techniques, 149, 209 Moore's law, 11, 82, 339, 340, 366 Motorola 68000, 33 MPACT, 358 MPP, 123 MS-DOS, 39, 41-2, 67 MTD(f) algorithm, 198 MULTICS, 38-40 Multimedia, 358 Multimedia data, 65 Multiprogramming, 37-9 Multitasking, 37 MVS (multiple virtual storage), 39 MVS/ESA, 39 MVS/XA, 39
N
NASA/Goddard GEOS model, 147 National Cash Register Company (NCR), 7
National Centers for Environmental Prediction (NCEP), 95 National Meteorological Center, 122, 144 National Weather Service/NCEP, 146 Navier-Stokes equation, 93 NCAR/CCM3, 147 NCEP model, 150 NegaScout, 198, 237 Net computer (NC) initiative, 336 Network data model, 63 Network operating systems, 39-43 Network protocols, 50-1 Network Working Group (NWG), 50 Networking, 43-58, 334-5 Neurogammon program, 213 Nine Men's Morris, 203 Nonlinear instability, 103 Nonorthogonal functions, 112 NSFNET, 14, 57 Numerical weather prediction (NWP), 91-157 computational methods, 102-7 data analysis, 137-43 data assimilation, 137-43 ensemble techniques, 148-51 Eulerian approach, 93, 107 finite-element method, 123-7 Galerkin approach, 96 initialization process, 143-4 interaction coefficient method, 119 Lagrangian approach, 93, 98, 107 Monte Carlo approach, 149 regional, 144-8 semi-Lagrange method, 131-7 spectral methods, 96, 107-23 spherical geodesic grids, 127-31 time truncation, 131-7 NWS/NCEP, 150
Object-oriented languages, 71 Object-oriented methods, 65, 71 Object-relational technology, 65 Objective analysis, 139-40 Observation point, 140 Observational error, 141 Office equipment, 334-5 Office of the Future, 51-4
On-line data processing, 63-4 On-line real-time architectures, 76-7 Open Software Foundation (OSF), 41 Open source software, 84 Open system standards, 84 Operating systems, 16-43 Operetta telephone directory, 287 Orac, 243-4 OS/2, 42 OS/360, 37 OS/SVS, 39 Othello, 236-42 OVERTOIL, 250
Packet switched networking, 46-8 Pangloss, 167 Parallel processors, 97 PA-RISC, 358 Partition search algorithm, 220 Pascal, 61 PC companions, 333 PC-DOS, 41 PDP-8, 31 PDP-10, 32 Penn State University/NCAR model, 145 Personal calculators, 35 Personal Computer (PC), 13 Phonetic typewriter, 268 Physical schema, 64 Pipe and filter architecture, 76 PIVOT, 167 PL/1, 61 Pluggable Sequence Relay Calculator (PSRC), 20 Poker, 242-50 Polar problem, 127, 136 Pole problem, 104 Position value, 204 POSIX, 40-1 PowerPoint, 67 Prediction equations, 94, 100, 107, 110, 115, 136 Preference Semantics, 167 Principal variation, 193 Principal Variation Search, 198 Printers, 334-5 ProbCut, 199-200, 238
Procedure-oriented methods, 70 Programmed Data Processor (PDP) series, 31 Project MAC, 38 Project Whirlwind, 28, 31 Punched-card data management, 62 Punched-card methods, 18-20 Punched-card record keeping, 5
QDOS, 41
Radiation, 138 RailTel/ARISE, 280, 287, 301, 302, 307, 309, 320, 324 RAMS model, 145 RAND Corporation, 46 Random access memory (RAM), 28 Random Access Memory Accounting Machine (RAMAC), 27 Real-time business applications, 77 Reanalysis, 144 Reduced Instruction Set Computer (RISC), 34, 351-7 architecture evolution, 357-8 Berkeley-RISC approach, 354 load/store principle, 354 Stanford-RISC approach, 354 Reengineering era, 14 Regional prediction modeling, 144-8 Relational databases, 64 Report Program Generator (RPG) programming language, 28 Repository architecture, 77 Request for Comment (RFC1), 50 Research Character Generator (RCG), 53 Rhomboidal truncation, 117 Richardson's equations, 92 Robust parsing, 275 Rossby number, 143 Rossby type motions, 135 Rossby waves, 131 RPG (Report Program Generator), 60 Rubik's Cube, 260
Scanning Laser Output Terminal (SLOT), 52-3 Scientific calculation, 18-24 Score and Rack, 252 Scrabble, 250-9 Selective sampling, 209 Semi-implicit scheme, 97 Semi-Lagrange method, 131-7 Separated cache structures, 357 Service level, 79 Set-top boxes, 333-4 SH7709 embedded processor, 364 Shallow water equations model (SWE), 129-31 Siemens Nixdorf Corporation, 183 Sigma coordinates, 147 Silicon Graphics, 33 SIMD parallel processor, 122-3 Simula-67, 71 Simulation search, 210 Simultaneous multithreading architecture, 362 Single-instruction-multiple-data (SIMD), 358, 363 Sketchpad, 67 SLDSs, 267-327 advanced linguistic processing, 302-3 after sub-task identification, 302 barge-in technology, 295 communication, 303-17 core functionality, 279 co-reference and ellipsis processing, 303 design and construction, 271 design-time, 280 development, 279 dialogue closure, 316-17 dialogue context, 273 dialogue control, 273-4 dialogue history, 317 dialogue management, 278-324 dialogue manager, 281, 283-6, 294, 300, 302, 307, 322-4 see also specific dialogue managers domain, 273 domain communication, 303-5 domain structure, 288 efficient task performance, 281 elements, 271-2
error loops, 313-14 expression of meaning, 312-13 feedback, 314-16 feedback on what the system understood, 293 focused output, 294 focused questions, 313-14 getting the user's meaning, 285-303 global focus, 301-2 graceful degradation, 313-14 histories, 317-18 human factors, 276-7 ill-structured tasks, 288-9 implementation issues, 321-3 indirect dialogue act processing, 303 information feedback, 314-16 initiative initiation, 295-6 input language processing control, 284-5 input prediction, 284 input prediction/prior focus, 298-9 instructions on how to address the system, 292-3 interaction history, 273 key components, 277 knowledge-based input prediction, 299 language layer, 283 layers, 271-2 linguistic history, 318 linguistic processing, 274-5 local focus, 300 meta-communication, 306-11, 324 mixed initiative dialogue, 296-8 multilingual systems, 282 multimodal systems including speech, 281-2 multitask, multiuser systems, 283 negotiation tasks, 290-1 novice and expert users, 319-20 order of output to user, 323 out-of-domain communication, 311 output control, 285, 293-4 overall processing flow, 272 overview, 269-71 performance, 276-7 performance history, 318-19 process feedback, 316 processing feedback, 293 real-time requirements, 285 run-time, 280 speech generators, 276
speech layer, 275-6, 283 speech recognition, 275-6 statistical input prediction, 299 sub-task identification, 299-302 system capabilities, 292 system-directed dialogue, 296-7 system-initiated clarification metacommunication, 307-9 system-initiated repair metacommunication, 306-7 system initiative, 294 system varieties, 281-3 systems integration, 277, 278 task and domain independence, 324 task complexity, 285-91 task history, 317-18 task-oriented, 271-8 task structure, 288-90 textual material, 294-5 topic history, 318 unimodal speech input/speech output, 281 user access, 277 user-directed dialogue, 298 user groups, 319-20 user-initiated clarification metacommunication, 310-11 user-initiated repair meta-communication, 309-10 user input control, 291-5 user modelling, 273 user properties, 320-1 volume of information, 287-8 well-structured tasks, 289-90 Smalltalk, 71 SMALTO, 282 SMTP, 56 Software, 58-73 Software development, 28 component-based, 84-5 Software development processes and methods, 68-73 Software Engineering Institute (SEI), 69 Software technologies, business computing, 72 Sokoban, 260 Solid harmonics, 113, 114 SPANAM, 181 SPARC, 34 Spectral Element Atmospheric Model (SEAM), 147
Speech, 68 Speech processing, 182-3 Speech recognition, 271 Speech technologies, 267-327 Spherical geodesic grids, 127-31 Spiral model, 69 Spoken language dialogue systems. See SLDSs Spooling, 37, 38 Spreadsheets, 66 SRAM, 367 Standard operating procedures (SOPs), 75 Stanford University Network (SUN), 34 Statistical interpolation, 140 Statistical sampling, 209 Strategic alignment model, 79 Strategy execution, 79 Stream function, 115 Stretched grid model, 146 Stretching factors, 147 Structured programming languages, 60-1 Structured Query Language (SQL), 64 Subjective analysis, 139 Sun Microsystems, 33, 34 Sun Ultrasparc, 365 Surface boundary effects, 104 Surface spherical harmonics, 112 System-on-a-chip (SOC), 374, 375-6, 376 System V Interface Definition (SVID), 40 SYSTRAN, 172, 173, 181
Tape Processing Machine (TPM), 25 Task-oriented spoken language dialogue systems, 271-8 TAUM, 167 TAUM-aviation, 175 TAUM-METEO, 172, 181 TCP/IP protocol, 13-14, 15, 40, 56 TD(λ), 206 TD-gammon 3.0, 212-14 TDLeaf(λ), 207 Technology transformation, 79 TELNET, 50, 56 Temporal difference learning (TDL), 192, 205-6 TENEX, 51 Texas Hold'em, 242, 243
Text-based command interfaces, 66-7 Text-based command languages, 67 Text editing, 66 Three-dimensional graphics, 68 Time sharing, 37-9 Time Sharing Option (TSO), 39 Time Sharing System (CTSS), 38 Time truncation, 131-7 TITUS, 182 Total quality management (TQM), 14, 79 Transform method, 120 Transmission Control Protocol and Internet Protocol (TCP/IP), 11 Trapezoidal formula, 120 Triangular truncation, 117-18, 121 Triple Generator, 245 Truncation errors, 119, 131 TV sets, 333-4
Ubiquitous Computing, 15-16, 80 ULTRA, 167 Unified cache architectures, 357 Unified model, 145 Unified Modeling Language (UML), 71 United Kingdom Meteorological Office (UKMO), 96, 145 UNIVAC, 7-9, 17, 24-5, 59, 339 UNIVAC I computer system, 62 Universities, 17-18 University of Hawaii, 55 University of Pennsylvania, 21-3 UNIX, 33, 34, 39-43, 67, 76 URLs, 67
Variance formula, 141 Variational analysis, 99 VAX 11/780, 32 VAX architecture, 32 Verbmobil, 282, 290-1, 301, 303, 304, 312, 318, 323 Verilog, 376 Very large-scale integration (VLSI), 10, 53, 54 Very long instruction word processing (VLIW), 357, 358, 363
VHDL, 376 Video game consoles, 333 Virtual Sequential Access Mechanism (VSAM), 63 Visual Basic (VB) programming language, 61-2 Virtual reality, 68 VMS operating system, 32 Voice recognition, 68 Vorticity prediction, 105
W
WANs, 13, 14, 30, 44, 58 Warnier-Orr methods, 70 Waxholm, 301, 307, 312, 323 Weather prediction see Numerical weather prediction (NWP) Web-based architectures, 80-1 Web browser interfaces, 67-8
Weight Table, 246-8 Wide area networking. See WANs WIMP, 11, 42, 54, 58, 67 Windows NT, 33, 43 Word processing, 66 Workstations, 33-5, 80, 340-1 World Wide Web (WWW), 15, 56, 65, 67, 68, 80, 82
Xerox PARC, 11, 13, 32, 42, 51-6, 58 Xerox Star, 54 XML, 68
Z3, 23-4
Contents of Volumes in This Series
Volume 21 The Web of Computing: Computer Technology as Social Organization ROB KLING and WALT SCACCHI Computer Design and Description Languages SUBRATA DASGUPTA Microcomputers: Applications, Problems, and Promise ROBERT C. GAMMILL Query Optimization in Distributed Data Base Systems GIOVANNI MARIA SACCO AND S. BING YAO Computers in the World of Chemistry PETER LYKOS Library Automation Systems and Networks JAMES E. RUSH
Volume 22 Legal Protection of Software: A Survey MICHAEL C. GEMIGNANI Algorithms for Public Key Cryptosystems: Theory and Applications S. LAKSHMIVARAHAN Software Engineering Environments ANTHONY I. WASSERMAN Principles of Rule-Based Expert Systems BRUCE G. BUCHANAN AND RICHARD O. DUDA Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems B. CHANDRASEKARAN AND SANJAY MITTAL Specification and Implementation of Abstract Data Types ALFS T. BERZTISS AND SATISH THATTE
Volume 23 Supercomputers and VLSI: The Effect of Large-Scale Integration on Computer Architecture LAWRENCE SNYDER Information and Computation J. F. TRAUB AND H. WOZNIAKOWSKI The Mass Impact of Videogame Technology THOMAS A. DEFANTI Developments in Decision Support Systems ROBERT H. BONCZEK, CLYDE W. HOLSAPPLE, AND ANDREW B. WHINSTON Digital Control Systems PETER DORATO AND DANIEL PETERSEN
International Developments in Information Privacy G. K. GUPTA Parallel Sorting Algorithms S. LAKSHMIVARAHAN, SUDARSHAN K. DHALL, AND LESLIE L. MILLER
Volume 24 Software Effort Estimation and Productivity S. D. CONTE, H. E. DUNSMORE, AND V. Y. SHEN Theoretical Issues Concerning Protection in Operating Systems MICHAEL A. HARRISON Developments in Firmware Engineering SUBRATA DASGUPTA AND BRUCE D. SHRIVER The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance RANAN B. BANERJI The Current State of Language Data Processing PAUL L. GARVIN Advances in Information Retrieval: Where Is That /#*&(ar Record? DONALD H. KRAFT The Development of Computer Science Education WILLIAM F. ATCHISON
Volume 25 Accessing Knowledge through Natural Language NICK CERCONE AND GORDON MCCALLA Design Analysis and Performance Evaluation Methodologies for Database Computers STEVEN A. DEMURJIAN, DAVID K. HSIAO, and PAULA R. STRAWSER Partitioning of Massive/Real-Time Programs for Parallel Processing I. LEE, N. PRYWES, AND B. SZYMANSKI Computers in High-Energy Physics MICHAEL METCALF Social Dimensions of Office Automation ABBE MOWSHOWITZ
Volume 26 The Explicit Support of Human Reasoning in Decision Support Systems AMITAVA DUTTA Unary Processing W. J. POPPELBAUM, A. DOLLAS, J. B. GLICKMAN, and C. O'TOOLE Parallel Algorithms for Some Computational Problems ABHA MOITRA AND S. SITHARAMA IYENGAR Multistage Interconnection Networks for Multiprocessor Systems S. C. KOTHARI Fault-Tolerant Computing WING N. TOY Techniques and Issues in Testing and Validation of VLSI Systems H. K. REGHBATI
Software Testing and Verification LEE J. WHITE Issues in the Development of Large, Distributed, and Reliable Software C. V. RAMAMOORTHY, ATUL PRAKASH, VIJAY GARG, TSUNEO YAMAURA, AND ANUPAM BHIDE
Volume 27 Military Information Processing JAMES STARK DRAPER Multidimensional Data Structures: Review and Outlook S. SITHARAMA IYENGAR, R. L. KASHYAP, V. K. VAISHNAVI, AND N. S. V. RAO Distributed Data Allocation Strategies ALAN R. HEVNER AND ARUNA RAO A Reference Model for Mass Storage Systems STEPHEN W. MILLER Computers in the Health Sciences KEVIN C. O'KANE Computer Vision AZRIEL ROSENFELD Supercomputer Performance: The Theory, Practice, and Results OLAF M. LUBECK Computer Science and Information Technology in the People's Republic of China: The Emergence of Connectivity JOHN H. MAIER
Volume 28 The Structure of Design Processes SUBRATA DASGUPTA Fuzzy Sets and Their Applications to Artificial Intelligence ABRAHAM KANDEL AND MORDECHAY SCHNEIDER Parallel Architecture for Database Systems A. R. HURSON, L. L. MILLER, S. H. PAKZAD, M. H. EICH, AND B. SHIRAZI Optical and Optoelectronic Computing MIR MOJTABA MIRSALEHI, MUSTAFA A. G. ABUSHAGUR, AND H. JOHN CAULFIELD Management Intelligence Systems MANFRED KOCHEN
Volume 29 Models of Multilevel Computer Security JONATHAN K. MILLEN Evaluation, Description, and Invention: Paradigms for Human-Computer Interaction JOHN M. CARROLL Protocol Engineering MING T. LIU Computer Chess: Ten Years of Significant Progress MONROE NEWBORN Soviet Computing in the 1980s RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 30 Specialized Parallel Architectures for Textual Databases A. R. HURSON, L. L. MILLER, S. H. PAKZAD, AND JIA-BING CHENG Database Design and Performance MARK L. GILLENSON Software Reliability ANTHONY IANNINO and JOHN D. MUSA Cryptography Based Data Security GEORGE J. DAVIDA AND YVO DESMEDT Soviet Computing in the 1980s: A Survey of the Software and its Applications RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 31 Command and Control Information Systems Engineering: Progress and Prospects STEPHEN J. ANDRIOLE Perceptual Models for Automatic Speech Recognition Systems RENATO DEMORI, MATHEW J. PALAKAL, AND PIERO COSI Availability and Reliability Modeling for Computer Systems DAVID I. HEIMANN, NITIN MITTAL, AND KISHOR S. TRIVEDI Molecular Computing MICHAEL CONRAD Foundations of Information Science ANTHONY DEBONS
Volume 32 Computer-Aided Logic Synthesis for VLSI Chips SABURO MUROGA Sensor-Driven Intelligent Robotics MOHAN M. TRIVEDI AND CHUXIN CHEN Multidatabase Systems: An Advanced Concept in Handling Distributed Data A. R. HURSON AND M. W. BRIGHT Models of the Mind and Machine: Information Flow and Control between Humans and Computers KENT L. NORMAN Computerized Voting ROY G. SALTMAN
Volume 33 Reusable Software Components BRUCE W. WEIDE, WILLIAM F. OGDEN, AND STUART H. ZWEBEN Object-Oriented Modeling and Discrete-Event Simulation BERNARD P. ZEIGLER Human-Factors Issues in Dialog Design THIAGARAJAN PALANIVEL AND MARTIN HELANDER Neurocomputing Formalisms for Computational Learning and Machine Intelligence S. GULATI, J. BARHEN, AND S. S. IYENGAR
Visualization in Scientific Computing THOMAS A. DEFANTI AND MAXINE D. BROWN
Volume 34 An Assessment and Analysis of Software Reuse TED J. BIGGERSTAFF Multisensory Computer Vision N. NANDHAKUMAR AND J. K. AGGARWAL Parallel Computer Architectures RALPH DUNCAN Content-Addressable and Associative Memory LAWRENCE CHISVIN AND R. JAMES DUCKWORTH Image Database Management WILLIAM I. GROSKY AND RAJIV MEHROTRA Paradigmatic Influences on Information Systems Development Methodologies: Evolution and Conceptual Advances RUDY HIRSCHHEIM AND HEINZ K. KLEIN
Volume 35 Conceptual and Logical Design of Relational Databases S. B. NAVATHE AND G. PERNUL Computational Approaches for Tactile Information Processing and Analysis HRISHIKESH P. GADAGKAR AND MOHAN M. TRIVEDI Object-Oriented System Development Methods ALAN R. HEVNER Reverse Engineering JAMES H. CROSS II, ELLIOT J. CHIKOFSKY, AND CHARLES H. MAY, JR. Multiprocessing CHARLES J. FLECKENSTEIN, D. H. GILL, DAVID HEMMENDINGER, C. L. MCCREARY, JOHN D. MCGREGOR, ROY P. PARGAS, ARTHUR M. RIEHL, AND VIRGIL WALLENTINE The Landscape of International Computing EDWARD M. ROCHE, SEYMOUR E. GOODMAN, AND HSINCHUN CHEN
Volume 36 Zero Defect Software: Cleanroom Engineering HARLAN D. MILLS Role of Verification in the Software Specification Process MARVIN V. ZELKOWITZ Computer Applications in Music Composition and Research GARY E. WITTLICH, ERIC J. ISAACSON, AND JEFFREY E. HASS Artificial Neural Networks in Control Applications V. VEMURI Developments in Uncertainty-Based Information GEORGE J. KLIR Human Factors in Human-Computer System Design MARY CAROL DAY AND SUSAN J. BOYCE
Volume 37 Approaches to Automatic Programming CHARLES RICH AND RICHARD C. WATERS Digital Signal Processing STEPHEN A. DYER AND BRIAN K. HARMS Neural Networks for Pattern Recognition S. C. KOTHARI and HEEKUCK OH Experiments in Computational Heuristics and Their Lessons for Software and Knowledge Engineering JURG NIEVERGELT High-Level Synthesis of Digital Circuits GIOVANNI DE MICHELI Issues in Dataflow Computing BEN LEE AND A. R. HURSON A Sociological History of the Neural Network Controversy MIKEL OLAZARAN
Volume 38 Database Security GÜNTHER PERNUL Functional Representation and Causal Processes B. CHANDRASEKARAN Computer-Based Medical Systems JOHN M. LONG Algorithm-Specific Parallel Processing with Linear Processor Arrays JOSE A. B. FORTES, BENJAMIN W. WAH, WEIJA SHANG, AND KUMAR N. GANAPATHY Information as a Commodity: Assessment of Market Value ABBE MOWSHOWITZ
Volume 39 Maintenance and Evolution of Software Products ANNELIESE VON MAYRHAUSER Software Measurement: A Decision-Process Approach WARREN HARRISON Active Databases: Concepts and Design Support THOMAS A. MUECK Operating Systems Enhancements for Distributed Shared Memory VIRGINIA LO The Social Design of Worklife with Computers and Networks: A Natural Systems Perspective ROB KLING AND TOM JEWETT
Volume 40 Program Understanding: Models and Experiments A. VON MAYRHAUSER AND A. M. VANS Software Prototyping ALAN M. DAVIS
Rapid Prototyping of Microelectronic Systems APOSTOLOS DOLLAS AND J. D. STERLING BABCOCK Cache Coherence in Multiprocessors: A Survey MAZIN S. YOUSIF, M. J. THAZHUTHAVEETIL, and C. R. DAS The Adequacy of Office Models CHANDRA S. AMARAVADI, JOEY F. GEORGE, OLIVIA R. LIU SHENG, AND JAY F. NUNAMAKER
Volume 41 Directions in Software Process Research H. DIETER ROMBACH AND MARTIN VERLAGE The Experience Factory and Its Relationship to Other Quality Approaches VICTOR R. BASILI CASE Adoption: A Process, Not an Event JOCK A. RADER On the Necessary Conditions for the Composition of Integrated Software Engineering Environments DAVID J. CARNEY AND ALAN W. BROWN Software Quality, Software Process, and Software Testing DICK HAMLET Advances in Benchmarking Techniques: New Standards and Quantitative Metrics THOMAS CONTE AND WEN-MEI W. HWU An Evolutionary Path for Transaction Processing Systems CARLTON PU, AVRAHAM LEFF, AND SHU-WEI F. CHEN
Volume 42 Nonfunctional Requirements of Real-Time Systems TEREZA G. KIRNER AND ALAN M. DAVIS A Review of Software Inspections ADAM PORTER, HARVEY SIY, AND LAWRENCE VOTTA Advances in Software Reliability Engineering JOHN D. MUSA AND WILLA EHRLICH Network Interconnection and Protocol Conversion MING T. LIU A Universal Model of Legged Locomotion Gaits S. T. VENKATARAMAN
Volume 43 Program Slicing DAVID W. BINKLEY AND KEITH BRIAN GALLAGHER Language Features for the Interconnection of Software Components RENATE MOTSCHNIG-PITRIK AND ROLAND T. MITTERMEIR Using Model Checking to Analyze Requirements and Designs JOANNE ATLEE, MARSHA CHECHIK, AND JOHN GANNON Information Technology and Productivity: A Review of the Literature ERIK BRYNJOLFSSON AND SHINKYU YANG The Complexity of Problems WILLIAM GASARCH
3-D Computer Vision Using Structured Light: Design, Calibration, and Implementation Issues FRED W. DEPIERO AND MOHAN M. TRIVEDI
Volume 44 Managing the Risks in Information Systems and Technology (IT) ROBERT N. CHARETTE Software Cost Estimation: A Review of Models, Process and Practice FIONA WALKERDEN AND ROSS JEFFERY Experimentation in Software Engineering SHARI LAWRENCE PFLEEGER Parallel Computer Construction Outside the United States RALPH DUNCAN Control of Information Distribution and Access RALF HAUSER Asynchronous Transfer Mode: An Engineering Network Standard for High Speed Communications RONALD J. VETTER Communication Complexity EYAL KUSHILEVITZ
Volume 45 Control in Multi-threaded Information Systems PABLO A. STRAUB AND CARLOS A. HURTADO Parallelization of DOALL and DOACROSS Loops--a Survey A. R. HURSON, JOFORD T. LIM, KRISHNA M. KAVI, AND BEN LEE Programming Irregular Applications: Runtime Support, Compilation and Tools JOEL SALTZ, GAGAN AGRAWAL, CHIALIN CHANG, RAJA DAS, GUY EDJLALI, PAUL HAVLAK, YUAN-SHIN HWANG, BONGKI MOON, RAVI PONNUSAMY, SHAMIK SHARMA, ALAN SUSSMAN AND MUSTAFA UYSAL Optimization Via Evolutionary Processes SRILATA RAMAN AND L. M. PATNAIK Software Reliability and Readiness Assessment Based on the Non-homogeneous Poisson Process AMRIT L. GOEL AND KUNE-ZANG YANG Computer-supported Cooperative Work and Groupware JONATHAN GRUDIN AND STEVEN E. POLTROCK Technology and Schools GLEN L. BULL
Volume 46 Software Process Appraisal and Improvement: Models and Standards MARK C. PAULK A Software Process Engineering Framework JYRKI KONTIO Gaining Business Value from IT Investments PAMELA SIMMONS Reliability Measurement, Analysis, and Improvement for Large Software Systems JEFF TIAN
Role-based Access Control RAVI SANDHU Multithreaded Systems KRISHNA M. KAVI, BEN LEE AND ALI R. HURSON Coordination Models and Languages GEORGE A. PAPADOPOULOS AND FARHAD ARBAB Multidisciplinary Problem Solving Environments for Computational Science ELIAS N. HOUSTIS, JOHN R. RICE AND NAREN RAMAKRISHNAN
Volume 47 Natural Language Processing: A Human-Computer Interaction Perspective BILL MANARIS Cognitive Adaptive Computer Help (COACH): A Case Study EDWIN J. SELKER Cellular Automata Models of Self-replicating Systems JAMES A. REGGIA, HUI-HSIEN CHOU, AND JASON D. LOHN Ultrasound Visualization THOMAS R. NELSON Patterns and System Development BRANDON GOLDFEDDER High Performance Digital Video Servers: Storage and Retrieval of Compressed Scalable Video SEUNGYUP PAEK AND SHIH-FU CHANG Software Acquisition: The Custom/Package and Insource/Outsource Dimensions PAUL NELSON, ABRAHAM SEIDMANN, AND WILLIAM RICHMOND
Volume 48 Architectures and Patterns for Developing High-performance, Real-time ORB Endsystems DOUGLAS C. SCHMIDT, DAVID L. LEVINE AND CHRIS CLEELAND Heterogeneous Data Access in a Mobile Environment - Issues and Solutions J. B. LIM AND A. R. HURSON The World Wide Web HAL BERGHEL AND DOUGLAS BLANK Progress in Internet Security RANDALL J. ATKINSON AND J. ERIC KLINKER Digital Libraries: Social Issues and Technological Advances HSINCHUN CHEN AND ANDREA L. HOUSTON Architectures for Mobile Robot Control JULIO K. ROSENBLATT AND JAMES A. HENDLER
Volume 49 A Survey of Current Paradigms in Machine Translation BONNIE J. DORR, PAMELA W. JORDAN AND JOHN W. BENOIT Formality in Specification and Modeling: Developments in Software Engineering Practice J. S. FITZGERALD 3-D Visualization of Software Structure MATHEW L. STAPLES AND JAMES M. BIEMAN
Using Domain Models for System Testing A. VON MAYRHAUSER AND R. MRAZ Exception-handling Design Patterns WILLIAM G. BAIL Managing Control Asynchrony on SIMD Machines--a Survey NAEL B. ABU-GHAZALEH AND PHILIP A. WILSEY A Taxonomy of Distributed Real-time Control Systems J. R. AGRE, L. P. CLARE AND S. SASTRY
Volume 50 Index Part I Subject Index, Volumes 1-49
Volume 51 Index Part II Author Index Cumulative list of Titles Table of Contents, Volumes 1-49