Advances in COMPUTERS
VOLUME 31

Contributors to This Volume
STEPHEN J. ANDRIOLE
MICHAEL CONRAD
PIERO COSI
ANTHONY DEBONS
RENATO DEMORI
DAVID I. HEIMANN
NITIN MITTAL
MATHEW J. PALAKAL
KISHOR S. TRIVEDI
Advances in COMPUTERS

EDITED BY
MARSHALL C. YOVITS
Purdue School of Science
Indiana University-Purdue University at Indianapolis
Indianapolis, Indiana
VOLUME 31
ACADEMIC PRESS, INC. Harcourt Brace Jovanovich, Publishers
Boston San Diego New York London Sydney Tokyo Toronto
THIS BOOK IS PRINTED ON ACID-FREE PAPER.

COPYRIGHT © 1990 BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.
ACADEMIC PRESS, INC. 1250 Sixth Avenue, San Diego, CA 92101
United Kingdom Edition published by ACADEMIC PRESS LIMITED 24-28 Oval Road, London NW1 7DX
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 59-15761
ISBN 0-12-012131-X
PRINTED IN THE UNITED STATES OF AMERICA
90 91 92 93    9 8 7 6 5 4 3 2 1
Contents
CONTRIBUTORS ............ vii
PREFACE ............ ix

Command and Control Information Systems Engineering: Progress and Prospects
Stephen J. Andriole

1. Introduction ............ 2
2. The Information Systems Engineering Process ............ 6
3. The Domain of Command and Control ............ 32
4. Command and Control Information and Decision Systems Engineering ............ 39
5. Case Studies in the Design, Development, and Application of C2 Information and Decision Systems ............ 50
6. Next Generation Command and Control Information Systems Engineering ............ 57
7. Summary and Conclusions ............ 76
Appendix A: Group (Army Theater Level) Tactical Planning Substantive and User-Computer Interface Tasks and Requirements ............ 77
Appendix B: Storyboards from the Group Planning Prototype ............ 89
References ............ 95
Perceptual Models for Automatic Speech Recognition Systems
Renato DeMori, Mathew J. Palakal and Piero Cosi

1. Introduction ............ 100
2. Speech and Speech Knowledge ............ 101
3. A Multi-Layer Network Model for ASR Systems ............ 127
4. The Ear Model: An Approach Based on Speech Perception ............ 129
5. The Vocal Tract Model: An Approach Based on Speech Production ............ 150
6. Conclusions ............ 167
Acknowledgments ............ 169
References ............ 169
Availability and Reliability Modeling for Computer Systems
David I. Heimann, Nitin Mittal and Kishor S. Trivedi

1. Introduction ............ 176
2. Measures of Dependability ............ 180
3. Types of Dependability Analyses ............ 200
4. The Modeling of Dependability ............ 201
5. A Full-System Example ............ 218
6. Conclusions ............ 229
Acknowledgments ............ 230
References ............ 231
Molecular Computing
Michael Conrad

1. Introduction ............ 236
2. Background ............ 238
3. Theory of Molecular Computing ............ 246
4. The Macro-Micro (M-m) Scheme of Molecular Computing ............ 269
5. Modes of Molecular Computing ............ 289
6. The Molecular Computer Factory ............ 303
7. Molecular Computer Architectures ............ 307
8. Conclusions and Prospects ............ 317
Acknowledgments ............ 318
References ............ 319
Foundations of Information Science
Anthony Debons

Prologue ............ 325
1. Introduction ............ 326
2. Essences: The Nature of Information ............ 327
3. Structure: The Science of Information ............ 338
4. Synthesis: On a Theory of Foundations ............ 363
5. Overview ............ 369
Acknowledgments ............ 370
References ............ 371
AUTHOR INDEX ............ 379
SUBJECT INDEX ............ 387
CONTENTS OF PREVIOUS VOLUMES ............ 397
Contributors

Numbers in parentheses refer to the pages on which the authors' contributions begin.

Stephen J. Andriole (1), Department of Information Systems and Systems Engineering, School of Information Technology and Engineering, George Mason University, 4400 University Drive, Fairfax, Virginia 22030-4444

Michael Conrad (235), Department of Computer Science, Wayne State University, Detroit, Michigan 48202

Piero Cosi (99), Centro di Studio per le Ricerche di Fonetica, CNR, Via G. Oberdan 10, 35122 Padova, Italy

Anthony Debons (325), Institute for the Development of Expert Application Systems, Robert Morris College, Narrows Run Road, Coraopolis, Pennsylvania 15108-1189; Department of Information Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15260

Renato DeMori (99), McGill University, School of Computer Science, 805 Sherbrooke Street West, Montreal, Quebec, Canada H3A 2K6

David I. Heimann (175), Digital Equipment Corporation, 6 Tech Drive, Andover, Massachusetts 01810

Nitin Mittal (175), Digital Equipment Corporation, 6 Tech Drive, Andover, Massachusetts 01810

Mathew J. Palakal (99), Purdue University School of Science at Indianapolis, Department of Computer Science, 1201 East 38th Street AD 135, Indianapolis, Indiana 46205-2868

Kishor S. Trivedi (175), Computer Science Department, Duke University, Durham, North Carolina 27706
Preface
The serial Advances in Computers provides a medium for the in-depth presentation of subjects of both current and long-range interest to the computer and information community. Within this framework, contributions for appropriate articles have been solicited from widely recognized experts in their fields. The time scale of the invitation is such that it permits a relatively leisurely perspective. Furthermore, the permitted length of the contributions is greater than in many other publications. Thus, topics are treated in both depth and breadth. The serial began in 1960 and now continues with Volume 31. These books have played an important role over the years in the development of the computer and information fields. As these fields have continued to expand, in research, in resulting applications, and in their significance, so has the importance of the Advances series. As a consequence, it was decided that Academic Press would again this year publish two volumes, 30 and 31. Volume 30 was published earlier this year. Included in Volume 31 are chapters on command and control information systems, automatic speech recognition, reliability modeling of computer systems, molecular computing, and the foundations of information science.

In the first chapter, Professor Andriole presents a multidisciplinary information systems design and development methodology that assumes more complex design challenges than we have faced in the past. The emphasis is on the process by which complex analytical problem-solving requirements are converted into computer-based systems. The author emphasizes the application of the information systems engineering process to command and control information and decision systems. He points out that, without structure, the design and development process will almost always fail.
DeMori, Palakal, and Cosi state in their chapter that speaker-independent automatic speech recognition by computers of large or difficult vocabularies is still an unsolved problem, especially if words are pronounced connectedly. Since the early 1970s, there has been substantial progress toward the goal of constructing machines capable of recognizing and/or understanding human speech. One of the key improvements has been the development and application of mathematical methods that permit modeling the speech signal as a complex code with several coexisting levels of structure. The authors present several past approaches and some current trends in automatic speech recognition research: using models based on speech production, and using models based on speech perception.

In the third chapter, Drs. Heimann, Mittal and Trivedi address computer system dependability analysis, which ties together concepts such as reliability, maintainability and availability. It serves, along with cost and performance, as a major system selection criterion. Three classes of dependability measures are
described: system availability, system reliability, and task completion. The authors point out that the concept of system dependability is being considered with increasing interest as a component of computer system effectiveness and as a criterion used by customers for product selection decisions. They conclude that which of the measures is appropriate depends on the specific application under investigation, the availability of relevant data, and the usage or customer profile. Professor Conrad writes in the fourth chapter about molecular computers that are information processing systems in which individual molecules play a critical role. Natural biological systems fit this definition. Artificial information processing systems fabricated from molecular materials might emulate biology or follow new architectural principles. In either case they would qualify as molecular computers. The term may also apply to simulations of biomolecular systems or to virtual molecular computers implemented in conventional silicon machines. Conrad shows that molecular computing is both a science and a technology. These two factors are highly synergistic. The attempt to synthesize biomimetic or new molecular computing devices is an outgrowth of fundamental research in molecular and cellular biophysics, condensed-matter physics, polymer chemistry, neurophysiology, and computer science. It is likely to lead to new insights into mechanisms and materials that impact these areas as well. In the final chapter, Professor Debons considers the foundations of information science. He states that a perennial question posed by individuals both inside and outside the field of information concerns its nature: What is it? What are its essences, its structures, its boundaries? The study of information can be traced to antiquity, to philosophers and scholars concerned with the nature of knowledge. 
Contemporary information science arose from the scientific renaissance of the present century, spurred by the launching of Sputnik. Advances in electronics, referred to as the "communication revolution," increased the ability to transmit data for processing quickly and over greater distances. Debons considers the nature of the term "information"; he deals with information as a discipline and then synthesizes the various aspects of the science.

It is my great pleasure to thank the contributors to this volume. They have given extensively of their time and effort to make this book an important and timely contribution to their profession. Despite the many calls upon their time, they recognized the necessity of writing substantial review and tutorial articles. It has required considerable effort on their part, and their cooperation and assistance is greatly appreciated. Because of their efforts, this volume achieves a high level of excellence and should be of great value for many years to come. It has been a pleasant and rewarding experience for me to edit this volume and to work with these authors.

Marshall C. Yovits
Command and Control Information Systems Engineering: Progress and Prospects

STEPHEN J. ANDRIOLE
Department of Information Systems & Systems Engineering
School of Information Technology & Engineering
George Mason University
Fairfax, Virginia
1. Introduction ............ 2
   1.1 Chapter Overview ............ 2
   1.2 Information Systems Engineering Overview ............ 2
2. The Information Systems Engineering Process ............ 6
   2.1 Systems Design in Perspective ............ 7
   2.2 Conventional Design Methods and Models ............ 8
   2.3 The Prototyping Alternative ............ 9
   2.4 Requirements Analysis Methods ............ 10
   2.5 Task Requirements Analysis Methods ............ 12
   2.6 User Profiling Methods ............ 15
   2.7 Organizational/Doctrinal Profiling Methods ............ 20
   2.8 The Task/User/Organizational-Doctrinal Matrix ............ 21
   2.9 Some Prototyping Methods ............ 23
3. The Domain of Command and Control ............ 32
   3.1 The Command and Control Process ............ 32
   3.2 Command and Control Information and Decision System Requirements ............ 35
4. Command and Control Information and Decision Systems Engineering ............ 39
   4.1 C2 Information and Decision Systems Requirements Analysis ............ 39
   4.2 C2 System Modeling and Prototyping ............ 40
   4.3 Analytical Methods for C2 Information and Decision Systems Engineering ............ 42
   4.4 C2 Systems Evaluation ............ 47
5. Case Studies in the Design, Development, and Application of C2 Information and Decision Systems ............ 50
   5.1 The Range of Applications ............ 50
   5.2 The Group Planning Prototype ............ 52
6. Next Generation Command and Control Information Systems Engineering ............ 57
   6.1 Emerging Issues and Challenges ............ 57
   6.2 The Range of C2 Information and Decision Support ............ 57
   6.3 Advanced Information Technologies ............ 58
   6.4 Integrated C2 Information and Decision Support ............ 74
7. Summary and Conclusions ............ 76
Appendix A: Group (Army Theater Level) Tactical Planning Substantive and User-Computer Interface Tasks and Requirements ............ 77
Appendix B: Storyboards from the Group Planning Prototype ............ 89
References ............ 95

Sections 2.4 through 2.9 draw upon S. J. Andriole, Handbook of Decision Support Systems, published by Petrocelli Books, Inc., Princeton, New Jersey.
1. Introduction

1.1 Chapter Overview
This chapter attempts several things. It first presents a multidisciplinary information systems design and development methodology that assumes more complex design challenges than we have faced in the past. The emphasis is on the process by which complex analytical problem-solving requirements are converted into computer-based systems. The chapter then turns to the application of the information systems engineering (ISE) process to command and control information and decision systems. Military command and control (as well as its civilian counterpart) presents special problems to the modern systems architect. Users are integral parts of nearly every system, the stakes are high, and the margin for error is small. Users and operators come in many shapes and sizes, and robust analytical methods, especially those that deal well with uncertainty and stress, are almost always necessary. This chapter also presents some command and control ISE case studies intended to illustrate the most salient features of the ISE process. It ends with a look at future command and control information and decision systems engineering.

1.2 Information Systems Engineering Overview
Information systems engineering refers to the process by which information systems are designed, developed, tested and maintained. The technical origins of ISE can be traced to conventional information systems design and development and the field of systems engineering. ISE is by nature structured, iterative, multidisciplinary, and applied. The ISE process involves structured requirements analyses, functional modeling, prototyping, software engineering, and system testing, documentation and maintenance. Modern information systems solve a variety of data, information, and knowledge-based problems. Ten years ago most information systems were exclusively data-oriented; their primary purpose was to permit users to store, retrieve, manipulate, and display data. Application domains included inventory control, banking, personnel recordkeeping, and the like. The airline
reservation system is representative of the information systems of the 1970s. More recently, expectations about the capabilities of information systems have risen considerably. It is today quite routine to find information systems that provide analytical support to users. Some of these systems help users allocate resources, evaluate personnel, plan, and simulate large events and processes.

Systems engineering is a field of inquiry unto itself (Eisner, 1988). There are principles of applied systems engineering and a growing literature that defines a field representing a synthesis of systems analysis, engineering, and economics. Systems engineering involves all the activities that extend over the entire life cycle of a system, including requirements definitions, functional designs, development, testing and evaluation. According to Andrew P. Sage, a prominent systems engineer and contributor to the field, the systems engineer's perspective is different from that of the product engineer, software designer, or technology developer; while the product engineer deals with detail, the systems engineer takes a "top down" viewpoint. Where the product engineer deals with internals, the systems engineer deals more extensively with the external view of the system, including the system's interfaces to other systems and its human users, repairers, and managers. Systems engineering is based upon the quantitative skills of the traditional engineer combined with additional quantitative and qualitative skills derived from applied mathematics, psychology, management and other disciplines that support knowledge organization and design. The systems engineering process is a logical sequence of activities and decisions that transform operational needs into a description of system performance parameters and an optimal system configuration (Sage, 1985).
The information systems engineering process represents the marriage between the tools, techniques and application domains of information systems and the generic systems engineering process. Figure 1 presents a blueprint for the design, development and testing of information systems. The blueprint calls for the identification of user requirements, the modeling of the requirements, the design, development and testing of working prototypes, the specification of software (programming) requirements, system testing and documentation, and the development of a system maintenance plan. Figure 1 also suggests that there are a variety of methods available to the information systems engineer and several products that ideally emerge from the steps in the ISE process. The sections that follow present more details about the generic ISE process as well as insight into how the process can be applied in the domain of military command and control.
[Fig. 1. The generic information systems engineering process: a blueprint running from requirements identification and modeling through prototyping (throwaway or evolutionary), software specification, hardware/software configuration, system testing, documentation, and maintenance. At each step the figure lists candidate methods (e.g., multi-attribute utility and cost-benefit models; data flow-oriented, data structure-oriented, and object-oriented software engineering; processor and input/output device options) and products (e.g., prototypes, functional descriptions, users' manuals, training manuals, task schedules, and evaluations).]
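Read as control flow, the blueprint of Fig. 1 is an iterative front end (requirements analysis, modeling, prototyping) followed by a linear back end (software specification through maintenance). The sketch below is an illustrative paraphrase of that reading, not code from the chapter; the function names, the evaluation predicate, and the three-pass toy example are all assumptions:

```python
# Illustrative sketch of the generic ISE process of Fig. 1 (not the
# author's tooling): iterate through requirements analysis, modeling,
# and prototyping until the prototype satisfies the requirements, then
# proceed to the linear engineering steps.

def ise_process(identify_requirements, build_prototype, satisfies):
    """Iterative front end of the ISE life cycle.

    identify_requirements(feedback)    -> refined requirements each pass
    build_prototype(requirements)      -> a (possibly throwaway) prototype
    satisfies(prototype, requirements) -> True once users accept the concept
    """
    feedback = None
    while True:
        requirements = identify_requirements(feedback)  # requirements analysis
        prototype = build_prototype(requirements)       # modeling + prototyping
        if satisfies(prototype, requirements):          # user evaluation
            break
        feedback = prototype                            # plan for iteration
    # Non-iterative back end suggested by Fig. 1:
    back_end = ["specify software", "identify hardware/software configuration",
                "test system", "document system", "maintain system"]
    return requirements, prototype, back_end

# Toy demonstration: the requirements "converge" after three passes.
passes = []
reqs, proto, steps = ise_process(
    identify_requirements=lambda fb: len(passes),
    build_prototype=lambda r: (passes.append(r), r)[1],
    satisfies=lambda p, r: len(passes) >= 3,
)
print(len(passes), steps[0])
```

The point of the loop is the one the chapter makes in prose: money is spent on each pass only until a durable system definition emerges, and only then does expensive engineering begin.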
2. The Information Systems Engineering Process
The overview of the generic information systems engineering process in Section 1 is intended to communicate the many dimensions of the design and development process as well as the sense that the whole process is in fact greater than the sum of its parts. Information systems engineering is a multidisciplinary endeavor anchored very much in the systems approach to problem-solving. But because of the nature of many information systems engineering challenges, there are a number of "watchwords" almost always associated with the design and development of complex systems. Some of these include "multidisciplinary," "iterative," and "synthetic." It is difficult, if not impossible, to design, develop, test, evaluate, or field information systems without insight from the behavioral, mathematical, computer, engineering, and managerial sciences. It is impossible, for example, to design and implement effective user interfaces without insight into the empirical findings from human factors and cognitive science. Similarly, it is impossible to select the right analytical method without an appreciation for the methods that cross-cut the above listed sciences and disciplines. It is also impossible to capture complex analytical requirements after but one attempt. The whole process must be iterative; it always relies heavily upon the synthesis of disparate data, knowledge, experience, and technology.

Over the years systems designers have discovered just how difficult it can be to capture user requirements. A variety of tools and techniques have been developed to assist systems analysts, but they have often proven inadequate, especially when requirements are complex and analytical. By and large, systems design and development "life cycle" models fail to recognize the inherent requirements dilemma.
Consequently, systems analysts developed a new design perspective, one that assumes that requirements cannot be captured the first time through and that several iterations may be necessary to define requirements accurately. The new perspective is anchored in the value of prototyping. Prototyping "informs" the design process by leveraging increasingly specific and verifiable information in the requirements analysis process. The objective of prototyping is to demonstrate a system concept before expensive programming begins. Successful prototyping can be traced to iterative requirements analyses, user involvement, and the use of one of several tools for converting requirements hypotheses into a tangible system concept. Prototypers usually build one of two kinds of demonstration systems: "throwaway" and "evolutionary." Throwaway prototypes are developed when requirements are especially difficult to capture, which may be due to inarticulate users, a complex problem area, or some combination of the two. As the label suggests, they are literally thrown away after each iteration, until
one accurately represents requirements. This "final" prototype may then evolve into an evolutionary one, which can be incrementally enhanced over time. The information systems engineering process described here assumes that when requirements are complex and analytical, prototyping will be necessary. The discussion of the larger process that follows is anchored in this assumption. Note also that command and control information systems engineering, discussed in some later sections of this chapter, is nearly always complex and analytical. The assumption thus holds across domains.

2.1 Systems Design in Perspective
Not so many years ago computers were used mostly by scientists and engineers. As the field matured, computing was distributed to a larger set of professionals, including accountants, budgeteers, and managers. The personal computer altered forever the way we think about computing. Initially the appeal of desktop power was mitigated by cost, but as soon as personal computers became affordable, the revolution in personal computing began.

Years ago computers were used to perform calculations that were prohibitively expensive via any other means. Early interactive systems were barely so, and engineers had to hack at them until they behaved. When general-purpose mainframes emerged, large organizations with huge databases expressed the most interest. It is safe to say that most early applications of general-purpose mainframe computers were database oriented. Today there are interactive "decision support systems" that profess to augment the decision-making power of human information processors. There are systems that help users generate options, evaluate options, and interpret the feedback received after they are implemented. There are systems that help users plan, create scenarios, and diagnose diseases.

Figure 2 suggests where database-oriented and analytical computing begin and end (Andriole, 1989a). The differences are clear. Analytical problem-solving assumes some degree of cognitive information processing. While all cognitive processing is anchored in "data" and "knowledge" that must be stored and manipulated, there are unique properties of cognitive information processing that call for unique requirements definitions. The difference between the collection and interpretation of diagnostic data illustrates database-oriented versus analytical problem-solving (and, by implication, database-oriented versus analytical computing).
As computers become cheaper, smaller and faster, and as expectations about how they can be used rise, more and more instances of "analytical computing" will become necessary and, eventually, commonplace. The leverage lies in our ability to identify, define and validate complex requirements. Hence, there is a need for prototyping within the larger structure of multidisciplinary information systems engineering.

[Fig. 2. Data versus analytical computing: task types arranged along an analytical complexity continuum, from communicative tasks (instruct, inform, request, query) and perceptual tasks (search, identify, classify, categorize) through mediational tasks (file, store, retrieve, sample) to analytical tasks (plan, evaluate, prioritize, decide).]
2.2 Conventional Design Methods and Models

There are a variety of "conventional" systems design methods available to the software systems analyst and engineer. Dee (1984), Hice et al. (1978), Andriole (1983), Pressman (1987), Leslie (1986), Royce (1970), and Horowitz (1975), among many, many others, all propose some variation of the conventional software systems design and development process anchored in the "waterfall" method first introduced by Royce (1970) and Boehm (1976). All of them share some characteristics, such as a sequential nature, a single stage for identifying, defining, and validating user requirements, and an orientation that seduces the designer into treating the process as manageable.

What is the problem here? First and foremost is the lack of emphasis upon user requirements. Years ago it was assumed that requirements were easily defined. Since early computing requirements were often database intensive, the assumption was initially valid. But as the need to fulfill analytical requirements grew, conventional life cycle models failed to keep pace. It is possible to conclude that conventional systems design models and methods ignore user requirements, and approaches to their modeling and verification, in favor of emphases that stress the importance of software engineering: program design and structure, coding, testing, and debugging, and the like.
This conclusion is supported by the vagueness with which conventional design methodologists treat the whole concept of user requirements. Requirements cannot be defined by simply asking users what they do or by watching them for a while. Worse yet are requirements methods that rely upon "handbooks," manuals, or other written materials to define and refine user needs. They are worse because they disconnect the systems analyst from the user and presume that requirements can be identified in a vacuum.

In a sense the whole issue of "conventional" versus "prototyping" methods and models is a strawman unworthy of serious dispute. Why? Because, as always, the problems the prospective system is intended to solve should determine life cycle assumptions. Designers that begin a priori with a method will often fail, if only because they may end up matching the wrong life cycle assumptions with the wrong problems. Analytical problem-solving requirements cannot be captured via conventional systems design methods or models. Iteration is always the watchword in such cases. On the other hand, problems with an absence of analytical requirements might well be modeled via conventional models.

2.3 The Prototyping Alternative

Modern systems design and development directs us to look first at the requirements that the system is intended to satisfy. It then suggests that some kind of modeling of the mock system can redefine our requirements which, in turn, will permit us to develop yet another functional description of the system, and so forth until the model accurately represents and satisfies requirements. Then, and only then, should we turn to software design and engineering, and these steps should in turn determine our hardware configuration. Bringing up the rear are "packaging" tasks, such as the preparation of users' manuals, and "technology transfer" tasks, such as the introduction of the system into the target environment.
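The advice running through Sections 2.2 and 2.3, that the problem itself should determine life cycle assumptions, can be restated as a one-line decision rule. This is a paraphrase of the argument, not a procedure the author gives; the predicate names are invented:

```python
# A restatement of the text's advice as a decision rule (invented
# predicate names; not a procedure from the chapter): analytical,
# hard-to-specify requirements call for iterative prototyping, while
# stable, database-oriented requirements suit a conventional life cycle.

def choose_life_cycle(analytical: bool, requirements_stable: bool) -> str:
    if analytical or not requirements_stable:
        return "iterative prototyping"    # plan for several passes
    return "conventional (waterfall)"     # a single requirements stage may do

print(choose_life_cycle(analytical=True, requirements_stable=False))
```

The rule simply encodes the chapter's claim that designers who pick a method a priori risk matching the wrong life cycle assumptions to the wrong problems.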
There are debates over the way the first iteration of the system itself should be developed. Some hold that a thorough requirements analysis will assure the development of a responsive system, while others feel just as certain about the wisdom of some kind of “prototyping” strategy. Applications prototyping (Boar, 1984), the strategy that assumes that several iterations of an interactive system are necessary and desirable, has become very popular over the past few years. Among other advantages, prototyping supports modular software engineering, permits user participation in the design process, and protects project resources from the jaws of tunnel programming. Most importantly, the applications prototyping strategy permits analysts to keep the requirements analysis process alive during the critical conversion process.
Prototyping assumes that the first version of the interactive system will be rejected and modified. It assumes that users and designers will have some difficulty identifying and defining critical system functions, and that a limited amount of money should be spent on each prototype until a durable system definition emerges. The money saved should be plowed back into requirements definition, tasks/methods matching, and modeling. Prototyping is as much a state of mind as it is a structured design methodology. Contrary to popular belief, prototyping is highly structured and extremely methodical in application. While some design theorists suggest that prototyping is "loose" and haphazard, successful prototyping requires adherence to a set of reasonably specific principles. The primary assumption that prototypers make is that interactive systems cannot be developed easily, quickly, or without input from prospective users. They assume that the system will have to be developed over and over again, but unlike conventional system developers, prototypers plan for iteration.

2.4 Requirements Analysis Methods
There is no more important yet more neglected step in systems design than requirements analysis. As Meister (1976) and a whole host of others have pointed out, without a clear set of requirements the system will satisfy the needs of the designers and not the intended users. Boar (1984) reports that 20-40% of all system problems can be traced to problems in the design process, while 60-80% can be traced to inaccurate requirements definitions. The message is clear: know thy user.

The prototyping strategy assumes that requirements cannot all be prespecified, that inherent communications gaps exist among design participants, and that "extensive iteration is necessary, inevitable, desirable, and to be encouraged" (Boar, 1984). The strategy also assumes that requirements analysis does not stop once the tasks the system is supposed to support have been identified and defined. All good requirements definitions consist of user, task, and organizational/doctrinal requirements. In fact, the best possible requirements definition is a matrix linking all three (user, task, and organizational/doctrinal) dimensions together, as suggested in Fig. 3.

FIG. 3. User/task/organizational requirements matrix. [Figure: a three-dimensional matrix of tasks (filing, retrieving, collating, sorting, form filling, document checking, telephoning, dictating, conferring, meeting, data analysis, calculation, planning, decision-making) against user classes (naive, managerial, ...) and organizations.]

Requirements analysis also assumes feasibility. If one were to discover after a significant requirements investment that no one could define requirements, or that the ones that were defined were impossible to satisfy via computerization, or that in order to satisfy the requirements one had to spend ten times what any reasonable person would suggest the system should cost, then the problem can be said to have failed the feasibility test. Feasibility assessment is thus one outcome of requirements analysis; the others include task, user, and organizational/doctrinal profiles, and the integrated tasks/users/organizational-doctrinal matrix.
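The three-dimensional requirements matrix can be sketched as a simple data structure. The following is a minimal, illustrative sketch only: the particular task, user, and organizational values (and the sample requirement note) are invented placeholders, not dimensions prescribed by the text.

```python
# Sketch of a user/task/organizational requirements matrix. Dimension
# values are illustrative assumptions, not the chapter's actual lists.
from itertools import product

users = ["naive", "managerial", "scientific-technical"]
tasks = ["filing", "data analysis", "planning"]
orgs = ["strategic", "tactical"]

# Each cell records a requirement note once that cell has been profiled.
matrix = {cell: None for cell in product(users, tasks, orgs)}

def add_requirement(user, task, org, note):
    """Record a requirement for one user/task/organization cell."""
    matrix[(user, task, org)] = note

add_requirement("managerial", "planning", "strategic",
                "graphic summaries; slow interaction pace")

# A requirements definition is "integrated" only when every cell has
# been considered; here we simply count the cells still to be profiled.
unprofiled = [cell for cell, note in matrix.items() if note is None]
print(len(unprofiled))  # 17 of the 18 cells are still open
```

The point of the structure is the one the text makes: design questions are answered per cell, not per dimension in isolation.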
2.5 Task Requirements Analysis Methods

Task profiling consists of qualitative and, if possible, quantitative descriptions of the tasks that the system is intended to solve, automate, quasi-automate, or ignore. Task profiles are important because the selection of the right analytical method depends upon how well the tasks have been defined. The tasks themselves should be arranged hierarchically all the way down to the lowest, diagnostic sub-task. While this is not to imply that each and every task and sub-task be elaborately defined, it is to suggest that the task requirements process be highly structured.

There are a variety of ways to structure task analyses. It is important to begin with some sense of how tasks differ generally. Over the years the psychological research community has developed a number of "generic" taxonomies that can be used as organizing frameworks for the subsequent development of problem-specific task taxonomies. Fleischman et al. (1984) present perhaps the most comprehensive review of this literature. They cite several approaches to task classification (and the development of task taxonomies) worth noting:

- behavior description approaches;
- behavior requirements approaches;
- ability requirements approaches; and
- task characteristics approaches.
Behavior description approaches include those that identify "categories of tasks... based upon observations and descriptions of what operators actually do while performing a task." Behavior requirements approaches emphasize the "cataloguing of behaviors that should be emitted or which are assumed to be required in order to achieve criterion levels of performance." Ability requirements approaches assume that "tasks are to be described, contrasted, and compared in terms of abilities that a given task requires of the individual performer" or operator, while task characteristics approaches are "predicated upon a definition that treats the task as a set of conditions that elicits performance" (Fleischman et al., 1984). With the exception of task characteristics approaches, most approaches try to identify important processes, functions, behaviors, or performance. The ideal task analysis would permit the systems designer to differentiate among the tasks and rank them according to their problem-solving importance.
COMMAND AND CONTROL INFORMATION SYSTEMS ENGINEERING
13
It is important to note that the use of task taxonomies to profile user tasks occurs before, during, and after the task profiling process. Task taxonomies are used initially as organizing frameworks; they are used during the process as substantive and procedural compasses; and they emerge redefined as problem-specific taxonomies after the process. This last use is key: the purpose of applying an existing generic taxonomy (or developing a whole new one) is to help accelerate the development of the required problem-specific taxonomy. As soon as one begins the task requirements analysis, one should reach for a generic task taxonomy to guide it.

There are a number of methods that have solid track records for developing task profiles. Even so, they all have weaknesses, suggesting that the best approach to task profiling is eclectic, interdisciplinary, and, as always, iterative. Our experience with requirements analysis suggests that a single method never really captures the essence of the tasks we are trying to computerize. Our successful task profiles were the result of having applied at least two task analysis methods.

The task requirements analysis methods discussed in this section fall into three broad categories (Ramsey and Atwood, 1979):

- questionnaire and survey methods;
- interview and field observation methods; and
- simulation and gaming methods.
As suggested, there are at least three ways to identify and define tasks. The first involves asking users what they do, and how they do what they do, in questionnaires and surveys. The second involves asking them in person (in a variety of different settings), while the third suggests that the best way to profile tasks is through a simulation or gaming exercise.

Inherent in all of these methods is the belief that, given enough time and money, tasks can always be identified and defined. Nothing is farther from the truth. There are many tasks that defy precise description; it is also naive to assume that all users are articulate. Hence the need for the iterative prototyping strategy, which assumes that users are often unable to define their tasks and that some tasks are much more resistant to definition than others.

There are at least five ways to profile requirements via questionnaires and surveys (Ramsey and Atwood, 1979):

- importance ratings questionnaires;
- time estimate questionnaires;
- repertory grid techniques;
- Delphi techniques; and
- policy capture techniques.
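As an illustration of how importance-ratings data might be reduced to the task ranking the ideal task analysis seeks, here is a minimal sketch. The task names and ratings are invented; the mean-rating reduction is one common choice, not a method the chapter prescribes.

```python
# Aggregate importance ratings (1-5) from several respondents and rank
# tasks by mean rating -- a crude but typical questionnaire reduction.
from statistics import mean

ratings = {  # task -> one rating per respondent (invented data)
    "filter incoming messages": [5, 4, 5, 4],
    "format routine reports":   [2, 3, 2, 2],
    "compare decision options": [4, 5, 4, 4],
}

ranked = sorted(ratings, key=lambda t: mean(ratings[t]), reverse=True)
for task in ranked:
    print(f"{mean(ratings[task]):.2f}  {task}")
```

Time-estimate questionnaires could feed the same reduction, with mean time spent per task in place of mean importance.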
The key to the successful use of questionnaire and survey methods lies in one's ability to select users with unusually good diagnostic understandings of what they do; users unable to introspect may only feed back perceptions of what they think they do, not accurate descriptions of what they actually do. There are also obvious situations where questionnaires and/or surveys would be inappropriate. If a system is intended to serve a small but elite group of military analysts, then it is unlikely that any real insight could be gained from the results of a questionnaire (which would probably be ignored anyway). On the other hand, if the system is intended for use throughout the military or throughout a particular subset of the military (for example, throughout the strategic intelligence community), and the user population is geographically dispersed, then mailed questionnaires may be the only way to go.

Interview and field observation methods include the following (Ramsey and Atwood, 1979):

- unstructured and structured interviews;
- ad hoc working-group-based methods;
- critical incident techniques; and
- formal job analysis techniques.
If the truth be told, the overwhelming majority of task requirements analyses consist of unstructured interviews and possibly an ad hoc working group session or two. A series of questions is usually posed to one or more interviewees, who tend to perceive requirements as anecdotes of their (usually limited) experiences. While these anecdotes are useful, they too often take the place of a structured requirements database.

The participatory approach to interactive systems design should be stressed again here. A few hours of a user's time is really quite worthless. If the system is to be responsive to real needs, then a users' strike force must be established. As suggested earlier, users should be made members of the design team and given important design responsibilities. It is also important to note that techniques such as formal job analysis are best suited for defining non-cognitive tasks, and that other methods, such as structured interviews, working groups, and protocol analyses, are more likely to yield useful cognitive task definitions.

The application of simulation and gaming methods essentially calls for a scenario, some experts, and some techniques for documenting what happens when the experts are asked to address the scenario. Ramsey and Atwood (1979) and others (Andriole, 1983; Carlisle, 1973) suggest at least three kinds of simulations and games:

- paper simulations;
- protocol analysis; and
- interactive simulation or "gaming."
Nearly all of the above requirements analysis methods can trace their origins to disciplines other than computer science or information systems. In fact, most of them can be traced to psychology and management science. This alone attests to the interdisciplinary nature of requirements analysis, and to the need to involve specialists from the behavioral, computer, and management sciences in the requirements definition and validation process.

There are also aspects of the process that defy precise structuring. It is important to remember that requirements analysis is as much an art as it is a science. There is a "feel" to requirements analysis that comes after an analyst acquires a great deal of experience. Good requirements analysts also tend to learn a great deal about the target applications area; some of them become almost expert at the very tasks they are trying to define.

The use of generic task taxonomies is intended to guide the requirements analysis process, but the tasks in the taxonomies are not intended to replace those identified during the actual collection phase of the process. Figure 4 suggests how the taxonomies can be used to (a) guide the initial process, and then (b) yield specific tasks and sub-tasks in a resource allocation scenario. Alternative requirements analysis methods are presented in the figure as intervening and iterative variables.

2.6 User Profiling Methods
Who will be using the system? Will the system be used by those relatively unsophisticated in the use of interactive systems, or is the user group experienced? These and similar questions are related to user profiling, the second critical dimension of the requirements definition.

Users come in many shapes and sizes, and there are a number of ways to classify them. They can be classified by job function, by their level of experience with interactive computing, by their role in a larger problem-solving process, or by some combination of these and additional criteria. Ramsey and Atwood (1979) mix some criteria to produce at least three classes of users:

- naive;
- managerial; and
- scientific-technical.
FIG. 4. The task requirements analysis process. [Figure: generic task taxonomies (behavior description, behavior requirements, ability requirements, task characteristics) feed requirements analysis methods (questionnaire and survey, interview and observation, simulation and gaming), which yield specific tasks. Resource allocation example: gather project data; prioritize projects by benefit; prioritize by cost; conduct cost-benefit analyses; rank-order project "investments"; vary costs and benefits ("what if..."). The results combine with the user/organizational profiles and lead to requirements modeling/prototyping/storyboarding.]
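The resource-allocation sequence named in Fig. 4 (prioritize by benefit and cost, conduct cost-benefit analyses, rank-order "investments," vary costs and benefits) can be sketched roughly as follows. The projects and all numbers are invented for illustration; benefit/cost ratio is one plausible ranking rule, not one the chapter specifies.

```python
# Rough sketch of the Fig. 4 resource-allocation tasks: rank projects
# by benefit/cost ratio, then "what if..." the numbers and re-rank.
projects = {"A": {"cost": 100, "benefit": 300},
            "B": {"cost": 250, "benefit": 400},
            "C": {"cost": 50,  "benefit": 90}}

def rank_investments(projects):
    """Rank-order projects by benefit/cost ratio, highest first."""
    return sorted(projects,
                  key=lambda p: projects[p]["benefit"] / projects[p]["cost"],
                  reverse=True)

print(rank_investments(projects))  # ratios: A=3.0, C=1.8, B=1.6

# "What if..." -- vary one project's cost and re-rank.
projects["A"]["cost"] = 400        # A's ratio drops to 0.75
print(rank_investments(projects))
```

The second ranking shows why the "vary costs and benefits" step matters: a single revised estimate can reorder the whole portfolio.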
This classification of users according to their experience and job function can tell us a great deal about how an information or decision system should be designed, but it also leaves out some important information. Will the system be used by "frequent" or "infrequent" users? Will it be used by users under "situational" pressure? Will the users be part of a larger organizational hierarchy, as always occurs in the military?

User profiling, like task profiling, should begin with a look at some of the existing user taxonomies. But the profiler should make sure the taxonomies reflect the application, and that they are based upon criteria meaningful to the community the profiler is trying to help. As a general rule of thumb, the following questions should be posed before, during, and after the user profiling process:

- How experienced with interactive computing are the users?
- How experienced with analytical methodology are the users? Are they inclined to think analytically, or are they more "passive" users?
- How frequently will they use the decision support system?
- What cognitive "styles" will they bring to the system?
- To what extent is their behavior determined by their role or rank?
- How high are their expectations about what the system can do for them?
The answers to these questions (and others; see below) will yield a user profile that will inform the systems design and development process; without the answers, the designers will speculate about, or ignore altogether, the kind(s) of problem-solvers that will operate the system. Unfortunately, user requirements analysis methodology is not nearly as well developed as task requirements analysis methodology. Methods for developing user taxonomies based on experience, cognitive "styles," and requirements are thus not altogether different from task requirements methods, though the focus is very different.

There are several ways to gather information about how experienced the users are with interactive computing and analytical methodology. Note that the experience that should be measured includes experience with computing, with analytical methods, and with analytical computing. These distinctions are important because many systems tend to be model-oriented, while much user experience with computing is data-oriented. Users who feel very comfortable with a sophisticated database management system cringe at the thought of interacting with a trivial analytical program. Conversely, users familiar with modeling software often find data retrieval and display programs completely useless.
Conventional requirements data collection methods, like interview and field observation methods, can yield a good deal of insight into the users' experience with analytical computing. But for these methods to be effective, a great deal of front-end work must be done. The following questions can be used to structure an interview or interpret field observation:

- What is the nature of your prior experience with computing? Has it been primarily data or model oriented?
- Are you a frequent (more than ten times a month) or infrequent (less than ten times a month) user? Do you avoid computers whenever possible?
- Do you have any formal training in analytical methodology? If so, in what methods?
- What analytical programs have you used?
- What are your expectations about decision support?
These and similar questions can be used to profile users according to their general computing experience and their experience with analytical computing specifically. Scales can be developed to measure this experience, though they need only be very crude. It is also possible to observe users in an analytical computing scenario where they must interact with a system that makes certain demands on their problem-solving skill. There are two kinds of methods for profiling users’ cognitive styles and capabilities. The first involves applying one or more generic descriptive cognitive taxonomies, while the second assumes that insight can be gained by applying a generic ability requirements taxonomy. These taxonomies can be used to organize field- and scenario-based observation exercises. If there is time, and the circumstances are right, questionnaires can be administered to profile cognitive preferences and problem-solving styles. There are a variety of questionnaires available that purport to measure cognitive capabilities and styles, though few of them have been scientifically validated. Cognitive profiling seeks to identify user perceptual and mediational processes. It is important to define these processes, because they tell us a great deal about how the system should be designed and how it should interact with its users. Cognitive profiles can suggest, for example, that graphic output is inappropriate, that the interaction pace should be slow, and that the analytical method in the system should be highly visible. By and large, cognitive profiling informs the design of the man-machine interface and the system’s behavioral characteristics. Figure 5 suggests how the user profiling process works.
FIG. 5. The user requirements analysis process. [Figure: generic user taxonomies (experience taxonomies, cognitive taxonomies) feed requirements analysis methods (questionnaire and survey, interview and observation, simulation and gaming), which yield specific user profiles. The results combine with the tasks/organizational profiles and lead to requirements modeling/prototyping/storyboarding.]
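The crude experience scales mentioned earlier might be realized as something like the sketch below. The scored questions follow the interview list above, but the weights, the score bins, and the mapping onto Ramsey and Atwood's three user classes are all illustrative assumptions of this sketch, not scales given in the text.

```python
# A deliberately crude user-experience scale: score answers to the
# profiling questions and bin users into rough classes. Weights and
# bin boundaries are invented assumptions.
def experience_score(answers):
    """answers: dict of yes/no and count fields from the questionnaire."""
    score = 0
    score += 2 if answers["model_oriented_experience"] else 0
    score += 2 if answers["uses_per_month"] >= 10 else 0  # "frequent" user
    score += 3 if answers["formal_analytic_training"] else 0
    score += min(len(answers["analytic_programs_used"]), 3)
    return score  # 0-10

def experience_class(score):
    """Map a crude score onto the three Ramsey-Atwood user classes."""
    if score <= 3:
        return "naive"
    elif score <= 6:
        return "managerial"
    return "scientific-technical"

profile = {"model_oriented_experience": True, "uses_per_month": 12,
           "formal_analytic_training": False,
           "analytic_programs_used": ["spreadsheet models"]}
print(experience_class(experience_score(profile)))  # managerial
```

As the text says, such a scale need only be very crude; its value is in forcing the profiling questions to be asked and recorded consistently.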
2.7 Organizational/Doctrinal Profiling Methods

Until recently, very little attention was given to the impact that an analytical problem-solving system might have on an organization or bureaucracy. After countless systems were thrown out, due largely to their incompatibility with established, efficient problem-solving procedures, designers began to take note of the environment in which their systems were expected to perform.

First, unless the mission explicitly calls for it, designers should try to avoid creating the impression that the system will change the way things are now done. The most appropriate support image suggests that the system can help organize and expedite problems that are otherwise tedious and time-consuming. Many early support systems were enthusiastically accepted when they helped reduce information overload, filter and route information, and structure decision option selection problems. But resistance grew when they moved into the prescriptive provinces previously the exclusive preserve of humans. Worse yet are the decision aids and support systems that not only try to change an organization's structure but try to do it with exotic analytical methodologies that require six months of "interactive training" before the system can be used. Systems will fail if they are incompatible with the organizations they are intended to support, regardless of how well designed they are, just as mediocre systems will excel within organizations with which they are perfectly compatible.

Designers must also understand doctrine and the requirements that it generates. If the focus here were on basic systems research, then the issue would not be as important; but since the focus is on applied systems design and development, the issue is unavoidable. The least developed requirements methodology is that available for organizational/doctrinal profiling. As suggested above, the interactive systems design community has only recently recognized the importance of organizational context.
Consequently, there are relatively few methods for profiling the organization and its doctrine that must be served by the system. The two general methods discussed here are critical activity profiling and compatibility testing methods.

It is essential that an organization's "mission" be fully understood before the system is functionally modeled. Here the reference is not to the individual tasks that comprise the mission (which are the focus of task requirements analysis), but rather to the higher-level function the organization is supposed to perform. The relationships that the organization has with other organizations are also critical.

Critical activity profiling methods are primarily observation oriented. They are also fed by voluminous mission descriptions (also known as "policies and procedures" manuals). It is important to identify and define an organization's critical activities, because while a system may well help individuals solve specific low-level problems, the solutions themselves may be incompatible with the organization's mission and "modus operandi."

Compatibility testing methods provide insight into an organization's modus operandi and provide the means for averting major inconsistencies and incompatibilities between the system, the organization, and its "doctrine." What are the organization's policies, procedures, and "protocols"? How are problems solved within the organization? What is the hierarchical structure? Can the flow of information in the organization be modeled? Is it clear where the system will fit within this flow?

The methodology for profiling organizations comes from the organizational development community. In one study (Srinivasan and Kaiser, 1987), an attempt was made to measure relationships between organizational factors and systems development. It was determined that the characteristics of an organization can (positively or negatively) affect systems design and development progress. Another study (Baroudi et al., 1986) suggested that user involvement in the systems design process predicted levels of user satisfaction and usage. These and other studies suggest a likely relationship between organizational profiles and the extent to which systems support or hinder organizational performance.

It is safe to say that there is by no means an abundance of generic (or specific) organizational/doctrinal taxonomies targeted at interactive computer-based systems design and development. There are, however, a number of taxonomies that recognize organizational personalities and pathologies. Unfortunately, this literature is of limited use. The best way to proceed is to develop a set of questions, identify a set of issues, and analyze organizational manuals that shed light upon the organization's mission and modus operandi.
One should then gather some data via direct observation, supplement it with codified doctrine, and then develop a crude organizational profile as it pertains to the system under development. Figure 6 suggests how organizations can be profiled.

FIG. 6. The organization requirements analysis process. [Figure: generic organizational taxonomies (structural, "cultural") feed requirements analysis methods (questionnaire and survey, interview and observation), which yield specific organizational profiles: mission, policies and procedures, "personality," "strategic and tactical" orientation. The results combine with the tasks/user profiles and lead to requirements modeling/prototyping/storyboarding.]

2.8 The Task/User/Organizational-Doctrinal Matrix

A good requirements analysis will enable one to construct a problem-specific three-dimensional matrix, as suggested in Fig. 3; it will also permit the development of a prototype. But why go through all the trouble? The reason is that numerous design issues can only be resolved through the matrix. For example, user type(s) will determine the type of interactive dialogue one should use. Tasks will determine the analytical method selected to drive the system, while organizational-doctrinal considerations will determine the system's interface, input requirements, output, physical size, and security characteristics. It is essential that user, task, and organizational-doctrinal definitions be developed and integrated before the design process proceeds any further.

As Fig. 7 suggests, the requirements matrix leads directly to the prototype. Requirements define the hypothetical system concept embodied in the prototype. The importance of requirements analysis cannot be overstated in the overall design and development process. The whole point of prototyping is to validate requirements via some tangible representation of the system concept. The extent to which requirements are accurately identified initially will determine the number of prototypes necessary to validate requirements.
2.9 Some Prototyping Methods

There are several ways to capture the essence of the system to be built. As soon as the requirements analysis is completed, the prototyping strategy requires the development of some kind of representation or model of how the system will operate. Remember that this prototype will be temporary; it is intended to introduce the system concept to the users. They will no doubt find it flawed, and it will be adjusted (again and again) until they are "satisfied," knowing full well that they might never really be happy with the design (even as members of the design team!). Such is the fate of the systems designer.

A good prototype, or requirements model, serves many purposes simultaneously. As suggested, it fosters discussion about what the system should and should not do. But it also verifies the results of the requirements analysis. As members of the design team, users can inspect the integrated model and recommend changes. Finally, the model permits the design team to display something to its users early on in the design process, something that stimulates the design team, pleases management, and convinces users that the team is dedicated to solving their problems.

There is very little agreement about which prototyping methods work best. Some believe that conventional flowcharting is sufficient, while others demand a "live" demonstration of the system-to-be. There are at least four viable prototyping methods: the development of narratives, the development of flowcharts, methods based upon other information theories and methods, and those that yield "storyboards."

FIG. 7. The requirements/modeling/prototyping process. [Figure: the three-dimensional requirements matrix feeds modeling methods (narrative methods, flowcharting, storyboarding), which lead to prototyping options (evolutionary, hybrid, ...).]

2.9.1 Narrative Methods

Narratives remain powerful communication tools. When well done, they can accelerate the design process. Ideally, a narrative should describe what the system will do, indicate its input requirements, describe and illustrate its output, and suggest a software/hardware configuration. At the same time, it should not be so long or verbose as to discourage study. Its prose should be terse and to the point; it should also be illustrated with simulated screen displays.

Narratives should be used only when the system is relatively uncomplicated, when the tasks to be performed are less than cognitive. They should also be used only when users will find them appropriate. Many military users, for example, would find narratives too tedious for serious study.

2.9.2 Flowcharting Methods

We are all familiar with conventional (logic) flowcharts. In the hands of an experienced systems analyst, logic flowcharts are rich with information, but in the hands of a novice they are meaningless. There are, however, other flowcharts that can serve larger groups. Van Duyn (1982), for example, suggests that there are a variety of flowcharts that can be used to develop prototype system models, including:

- Conceptual flowcharts: pictorial presentations of the flow of information;
- General system flowcharts: top-level visual presentations of the system intended for management inspection;
- Functional flowcharts: visual presentations of the system, subsystem, or program showing the functions of data, with no decision factors or other variables to distract the viewer;
- Logic flowcharts: visual presentations of the flow of data through a subsystem and/or program, the location of decision processes, and the control of logic through switching or complicated decision processes. Logic flowcharts are the conventional ones intended to reduce coding and debugging time;
- Job step flowcharts: visual presentations of a computer processing operation, which often consists of one or more programs that process all input and generate all output; and
- Work flowcharts: visual presentations of the flow of paper or manual work.
2.9.3 Generic Model-Based Methods
Figure 8 presents some “off-the-shelf” modeling techniques that can be used to represent a particular system. So long as the problem area “fits” the model (and vice versa) one or more of the models may work, but one must be very careful to match the right model with the right requirements definition.
APPRDACH
DccisionTheory
Hodels
tlodcls o f Human Information Processing
DESCRl PT ION
COMHENTS
These models concern the decision-making behavlor o f the user. They require the Specification of: ( 1 1 a set o f possible states o f the world, with their estimated probabilities, and ( 2 ) a set of possible declslons, or courses o f action, which might be taken, together with their expected values and cost in the various possible states o f the world. Considering the values and costs, together with the evidence o f particular world states, a decision-theoretlc model can select courses of action.
Decision-theoretic models can be used to suggest 'optimal- decisions or to describe the observed decision-making behavior o f users. in both modes, these models are frequently used in decision aids. If it is reasonable to describe user behavior in terms o f such a model, these models can also be useful to the system designer, as by suggesting Information required by the user.
In general, these models Involve a characterization of: ( 1 ) the task environment,including the Problem and means o f solution available. (2) the problem space employed by the subject to represent the problem and Its evolving solution. and (3)the procedure developed to achieve a SOlUtiOn. The method used to develop such models involves intensive analysis of the problem to be solved and of protocols obtained from problem solvers during solution.
Ideally, such efforts might lead to an integrative model o f human information procexing useable in a variety o f design applicatlons. However, existing models are either too task-specific for thls use o r are insufficiently detailed. Futhermore,relstlonships between task requirements and human performance capabilities and limitations are inadequatetly understood for human Information processing tasks. There are many good models applicable to very specific tasks. I
Computer
System Hodels
These models attempt to describekhe behavior of t h e computer component of an interactive system. but do not attempt to model user performance in detail. Some o f the models do characterize user behavio- in terms o f the statistical properties of user commands f o r a particular application. The models usually attempt to predict such as system response time. CPU and memory loads, and I/O requirements.
1
,
These models tend to be relatively crude.but can be useful In determining whether or not user requirements with respect to response time and other gross system performance measures can be satlsfied by a proposed system. They are of little assistance in determining what the user requirements are.
FIG.8. Some modeling techniques.
APPROACH
DESCRl PT ION
COH Pl ENTS ~~~
Network tlodels
~
~
~
~~
These models treat user and system as equivalent elements in the overall process. The individual tasks performed by both the user and the system are described in terms of expected performance and in terms of logical predecessor-successor relationships. The relationships define a network of tasks which is used as a performance model of the user-computer system. Such models are usually used to predict either the probability of failure or success, or the completion time, of an aggregate set of tasks.

Network models allow performance data about users and computer systems to be integrated in a single model even though the original data came from a variety of sources. However, performance data must be provided for each task, as must rules for combining performance data from each individual task to obtain aggregated performance predictions. This is often difficult because of questionable or lacking empirical data, and because performance interactions among tasks (especially cognitive tasks or tasks performed in parallel) may be very complex. Performance distributions are often assumed without data. In spite of these difficulties, the process of constructing such models is often a valuable source of understanding.
Control-Theory Models

These models are based on control theory, statistical estimation, and decision theory. The user is regarded as an element in a feedback control loop. Such models are usually used to predict overall performance of the user-computer system in continuous control and monitoring tasks.

Control-theoretic models are more quantitative than other performance models. They may address user-computer communication broadly, but they ordinarily do not deal with details of the interface, such as display design. Therefore, their utility as an aid to the interface system designer may be limited. Not much work has yet been done in applying these modeling techniques to situations in which the main user activities are monitoring and decision-making, with infrequent control actions.
FIG. 8 (Cont'd). Some modeling techniques.
28
STEPHEN J. ANDRIOLE
Figure 8, from Ramsey and Atwood (1979), describes and discusses the following models:

- Decision-theory models;
- Human information processing models;
- Computer system models;
- Network models; and
- Control-theory models.
2.9.4 Screen Display and Storyboarding Methods

Perhaps the most useful prototype is one that displays to users precisely what they can expect the system to do, at least hypothetically. Paper copies of screen displays are extremely useful, since they permit users to inspect each part of the interactive sequence. Boar (1984) regards screen displays as acceptable "hybrid prototypes."
FIG. 9. Illustrative storyboard (I). This display suggests how the work space within the menu structure can be filled with accessible data and information. In this case, the data is map-based, suggesting to the user that the primary "object" of analysis and manipulation will be the tactical map . . . the display also suggests that the planner can access any of the "elements" of tactical planning which reside along the left side of the display . . .
COMMAND AND CONTROL INFORMATION SYSTEMS ENGINEERING
29
While useful, paper screen displays pale against the impact of computer-generated (and animated) screen displays. Dubbed "storyboards," computer-generated displays simulate man-computer interaction. With new animation packages, it is now possible to animate the storyboard and thereby mimic sophisticated interactive graphics capabilities. The animated storyboard and its paper equivalent provide users with the best of both worlds. The computer-generated storyboard permits them actually to experience the system, while the paper copy enables them to record their comments and suggestions. Each "run" through the storyboard becomes a documented experiment filled with information for the design team. The paper copies also comprise a permanent record of the iterative modeling process, an invaluable contribution to corporate or military institutional memories. Figures 9-13 suggest what storyboards look and "feel" like. A typical storyboard will have over a hundred such displays intended to communicate to users what the system will do, how it will do it, and how users will be expected to work with it. When strung together, these storyboards will communicate an interactive system concept to users, who are then free to comment upon and criticize the concept, thus triggering the development of
FIG. 10. Illustrative storyboard (II). This display suggests how a user can execute planning tasks; in this case, the user is interested in the Mission and has selected information regarding the Mission's Objectives . . .
FIG. 11. Illustrative storyboard (III). This display displays the Mission's Objectives to the user via integrated text and graphics . . . the Corps area of interest/operations/influence is displayed to the planner; the Objectives are also described in abbreviated text . . . the integration of text and graphics supports important cognitive functions . . .

FIG. 12. Illustrative storyboard (IV). The planner selects Blue COAs . . .
FIG. 13. Illustrative storyboard (V). Blue COA #2 is displayed . . .
the next prototype (or the enhancement of an evolutionary one), all as suggested by Fig. 1.
2.9.5 Software Specification and Engineering
Only after a credible system concept has emerged should the design team turn toward the specification and development of software. There are a variety of tools available for software specification, including the now-popular computer-aided software engineering (CASE) tools, which permit the implementation of a variety of diagramming techniques, such as data flow diagramming and other forms of structured modeling. Some of the more sophisticated CASE tools provide software engineering environments that support actual programming.

2.9.6 Hardware Requirements/Hardware-Software Configuration
Figure 1 suggests that the information systems engineer also needs to consider hardware requirements and the overall hardware/software configuration. In more than a few instances hardware will be pre-selected prior to
the requirements analysis. Unfortunately (for designers), the target system will have to be developed on existing hardware (and often in a pre-specified, high-level language). On those occasions when there is hardware flexibility, the hardware configuration should match user requirements and what the software engineers believe is the best way to implement the system concept.

2.9.7 Testing and Evaluation
There are at least two ways to think about testing and evaluation. On the one hand, software should be tested to make sure it is doing what it is supposed to do. The emphasis here is on algorithmic testing. There are a number of tools and techniques available to the software tester, including quality assurance, fault tolerance and redundancy testing, among others (Fairley, 1985). But on the other hand is the kind of evaluation conceived at a much higher level of abstraction. Here evaluation is focused on measuring the extent to which the system satisfies user requirements. If a system’s algorithms fire beautifully but the system itself fails to support its users, then the enterprise has failed. Methods for system evaluation include multi-attribute utility (MAU) assessment (Adelman, 1990), cost-benefit and hybrid methods. 2.9.8 Documentation and Maintenance
Systems are not complete until they are documented and a feasible maintenance plan is developed and implemented. Documentation should include (at a minimum) a users’ manual, a functional description of the system, and the software specifications. A “manager’s guide” is also useful. Documentation can be embedded in the system and/or paper-based. Good documentation bridges the gap between a system’s description and training. A good maintenance plan is realistic and field-tested before it is installed. It is essential that users not be left on their own and that the design and development team be ready, willing and able to support their system. This has budgetary implications that must be appreciated throughout the design and development life cycle.
3. The Domain of Command and Control

3.1 The Command and Control Process
Command and control (C2) is part of the force effectiveness process, as Fig. 14 suggests. C2 is an element of force effectiveness, as well as a means for
FIG. 14. The force effectiveness process.
the enhancement of human performance. Figure 15 suggests that computer-based decision aids and larger information systems can support a number of activities, including C2. Command and control (C2) is the process by which military and civilian "commanders" exercise authority and direction over their human and material resources to accomplish tactical and strategic objectives (JCS, 1976). C2 is accomplished via the orchestrated implementation of a set of facilities, communications, personnel, equipment, and procedures for monitoring, forecasting, planning, directing, allocating resources, and generating options to achieve general and specific objectives. In industry, managers and corporate leaders identify market objectives and then mobilize resources to achieve them; in the military, commanders plan and execute complicated, phased operations to fulfill their missions. Commanders in industry mobilize factories, aggressive managers, line workers, and their natural and synthesized resources to produce superior products. Commanders in the military mobilize weapons, troops, and sophisticated communications apparatus to defend and acquire territory and associated military and political objectives.
FIG. 15. The range of information and decision systems applications.
Decision-making lies at the heart of C2. While commanders will always need data, information, and knowledge to inform their decisions, the decision-making process itself can be supported by C2 decision support systems. Such systems support the "cognitive" functions of the commander, such as assessing the nature of threats, evaluating his or her organizational capabilities, and identifying operational opportunities. C2 decision and information systems also recognize decision-making constraints, such as limited time and incomplete and ambiguous information. Figure 16 suggests the range of C2 information and decision systems opportunities (Andriole, 1987a-e). There are currently a variety of decision and information systems that support decision-making in the cells in the matrix. There are systems that support decision-making at the National Military Command System level, for the Unified and Specified Commands, the Services and in the field. Note also that Fig. 16 indicates that there are strategic, theater, allied and tactical levels, and that decision-making is presumed to be very different at various points along the war-peace continuum. Figure 17 suggests the range and complexity of the C2 decisions that a Tactical Flag Commander must make. Commanders at all levels and in all branches of the military must solve similar problems and make the same kinds of decisions.

3.2 Command and Control Information and Decision System Requirements

Perhaps the best way to understand where C2 decision aids and information systems can help the most is to identify the special problems that commanders routinely face. Some of these problems include
- Sub-optimal information management
  - information "overload"
  - difficulty finding key information
  - poor information presentation
  - incorrect information
  - ambiguous information
  - incomplete information
- Limited option generation and implementation
  - limited alternative generation
  - sub-optimal option evaluation
  - limited scenario generation capabilities
  - limited real-time simulation capabilities.
FIG. 17. Tactical flag command decision-making.
FIG. 18. C2 information processing and decision-making problems.
These and additional problems are summarized in Fig. 18. Figure 18 also suggests where C2 information and decision systems can yield the greatest payoff. Since approximately 1979, a date that marks the beginning of the widespread proliferation of microcomputers throughout the defense establishment, decision aids and larger information systems have been designed, developed and applied to a variety of C2 decision-making problems. C2 information and decision support systems help commanders discriminate among alternatives, simulate the implementation of options, and evaluate the impact of decisions made by commanders in various situations. They help commanders test assumptions, perform "what-if" analyses, and conduct decision-making "post mortems." C2 information and decision systems support the C2 process in a variety of ways. Figure 19 suggests the range of C2 information and decision systems applications. It also suggests the varied nature of C2 requirements. The C2 information and decision systems engineering process is correspondingly broad and complex, as suggested in Section 4.
FIG. 19. C2 requirements and aiding applications areas.
4. Command and Control Information and Decision Systems Engineering

4.1 C2 Information and Decision Systems Requirements Analysis
The design and development of information and decision systems intended for use by relatively inexperienced computer users to solve analytical problems is fundamentally different from the design and development of systems intended to provide inventory control support to frequent users. Those that design and develop C2 information and decision systems have, accordingly, perfected their requirements analysis techniques. Many of these techniques rely upon the structured interviewing of commanders to determine their decision-making needs. C2 information and decision support systems designers have also endorsed the "rapid prototyping" approach to systems design, since it is so difficult to capture C2 decision-making requirements the first time. C2 systems designers thus build several working prototypes of their systems to help validate requirements before proceeding with full-scale system
development. Finally, the C2 ISE community has devoted a great deal of attention to how systems can be evaluated. C2 systems designers identify and refine decision-making requirements by employing a number of methods. These include survey and questionnaire methods, methods based upon interviews and (direct and indirect) observation, and simulation and gaming-based methods. The key to C2 requirements analysis lies in the identification of the essential decision-making tasks, tasks that when well performed can significantly enhance C2 decision-making performance. The requirements analysis methods are employed to identify not only critical C2 decision-making tasks, but profiles of the users and organization in which the system will reside as well. User profiles, task profiles and profiles of the organization comprise the requirements equation, as suggested in Fig. 20. Figure 20 presents a three-dimensional requirements matrix that illustrates the intersection of tasks, users and organizational characteristics. Each cell in the matrix represents a requirements challenge. The tasks in the matrix are generic; in practice, a C2 requirements analyst would convert those generic tasks into very specific tasks (pertaining to, for example, resource allocation problems, tactical planning, and target value analysis). Perhaps the same requirements analyst would specify users in greater detail than simply inexperienced/experienced/infrequent user/frequent user. Organizational-doctrinal characteristics might also be specified in greater detail. Regardless of the level of detail (and the methods used to achieve it) the requirements matrix suggests that prudent designers of C2 information and decision systems deal with all three dimensions of the requirements challenge.

4.2 C2 System Modeling and Prototyping
Prototyping is sanctioned by the C2 design community because it is so difficult to identify and refine C2 requirements (especially decision-making requirements) the first time through the requirements process. The prototyping premise calls for the design and development of a working model of the information or decision support system under development, the solicitation of reactions to the model from prospective users, and the refinement of the model when requirements can be validated. Prototyping calls for iteration. It also calls for the development of two kinds of prototype systems: "throwaway" and "evolutionary" systems. Throwaway systems are used when requirements are especially difficult to capture; evolutionary ones are used when C2 requirements are less elusive. A great many C2 information and decision systems tackle problems for the very first time. Many C2 functions and tasks have been manual for years; because information technology has evolved so quickly, requirements that
FIG. 20. C2 user/task/organizational-doctrinal requirements matrix.
were once believed too difficult to satisfy are now yielding to creative systems designs. While the results are often impressive, “evolutionary development” is almost always necessary.
4.3 Analytical Methods for C2 Information and Decision Systems Engineering

Those who design, develop, and evaluate C2 systems call upon the social, behavioral, engineering, mathematical, computer and management sciences. C2 information and decision systems design and development is multidisciplinary by nature and necessity. A variety of analytical methods and other tools and techniques are available to the designer of advanced C2 systems. Figure 21 identifies the major decision-aiding technologies. C2 information and decision systems designers have a variety of analytical methods at their disposal. The key lies in the correct matching of analytical methods to problems. There are several primary methods classes worth discussing here. They include decision analytic methods, operations research methods, methods derived from computer science and artificial intelligence, and methods derived from the field of human factors engineering.

4.3.1 Decision Analytic Methods
Some of the methods, tools and techniques used to drive C2 systems include utility/value models, probability models, and mixed value-probability models. Utility/value models come in a variety of forms. Some are based upon conventional cost-benefit models and assumptions. Some are based upon the treatment of value as "regret," that is, the "flip side" of utility, since many C2 commanders perceive costs more vividly than benefits. Others are based upon multi-attribute utility assessment (MAUA) models. MAUA models are powerful tools for assessing the relative value of alternative courses of action. The methodology is generic. It can be used to assess the relative value of courses of action, personnel, or objects or processes of any kind. In the civilian sector, MAUA models are used to assess the value of alternative sites for new factories, alternative business plans, and corporate jets. In the military, they are used to assess alternative tactical plans, the performance of competing weapons systems, and the value of alternative investments in high technology.
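As a concrete illustration, a MAUA comparison of alternative courses of action can be sketched in a few lines. The criteria, weights, and scores below are invented for illustration; they do not come from any fielded system.

```python
# Hypothetical sketch of multi-attribute utility assessment (MAUA):
# each course of action (COA) is scored 0-100 against weighted criteria,
# and the weighted sum is used to rank-order the alternatives.

# Illustrative criteria and weights (weights sum to 1.0) -- assumptions.
weights = {"speed": 0.5, "risk": 0.3, "cost": 0.2}

# Illustrative 0-100 scores for three hypothetical COAs.
scores = {
    "COA-1": {"speed": 80, "risk": 40, "cost": 70},
    "COA-2": {"speed": 60, "risk": 90, "cost": 50},
    "COA-3": {"speed": 90, "risk": 30, "cost": 90},
}

def overall_utility(coa_scores, weights):
    """Weighted additive utility of one alternative."""
    return sum(weights[c] * coa_scores[c] for c in weights)

# Rank-order the alternatives by overall utility, best first.
ranking = sorted(scores,
                 key=lambda coa: overall_utility(scores[coa], weights),
                 reverse=True)
for coa in ranking:
    print(coa, overall_utility(scores[coa], weights))
```

With these invented numbers, the weighting matters: COA-3 wins despite its poor risk score because speed carries half the weight.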
Analyses
- Decision Analysis: utility/value methods; probability models; mixed (value-probability) models
- Operations Research: deterministic models/optimization; stochastic models
- Pattern Recognition: discrimination techniques; classification techniques
- Knowledge-Based Techniques: expert systems; planning and problem solving; pattern-directed inference

Management
- Database management; document retrieval; message processing

Interfaces
- Advanced interface techniques: map/terrain display systems; natural language interfaces; new input techniques (voice recognition)
- Human factors engineering techniques: man/machine interfaces; embedded training

FIG. 21. Major information processing and decision-aiding technologies.
H1: Country A intends to develop a nuclear weapons capability within 5 years.
H2: Country A does not intend to develop a nuclear weapons capability within 5 years.

FIG. 22. Hierarchical inference structure for nuclear weapons production.
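The updating that such a hierarchical inference structure performs can be sketched with Bayes' rule. The priors and likelihoods below are invented for illustration; they are not values from Barclay et al. (1977).

```python
# Hypothetical sketch of Bayesian updating over two hypotheses:
#   H1: Country A intends to develop a nuclear weapons capability
#   H2: Country A does not
# Each piece of evidence revises the belief in proportion to how likely
# that evidence is under each hypothesis. All numbers are illustrative.

prior = {"H1": 0.2, "H2": 0.8}

# P(evidence | hypothesis) for two hypothetical observed indicators.
likelihoods = [
    {"H1": 0.7, "H2": 0.3},  # e.g., enrichment plant expansion observed
    {"H1": 0.6, "H2": 0.4},  # e.g., increased scientific activity observed
]

def update(belief, likelihood):
    """One application of Bayes' rule: posterior is prior * likelihood,
    renormalized so the hypotheses' probabilities sum to one."""
    unnormalized = {h: belief[h] * likelihood[h] for h in belief}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

posterior = prior
for lk in likelihoods:
    posterior = update(posterior, lk)

print(posterior)  # P(H1) rises as supporting evidence accumulates
```

In a fielded model the likelihoods would themselves be assessed through the chain of data, indicators, and activities; the sketch collapses that chain into two direct evidence terms.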
In its computer-based form, the model permits analysts to determine how new evidence affects the likelihood of a given country's intention to develop nuclear weapons. The model works via assessments of the relationships among data, indicators, and activities that chain-react up the model to determine the probability of the hypotheses that sit at the top of the structure. Mixed value-probability models often drive C2 systems. The most common form of the mixed model is the probability tree, which generates values for outcomes given the likelihood of events and the value of their occurrence.

4.3.2 Operations Research Methods
There are a number of tools and techniques that comprise the range of operations research methods (Thierauf, 1978). Several that deserve special
mention include linear programming, dynamic programming, integer programming, queuing theory, aspects of systems analysis, and even the classic quantitative-empirical inferential statistical methods. Linear programming is representative of operations research methods that seek optimization. Linear programming methods can be applied to complicated resource allocation and optimization problems when the following conditions exist (Thierauf, 1978):

- the parameters of the problem constitute a linear function;
- alternative resource mixes are possible;
- the linear functions (and constraints) can be expressed mathematically;
- the mathematical relationships among variables can be mapped; and
- resources are finite (and quantifiable).
Linear programming enables a problem-solver to optimize the allocation of resources according to a specific goal. There are two primary linear programming methods: the graphic method and the powerful and popular simplex method. The graphic method involves the plotting of the linear function and constraints in a multidimensional space and then solving the simultaneous equations of the plotted lines. The simplex method involves the implementation of an iterative mathematical process until the best solution is found. Linear programming methods are flexible because they permit asset, constraint, and goal manipulation. Dynamic programming methods also account for time intervals. These and related optimization methods can be used to solve a variety of C2 problems, including especially route planning, resource allocation, weapons assignment, equipment reliability assessments, production planning, and the numerous “assignment” problems that surround so many C2 decisions.
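The graphic method can be mimicked computationally: enumerate the intersections of the constraint boundary lines, discard infeasible points, and evaluate the objective at the remaining corner points. The two-variable problem below is invented for illustration.

```python
from itertools import combinations

# Hypothetical 2-variable LP, solved in the spirit of the "graphic method":
#   maximize z = 3x + 5y
#   subject to x <= 4, 2y <= 12, 3x + 2y <= 18, x >= 0, y >= 0.
# If the optimum exists, it lies at a feasible intersection ("corner")
# of two constraint boundaries.

constraints = [  # (a, b, c) meaning a*x + b*y <= c
    (1, 0, 4),
    (0, 2, 12),
    (3, 2, 18),
    (-1, 0, 0),  # x >= 0
    (0, -1, 0),  # y >= 0
]

def intersect(l1, l2):
    """Solve the two simultaneous boundary equations; None if parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

def feasible(pt):
    x, y = pt
    return all(a * x + b * y <= c + 1e-9 for a, b, c in constraints)

vertices = [p for l1, l2 in combinations(constraints, 2)
            if (p := intersect(l1, l2)) is not None and feasible(p)]
best = max(vertices, key=lambda p: 3 * p[0] + 5 * p[1])
print(best)  # optimal corner point
```

The simplex method reaches the same corner far more efficiently on large problems by walking from vertex to adjacent vertex instead of enumerating them all.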
4.3.3 Computer Science and Artificial Intelligence Methods Computer science is a discipline with roots in information theory and mathematics that links electronic data processing with data and models for data storage and retrieval. The tools and techniques of computer science make it possible to implement a variety of analytical methods that are more accurately located within one or more of the above categories. Pattern recognition, queuing, networking, inventory modeling, and simulation, while quite frequently considered computer science methods, really belong to the operations research community. Database management methods really belong to the management science community, while document retrieval methods belong to librarv and information science. The key to understanding
the range of methods (from any of the classes) lies not in strict definitions of disciplines or fields of inquiry, but in the development of comprehensive, nonredundant taxonomies of methods. Ideally, such taxonomies will be anchored in the social, behavioral, engineering, computer, mathematical and management sciences. "Conventional" algorithmic methods refer to those used to collect, refine, store, route, process and create data and information for specific problem-solving purposes. In many cases, this amounts to writing algorithms to implement decision analytic, operations research, or management science methods. On other occasions it reduces to the development of tabular and graphic displays, while on still others conventional computer science methods are applied to database housecleaning chores. Artificial intelligence (AI) methods seek to identify, codify and process knowledge. AI systems differ from conventional ones in a number of important ways. First, conventional systems store and manipulate data within some very specific processing boundaries. AI systems store and apply knowledge to a variety of unspecified problems within selected problem domains. AI systems can make inferences, implement rules of thumb, and solve problems in certain areas in much the same way humans solve problems. The representation of knowledge is the mainstay of AI. There are a number of options available to the "knowledge engineer," the AI systems analyst with responsibility for converting problem-solving processes into executable software. The most popular knowledge representation technique is the rule, an "if-then" formalism that permits knowledge engineers to develop inferential strategies via some relatively simple expressions of knowledge. For example, if a tank will not start, it is possible to write a series of rules that represent the steps a diagnostician might take to solve the problem:

if the engine will not start, then check the battery;
if the battery is OK, then check the solenoid;
if the solenoid is OK, then check the fuel tank; and
if the fuel tank is full, then check the starter . . .
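A forward-chaining interpreter for rules like these can be sketched in a few lines; this is a hypothetical illustration, not the shell of any fielded expert system.

```python
# A tiny forward-chaining interpreter for the diagnostic rules above.
# Each rule maps an established fact ("if" part) to a recommended action
# ("then" part). The rule base and facts are illustrative.

rules = [
    ("engine will not start", "check the battery"),
    ("battery is OK",         "check the solenoid"),
    ("solenoid is OK",        "check the fuel tank"),
    ("fuel tank is full",     "check the starter"),
]

def diagnose(facts):
    """Fire every rule whose condition is among the known facts."""
    actions = []
    for condition, action in rules:
        if condition in facts:
            actions.append(action)
    return actions

# The mechanic reports the engine won't start and the battery checks out:
print(diagnose({"engine will not start", "battery is OK"}))
# -> ['check the battery', 'check the solenoid']
```

A production shell would also chain conclusions back into the fact base and handle conflict resolution among competing rules; the sketch shows only the basic condition-action cycle.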
These simple rules might be expanded and re-ordered. Hundreds of rules can be used to perform complicated diagnostic, maintenance, and planning tasks. Some other knowledge representation techniques include frames, inference networks, and object-attribute-value triplets (Andriole, 1986). All knowledge representation techniques strive to represent the processes by which inferences are made from generic knowledge structures. Once knowledge is represented it can be used to drive expert systems, natural language processing systems, robotic systems and vision systems. Expert systems
embody expert knowledge about problem-solving; natural language systems permit free-form interaction with analytical algorithms; robotic systems use knowledge to identify objects and manipulate them through complex environments; and vision systems intelligently process images and "see," recognize, and respond to their micro environments (Andriole and Hopple, 1988).

4.3.4 Human Factors Engineering Methods
In addition to the analytical methods discussed above, there are several classes of methods that the C2 information and decision systems designer must understand. These include all those pertinent to the actual means by which the system is connected to its user. Here the reference is to the user-computer interaction routines, the use of appropriate displays, error handling, response time, and all those issues relevant to how easy the system is to use and how productive it makes its user. All of these issues, tools, techniques and methods fall under the general auspices of human factors engineering (Norman and Draper, 1986).

4.3.5 The C2 Tasks/Methods Matching Process
Analytical methods are best exploited when they "match" a specific C2 requirement. Figure 23 suggests that the selection of an analytical method cannot be made independent of the requirements the system under development is intended to satisfy. The tasks/methods matching step in the C2 systems design and development process is critically important to the application of successful systems (Andriole, 1989b).

4.4 C2 Systems Evaluation
C2 information and decision systems are evaluated somewhat more comprehensively than conventional software systems. The reason why is simple. C2 information and decision systems are inherently user-oriented. Evaluations of their performance must therefore attempt to measure the extent to which the system supports requirements, interfaces well with its users, supports the organizational mission it is intended to serve, contains efficient algorithms, can be maintained, and is (well or badly) documented. In other words, systems evaluation deals with all of the conventional software performance issues as well as those that pertain to how well or badly the system serves its users and organizations. One of the very best approaches to the evaluation of decision aids and support systems belongs to Adelman (1990). Figure 24 presents his
FIG. 23. C2 information systems engineering and analytical methods selection process.
0.0 Overall Utility

1.0 Aid/User Interface
  1.1 Match with personnel
    1.1.1 Training & technical background
    1.1.2 Work style, workload and interest
    1.1.3 Operational needs
  1.2 Aid characteristics
    1.2.1 General
      1.2.1.1 Ease of use
      1.2.1.2 Understanding Aid's processes
      1.2.1.3 Ease of training
      1.2.1.4 Response time
    1.2.2 Specific
      1.2.2.1 User interface
      1.2.2.2 Completeness of data files
      1.2.2.3 Accuracy of expert judgements
      1.2.2.4 Ability to modify judgements
      1.2.2.5 Understanding of Aid's algorithms
      1.2.2.6 Utility of graphs
      1.2.2.7 Utility of print-outs
      1.2.2.8 Understanding of text

2.0 User/Aid Organization
  2.1 Efficiency factors
    2.1.1 Acceptability of time for
      2.1.1.1 Task accomplishment
      2.1.1.2 Data management
      2.1.1.3 Set-up requirements
    2.1.2 Perceived reliability under average battle conditions
      2.1.2.1 Skill availability
      2.1.2.2 Hardware availability
  2.2 Match with organizational factors
    2.2.1 Effect on organizational procedures and structure
    2.2.2 Effect on other people's position in the organization
      2.2.2.1 Political acceptability
      2.2.2.2 Other people's workload
    2.2.3 Effect on information flow
    2.2.4 Side effects
      2.2.4.1 Value in performing other tasks
      2.2.4.2 Value to related organizations
      2.2.4.3 Training value

3.0 Organization/Environment
  3.1 Decision accuracy
  3.2 Match between Aid's technical approach and problem's requirements
  3.3 Decision process quality
    3.3.1 Quality of framework for incorporating judgement
    3.3.2 Range of alternatives
    3.3.3 Range of objectives
    3.3.4 Weighting of consequences of alternatives
    3.3.5 Assessment of consequences of alternatives
    3.3.6 Re-examination of decision-making process
    3.3.7 Use of information
    3.3.8 Consideration of implementation and contingency plans
    3.3.9 Effect on group discussions
    3.3.10 Effect on decisionmaker's confidence

FIG. 24. Adelman's multi-attribute utility evaluation model.
STEPHEN J. ANDRIOLE
multiattribute utility assessment structure for the evaluation of information and decision systems. Note the orientation to users, organizations, and accuracy. In a full-blown evaluation this structure would be used in conjunction with a conventional software quality assurance model to determine how well the system operates and supports users.
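The roll-up such a structure implies, with weighted utilities aggregated up the hierarchy, can be sketched in a few lines. The criteria, weights, and scores below are invented for illustration and are not Adelman's actual values.

```python
# Illustrative multi-attribute utility roll-up. Each criterion is either a
# leaf with an elicited 0-1 score or an interior node with weighted children.
# Names, weights, and scores are hypothetical.

def utility(node):
    """Return the 0-1 utility of a criterion node."""
    if "score" in node:                     # leaf: elicited judgement
        return node["score"]
    total = sum(w for w, _ in node["children"])
    return sum(w * utility(child) for w, child in node["children"]) / total

evaluation = {
    "children": [
        (0.4, {"children": [(0.6, {"score": 0.8}),     # e.g., ease of use
                            (0.4, {"score": 0.5})]}),  # e.g., response time
        (0.6, {"score": 0.7}),                         # e.g., decision accuracy
    ]
}

print(round(utility(evaluation), 3))   # -> 0.692
```

Rank-ordering a set of candidate systems (or plans) then amounts to computing this overall utility for each and sorting.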
5. Case Studies in the Design, Development, and Application of C2 Information and Decision Systems

5.1 The Range of Applications
C2 information and decision systems have been applied to a range of C2 problems. Many of these systems have remained as prototypes, since they were intended only to demonstrate or prove a concept. Many others, however, have enjoyed operational success.

5.1.1 Some Working Prototypes
CONSCREEN is a prototype system designed to assist tactical planners in the identification and evaluation of alternative courses of action (Martin, et al., 1983). The method that it uses is multi-attribute utility assessment. The system calls for the user to evaluate a number of courses of action vis-a-vis a set of criteria designed to measure each course of action’s desirability, criterion by criterion. Figure 25 presents the criteria in the multi-attribute utility structure used by planners to evaluate alternative courses of action. After planners assess the alternative courses of action, the system calculates each plan’s overall utility, or value, to the planner. The system then generates a rank-ordering of plans according to the extent to which they satisfy the criteria.

OBlKB is an expert system that helps tacticians manage the order of battle (Weiss, 1986). OBlKB uses rules to help commanders make real-time battle management decisions. The system uses a graphic (map-based) interface intended to widen the communications bandwidth between the system’s knowledge base and its users. The specific tasks that the prototype performs include tracking enemy movements, identifying enemy location and disposition, and generating estimates of the battlefield situation.

KNOBS (Knowledge-Based System) assists tactical Air Force commanders in mission planning, specifically offensive counter-air mission planning (Tachmindji and Lafferty, 1986). The system assists commanders in allocating their resources, selecting the weapons that should be matched with known targets, prioritizing the air tactics that should be used, and assessing the impact of constraints, such as adversary behavior and
FIG. 25. Course-of-action evaluation criteria: Mission Accomplishment, decomposed into Theater Considerations (Readiness, Objective, Offensive, Economy of Force, Unity of Command, Surprise) and Global Considerations (Domestic, International).
weather. KNOBS is knowledge-based; that is, it uses a knowledge base consisting of frames and rules about mission planning.

5.1.2 Operational Systems
Information and decision systems are used by commanders in a number of environments. They are used to assess the value of alternative targets. They are used for complex route planning. They are used to match weapons to targets. Many of these systems are “embedded” in much larger weapons systems, while others are “stand-alone” adjuncts to larger systems. Several prototypes have led to the issuance of “required operational capability” memoranda which have, in turn, produced support for the development of full-scale decision-making and larger support systems. In the late 1970s, for example, the U.S. European Command (EUCOM) required the capability to perform real-time decision analyses of complicated tactical option selection problems. This requirement led to the development of several systems for EUCOM use. The U.S. Air Force is embedding decision aids in cockpits as well as in the Tactical Air Commands. The Navy is using information systems for the placement of sonar buoys, for battle management planning, and for distributed decision-making. The Worldwide Military Command and Control System (WWMCCS) is perhaps the military’s largest information system that embeds a variety of decision aids. Information and decision systems are used in the intelligence community to estimate adversary intentions, forecast international crises, and assess the likelihood of new weapons programs in selected countries.
5.2 The Group Planning Prototype

The tactical group planning systems engineering process represents how generic ISE principles are often applied to design and develop C2 information and decision systems. This section describes how requirements were converted into a working prototype. It describes progress made by George Mason University’s Department of Information Systems and Systems Engineering (GMU/ISSE) on a project funded by TRW’s Internal Research and Development (IR&D) program. The research was clearly directed: design and develop some prototype decision support systems predicated upon the importance of group problem-solving and the arrival of high-resolution large-screen displays. It is described here because it represents an example of structured information systems engineering. Requirements were identified and modeled, and an interactive prototype was developed.
COMMAND AND CONTROL INFORMATION SYSTEMS ENGINEERING
53
The research areas of interest during the project included the need for interactive systems to support group problem-solving, the need to design and develop systems that could be used in actual operational environments, and how large-screen display technology can be exploited for enhanced human problem-solving. The substantive area of interest was (Army) tactical planning at echelons above Corps.
5.2.1 Information Systems Engineering Backdrop

The project assumed from the outset that a structured systems design and development process would yield useful results. We adopted the classic systems engineering life cycle (DSMC, 1986) and then modified it for information systems engineering purposes. A substantial amount of project resources was thus devoted to requirements analysis and functional modeling (see Section 5.2.2).

5.2.2 Requirements Analyses

We began the research with requirements analyses in both domains. The first analysis identified the tasks and sub-tasks that tactical planners (at echelons above Corps) must perform to develop Concepts of Operations. A “requirements hierarchy” was developed that identified and described each task and sub-task. We then developed a second hierarchy that focused exclusively upon user-computer interaction (UCI) requirements (Andriole, 1987a, 1987b). Both of these hierarchies-with narrative equivalents-appear in Appendix A. We then identified a set of group requirements as an overlay to the substantive and UCI requirements (Andriole, 1988). Some of these include the (system’s) capability to share data and knowledge, share and challenge perspectives, defuse biases, focus discussion, and present data, inferences, options, explanations and recommendations in ways compatible with the requirements, the characteristics of users, and the organizational structures the systems might eventually support. Our requirements analyses also developed profiles of planners and crisis managers, which by and large characterized these users as inexperienced with advanced computing and likely to make intermittent use of interactive group decision support systems (Ehrhart, 1988).
5.2.2.1 Substantive Requirements

The substantive requirements that we identified-as suggested in the hierarchies-were varied, reflecting the variety of tasks and sub-tasks that decision-makers perform as they generate courses of action. Theater planners require information about terrain, adversary capabilities, their own combat capabilities, and other aspects of
tactical planning. The substantive requirements hierarchies identify all such tasks and sub-tasks, arranging them in a way that permitted us to convert them into system (interface) characteristics.

5.2.2.2 UCI Requirements

In addition to the substantive requirements hierarchies, we developed hierarchies that identified the unique user-computer interface (UCI) requirements that the users and the substantive requirements would necessitate. The UCI requirements were decidedly visual, suggesting the need for interactive and animated graphic displays. This finding was not surprising: planners and crisis managers are trained in the visual and graphic via the use of maps and the communication of complex concepts such as risk, opportunity, and constraints (Andriole, 1987e). All sorts of display requirements emerged. Some were anchored in the need for graphic equivalence (Andriole, 1986) of complicated data sets. Some required animation, while others required graphics for direct manipulation of battlefield data and processes. The system concept that emerged reflected these requirements and our response to them.

5.2.2.3 Group Problem-Solving Requirements

We identified a set of group problem-solving requirements as well (Andriole, 1988). These requirements were screened with reference to the substantive and UCI requirements. The intention was to restrict group requirements to the domains, users and unique interface requirements therein; the system concepts that emerged reflected this screening. Group decision support systems design and development has received increased attention over the past few years (Stefik, et al., 1987; Sage, 1988; DeSanctis and Gallupe, 1987; Hart, et al., 1985; Schweiger, et al., 1985; and Gladstein and Reilly, 1985). Much of this literature is abstract and theoretical; much less is applied.
The storyboard prototypes developed during the project were problem-driven, not theoretically inspired, except where theoretical findings mapped clearly onto the application area. (At the same time, it is safe to say that theoretical work in human factors, cognitive science, and UCI technology played a larger role in the design and development of the prototype.)

5.2.3 Storyboard Prototypes
The ISE process suggests that before programming can begin, requirements must be identified and modeled. Storyboarding is a technique that converts requirements into a system concept, a model of what the system will do when it is programmed (Andriole, 1989b). The working model of the system is a
prototype designed to accelerate the requirements validation process. The purpose of the prototype is to enhance communication between systems analysts and users. We designed and developed several interactive storyboard prototypes for the project, converting the substantive and UCI requirements into interactive system concepts that represented how the system might operate when actually programmed.

5.2.3.1 Master Menu Structures

The master menu structures were designed with inexperienced users in mind, with substantive and UCI requirements in mind, and with the intention of eventually deploying the system via a large-screen display system. We thus gave up substantial portions of the displays-for stationary on-screen menu options-since the impact would be less keenly felt when projected on a larger screen. The master menu structure-along with a representative set of screen displays-appears in Appendix B.

5.2.3.2 Prototype Capabilities

The storyboard prototypes were developed on an Apple Macintosh II. Color was utilized to communicate a variety of concepts, options, data, and information. The displays in Appendix B suggest how the storyboard can be operated, its interface capabilities, and what the displays look like to users. The menu structure permits users to execute commands by clicking on just a few icons which represent the elements of tactical planning, counterterrorism crisis management, and available options.
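This icon-based command execution can be pictured as dispatching a clicked verb icon to a function applied to the clicked object icons. The dispatch table below is a hypothetical sketch, not the prototype's actual implementation; the icon names mirror examples used elsewhere in this section.

```python
# Toy sketch of composing commands from icon clicks: a verb icon selects a
# function, object icons supply its arguments. Entirely illustrative.

def show(*objects):
    return "Displaying: " + ", ".join(objects)

def overlay(*objects):
    return "Overlaying: " + ", ".join(objects)

VERBS = {"Show": show, "Overlay": overlay}

def click(verb_icon, *object_icons):
    """Execute the command built by clicking a verb icon, then object icons."""
    return VERBS[verb_icon](*object_icons)

print(click("Show", "Enemy COAs"))                          # Displaying: Enemy COAs
print(click("Overlay", "Area Characteristics", "Weather"))  # Overlaying: Area Characteristics, Weather
```

Because any verb composes with any object, users can mix and match commands rather than follow a fixed menu sequence, which is the flexibility the prototype's interface was after.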
Another way to describe the interface and operation of the prototype is to liken the elements to “objects” and the system commands to “functions.” It is possible to convert narrative commands such as “show us the courses of action that G2 expects Red to consider” into clicks on two icons, “Show” and “Enemy COAs.” It is possible to overlay weather onto the COAs by clicking on “Overlay,” “Area Characteristics,” and “Weather.” It is possible to “mix and match” commands to create great flexibility, and thereby permit users to access information in the system in non-sequential, random ways. This flexibility is important to the overall system concept. Many systems permit users to interact with the system’s contents only in rigid ways; we tried to design a system concept and interface that would permit users relatively unrestricted access to the system’s data and knowledge bases.

5.2.3.3 System Sizing

We undertook a “sizing” effort to determine how difficult it would be to convert the working prototype into a working system. It was determined that programming would indeed be feasible on the Apple Macintosh II (or on a variety of other systems); it would also be possible to
develop the necessary data and knowledge bases for selected domains (Andriole, 1987e; Andriole, et al., 1988).

5.2.4 ISE Results
Modern computing technology permits us to design and develop interfaces and interaction routines that a few years ago would have been prohibitively expensive. The storyboard prototypes demonstrated some new approaches to UCI design, group decision support, and the use of large-screen display technology. The prototypes suggested that large-screen-based decision support systems can be used to help solve analytical problems via option generation and inference-making. Previous large-screen display-based systems have excelled in conventional database management tasks but have failed to provide analytical support to users. We believe it is possible to go far beyond conventional display capabilities. The prototypes also suggested how requirements can be converted into system concepts that exploit advanced display technology. The UCI requirements hierarchies developed during the course of the project identified a variety of simple and complex displays that could be implemented only with powerful graphics capabilities. We learned how to leverage color, graphics, animation, and the like during the course of the project. We also designed an interface that is very flexible, thus relieving users of any significant learning burden. It is possible to use the prototypes immediately; if the actual system existed, it is difficult to imagine the need for a users’ manual longer than a few pages. The prototypes also illustrated the power of direct manipulation interfaces (Potter, 1988). Next-generation personal and group-oriented workstations will employ such interfaces, since they link users directly with system capabilities, permit easy interaction, and provide systems designers with modular flexibility for enhanced evolutionary development. Many of the requirements identified during the requirements analysis phase of the project could not be satisfied with the project’s hardware/software configuration.
Here the reference is to visual requirements such as actual maps, photographs, video information and sound, among other “media.” Next-generation systems will incorporate multimedia technology directly into their designs (see Section 6.9). Planners will, for example, be able to drive down highways, peer over bridges, and assess key terrain via film footage of the area of operations. Groups will be able to move from conventional alphanumeric data and information to multimedia with ease. Information systems engineers will be able to satisfy user requirements with a larger arsenal, providing users with presentation technologies that will widen the communications bandwidth between users and system capabilities.
6. Next Generation Command and Control Information Systems Engineering

6.1 Emerging Issues and Challenges
The design, development and use of information and decision support-within and beyond the domain of command and control-will change dramatically as we approach the 21st century. Our expectations for what these systems should do are rising as rapidly as the requisite technology is evolving. By the year 2000, problem-solvers will use advanced systems to deal with all sorts of simple and complex problems. They will also benefit from systems capable of providing much more than database support and low-level inference-making. Just as important, the distribution of computing power will be expanded beyond specialized professionals; information and decision systems will be available to us all, on and off the job. This section examines the new trends, and attempts to describe how C2 information and decision systems will be designed, developed and used a decade or so from now. It is thus speculative, though anchored in trends definable today. The section addresses the following questions:

- How will definitions of information and decision support evolve over time?
- How will emerging methods, models and technologies affect the design, development and use of next-generation C2 information and decision systems?
- What role will future systems play in the aggregate information systems world?

6.2 The Range of C2 Information and Decision Support
It is safe to say that many information and decision systems support command and control decision-making indirectly. There are systems that manage projects, provide easy access to operational data, and otherwise deal with relatively structured problems. This distinction between “structured” and “unstructured” targets of opportunity is important to understanding the range of today’s systems and the promise of tomorrow’s. Early proponents of information technology hoped that their systems would help decision-makers generate, compare, and implement decision options, but most systems support these activities only indirectly. Real-time option generation and evaluation has evaded designers-except in some rare instances where a special (single) purpose system was developed to address a very well-bounded problem (Andriole, 1989a).
The range of next-generation C2 information and decision support systems applications will grow considerably. Next-generation systems will be capable of addressing operational, tactical and (some) strategic structured and unstructured problems via the application of data, knowledge, and models that exploit their integration. Figure 26 suggests where the action will be. The technology of applied analytical methodology, user-computer interface (UCI) techniques, and displays will grow considerably as the nature and depth of application domains expand. The new applications perspective on C2 information and decision systems support will be extremely broad, reflecting the capabilities of new systems that will be embedded and functional on many levels. Future systems will permit decision-makers and information managers, resource allocators and administrators, and strategic planners and inventory controllers to improve their efficiency. The broad perspective will be enabled by the new technology (see Section 6.3) that will emerge over the next several years and by centralization in the military workplace, a centralization that will not be “ideological” but rather driven by the same technology that will permit the design and development of more powerful systems. In all likelihood, the movement toward technological networking and system integration will translate into new imperatives for the management of information and decision-making. In short, command centers of the future will transform themselves because new technology will permit the change and, significantly, because it will demand it. What will be driving what? Will new C2 requirements suggest new system requirements, or will new systems suggest new requirements? Would next-generation systems look the way we expect them to look if they were conceived in an applications vacuum, or are the interpretation and anticipation of applications driving the “form” that future systems will take?
While definitions of decision support will grow, so too will our understanding of computer-based problem-solving. Decision support, while very broad in concept and application, will nevertheless be subsumed under the general rubric of computer-based problem-solving which, over time, will also experience radical change. Expectations about what computers can do for users will continue to rise. Consistent with the evolution in expectations about the power of next-generation systems, computer-based problem-solving systems of all kinds must satisfy analytical requirements.

6.3 Advanced Information Technologies
There are a variety of tools, methods, techniques, devices and architectures available to the information systems engineer; many more will emerge as we
FIG. 26. Technology opportunities for next generation C2 information systems engineering. (The figure shows 1980s C2 information and decision problems feeding through analytical models and methods, next-generation processing/display technology, and advanced user-computer interface (UCI) technology toward next-generation C2 information and decision problems.)
FIG. 27. A taxonomy of methods and models (Hopple, 1986). (The taxonomy spans qualitative methods-among them probability assessment, anomalous event matrices, brainstorming, case study panels, structured and simulated opinion polling, influence diagramming, hierarchical inference, decision analysis, multiattribute utility, cost-benefit analysis, and change-signals monitoring-and quantitative methods, including time-series techniques such as growth curves, trends and cycles, smoothing methods, descriptive profiling, correlation, and leading indicators; bootstrapping; statistical/operations research techniques such as relevance trees, Markov models, Bayesian methods, quantal choice, sampling, pattern recognition, linear and dynamic programming, and queuing theory; econometric and system dynamics (simulation) models; conventional algorithmic methods, message processing, and scheduling methods; and information technologies such as expert systems and natural language processing.)
move toward the 21st century. The challenge-as always-lies in the extent to which designers can match the right tool or method with the appropriate problem. This section looks at a number of technology options now available to the designer, options that will evolve quite dramatically during the next five to ten years.

6.3.1 Models and Methods
Figure 27 suggests the range of methods and models available to the designer today (Hopple, 1986). The taxonomy-which expands the notion of methodological support introduced in Section 4.3-is by no means complete, though it is representative of the way methods, tools and techniques can be categorized and assessed. Figures 28, 29, and 30, from Sage and Rouse (1986), suggest how several of the major methods classes can be described and assessed. “Assessment,” of course, is the key. Information systems engineers must know precisely what method (or methods) to apply to which requirements. Figure 31, from Andriole (1989a), suggests how methods can be rank-ordered against a set of requirements.
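The kind of rank-ordering Figure 31 presents can be sketched as a scored suitability matrix. The method abbreviations follow the figure's legend, but the scores below are invented purely for illustration.

```python
# Hypothetical suitability scores (0-3) of method classes against two
# requirement categories, in the spirit of Fig. 31's rankings.
# DA = decision analysis, OR = operations research, AI = artificial
# intelligence, MS = management science, CCS = conventional computer science.

scores = {
    "DA":  {"information processing": 1, "decision-making": 3},
    "OR":  {"information processing": 2, "decision-making": 2},
    "AI":  {"information processing": 2, "decision-making": 2},
    "MS":  {"information processing": 1, "decision-making": 2},
    "CCS": {"information processing": 3, "decision-making": 1},
}

def rank(requirement):
    """Rank method classes (best first) for one requirement category."""
    return sorted(scores, key=lambda m: scores[m][requirement], reverse=True)

print(rank("information processing"))   # CCS ranks first here
print(rank("decision-making"))          # DA ranks first here
```

In practice the scores would come from assessments like those of Figs. 28-30, and separate matrices would be kept for different user classes (inexperienced, experienced, infrequent).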
FIG. 28. Methods/models descriptions (Sage and Rouse, 1986). (The figure compares cognitive science, artificial intelligence, operations research and control theory, and decision analysis in terms of objectives, methods, products, strengths, and weaknesses-for example, decision analysis offers numerical, axiomatic decision rules but is biased toward choice rather than judgement in context, while operations research and control theory offer prediction and optimization but avoid context and are assumption-laden.)
FIG. 29. Multi-criteria methods assessment (Sage and Rouse, 1986). (The figure assesses AI/expert systems, human/system interaction, and decision analysis/decision support in terms of objectives and expectations, typical products, and input, process, and output strengths and weaknesses.)
FIG. 30. Multi-criteria methods/models assessment (Sage and Rouse, 1986). (The figure assesses database management, cognitive science/psychology, and OR/control engineering along the same dimensions as Fig. 29.)
FIG. 31. Some models/methods rankings. (The figure rank-orders candidate method classes-DA = decision analysis, OR = operations research, CCS = conventional computer science, AI = artificial intelligence, MS = management science-against information-processing and problem-solving/decision-making tasks, such as searching for and receiving information and identifying objects, actions, and events, for inexperienced (I), experienced (E), and infrequent users.)
Over the past few years the ISE community has seen the preeminence of knowledge-based tools and techniques, though the range of problems to which heuristic solutions apply is much narrower than first assumed. It is now generally recognized that artificial intelligence (AI) can provide knowledge-based support to well-bounded problems where deductive inference is required (Andriole, 1990; Andriole and Hopple, 1988). We now know that AI performs less impressively in situations with characteristics (expressed in software as stimuli) that are unpredictable. Unpredictable stimuli prevent designers from identifying sets of responses, and therefore limit the applicability of “if-then” solutions. We now know, for example, that expert systems can solve low-level diagnostic problems, but we cannot predict Soviet intentions toward Poland in 1995. While there were many who felt from the outset that such problems were beyond the applied potential of AI, there were just as many who were sanguine about the possibility of complex inductive problem-solving.

The latest methodology to attract attention is neural-network-based models of inference-making and problem-solving. As Fig. 32 suggests, neural networks are applicable to problems with characteristics that are quite different from those best suited to AI. Neural networks are-according to Hecht-Nielsen (as quoted in North, 1988)-“computing systems made up of a number of simple, highly interconnected processing elements which process information by their dynamic state response to external inputs.” Neural nets are non-sequential, non-deterministic processing systems with no separate memory arrays. Neural networks, as stated by Hecht-Nielsen, comprise many simple processors that take a weighted sum of all inputs. Neural nets do not execute a series of instructions, but rather respond to sensed inputs.
“Knowledge” is stored in the connections of processing elements and in the importance (or weight) of each input to the processing elements. Neural networks are allegedly non-deterministic, non-algorithmic, adaptive, self-organizing, naturally parallel, and naturally fault-tolerant. They are expected to be powerful additions to the DSS methodology arsenal, especially for data-rich, computationally intensive problems. The “intelligence” in conventional expert systems is pre-programmed from human expertise, while neural networks receive their “intelligence” via training. Expert systems can respond to finite sets of event stimuli (with finite sets of responses), while neural networks are expected to adapt to infinite sets of stimuli (with infinite sets of responses). It is alleged that conventional expert systems can never learn, while neural networks “learn” via processing. Proponents of neural network research and development have identified the kinds of problems to which their technology is best suited: computationally intensive; non-deterministic; nonlinear; abductive; intuitive; real-time; unstructured or imprecise; and non-numeric (DARPA/MIT, 1988).
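Hecht-Nielsen's "weighted sum of all inputs" description can be illustrated with a single processing element. The weights and threshold below are arbitrary stand-in values; in a real network they would be set by training, not by hand.

```python
# A single neural-network processing element: it forms a weighted sum of its
# inputs and responds through a simple threshold activation. The "knowledge"
# lives in the weights, not in stored instructions. Values are illustrative.

def processing_element(inputs, weights, threshold=1.0):
    """Fire (return 1) if the weighted sum of inputs exceeds the threshold."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation > threshold else 0

weights = [0.9, 0.4, 0.2]                        # would normally be learned
print(processing_element([1, 1, 0], weights))    # 0.9 + 0.4 = 1.3 > 1.0 -> 1
print(processing_element([0, 1, 1], weights))    # 0.4 + 0.2 = 0.6 <= 1.0 -> 0
```

A network is many such elements wired together, with each element's output feeding the weighted inputs of others; training adjusts the weights so the collective response improves.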
FIG.32. The applicability of artificial intelligence and neural network-based models and methods.
It remains to be seen if neural networks constitute the problem-solving panacea that many believe they represent. The jury is still out on many aspects of the technology. But like AI, neural nets are likely to make a measured contribution to our inventory of models and methods. What does the future hold? Where will the methodological leverage lie? In spite of the over-selling of AI, the field still holds great promise for the design and development of C2 information and decision systems. Natural language processing systems-systems that permit free-form English interaction-will enhance the efficiency of information and decision systems support and will contribute to the wide distribution of information and decision systems. The Artificial Intelligence Corporation’s INTELLECT natural language processing system, for example, permits users to interact freely with a variety of database management systems. The BROKER system, developed by Cognitive Systems Inc., permits much the same kind of interaction with the Dow Jones databases. These systems are suggestive of how natural language interfaces will evolve over time and of how
users will be able to communicate with databases and knowledge bases in ways that are compatible with the way they address human and paper data, information and knowledge bases. When users are able to query their systems in much the same way they converse with human colleagues, the way problem-solving systems are used will be changed forever. Of particular interest is the disproportionate attention that natural language interfaces have received vis-a-vis expert systems. This imbalance will be redressed by the year 2000. Expert systems will also render many decision-making processes routine. Rules of tactical planning, resource allocation, and target-weapons matching will be embedded in expert information and decision systems. Problems that now have to be re-solved whenever a slight variation appears will be solved autonomously. Smart database managers will develop necessary databases long before decision support problems are identified. Next-generation systems will be capable of adapting from their interaction with specific users. They will be able to anticipate the problem-solving “style” and process most preferred by the user. They will be adaptive in real-time and capable of responding to changes in the environment, such as a shortage of time. The kinds of problems that will benefit the most from AI will be well-bounded, deductive inference problems about which a great deal of accessible and articulate problem-solving expertise exists. The community will abandon its goal of endowing computer programs with true inductive or abductive capabilities in the 1990s, and the dollars saved will be plowed back into so-called “low-level” AI. Future information systems engineers will also benefit from a growing understanding of how humans make inferences and decisions. The cognitive sciences are amassing evidence about perception, biasing, option generation, and a variety of additional phenomena directly related to modeling and problem-solving.
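The embedded if-then rules envisioned here can be pictured with a minimal forward-chaining sketch. The rules and facts below are hypothetical examples invented for illustration, not actual doctrine or any fielded system's rule base.

```python
# Minimal forward-chaining rule engine of the kind an embedded expert system
# might use for target-weapons matching. Each rule fires when all of its
# condition facts are present, adding its conclusion to the fact base.

rules = [
    ({"target: armor", "range: short"}, "recommend: anti-tank"),
    ({"target: aircraft"}, "recommend: surface-to-air"),
    ({"recommend: anti-tank", "stock: low"}, "action: request resupply"),
]

def infer(facts):
    """Apply the rules repeatedly until no new conclusions emerge."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

initial = {"target: armor", "range: short", "stock: low"}
print(sorted(infer(initial) - initial))
```

When a slightly different situation arrives, only the facts change; the same rules re-derive the recommendation, which is the sense in which such problems are "solved autonomously" rather than re-solved by hand.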
The world of technology will be informed by new findings; resultant systems will be "cognitively compatible" with their users. Next-generation systems will also respond to the situational and psychophysiological environment. They will alter their behavior if their user is making a lot of mistakes, taking too long to respond to queries, and the like. They will slow down or accelerate the pace, depending on this input and behavior. The field of cognitive engineering, which will inform situational and psychophysiological system design strategies, will become increasingly credible as we approach the 21st century. The traditional engineering developmental paradigm will give way to a broader perspective that will define the decision-making process more from the vantage point of requirements and users than computer chips and algorithms. Principles of cognitive engineering will also inform the design of human-computer interfaces (see Section 6.3.2).
STEPHEN J. ANDRIOLE
Some future software will be generic and some will be problem-specific. Vendors will design and market generic accounting, inventory control, and option selection software. These models will be converted into templates that can be inserted directly into general-purpose systems. The template market will grow dramatically over the next five to ten years.

It is extremely important to note the appearance of system development tools. Already there are packages that permit the development of rule-based expert systems. There are now fourth-generation tools that are surprisingly powerful and affordable. These so-called "end-user" systems will permit on-site design and development of systems that may only be used for a while by a few people. As the cost of developing such systems falls, more and more throwaway C2 systems will be developed. This will change the way we now view the role of decision support in any organization, not unlike the way the notion of rapid application prototyping has changed the way application programs should be developed.

Hybrid models and methods drawn from many disciplines and fields will emerge as preferable to single-model-based solutions, largely because developers will finally accept diverse requirements specifications. Methods and tools drawn from the social, behavioral, mathematical, managerial, engineering, and computer sciences will be combined into solutions driven by requirements and not by methodological preferences or biases. This prediction is based in large part upon the maturation of the larger design process, which today is far too vulnerable to methodological fads. Hybrid modeling for information and decision systems design and development also presumes the rise of multidisciplinary education and training, which is only now beginning to receive serious attention in academia and industry.

6.3.2 User-Computer Interface (UCI) Technology
Twenty years ago no one paid much attention to user interface technology. This is understandable given the history of computing, but no longer excusable. Since the revolution in microcomputing (and the emerging one in workstation-based computing), software designers have had to devote more attention to the process by which data, information and knowledge are exchanged between the system and its operator. There are now millions of users who have absolutely no sense of how a computer actually works, but rely upon its capabilities for their professional survival. A community of "third-party" software vendors is sensitive to both the size of this market and its relatively new need for unambiguous, self-paced, flexible computing. It is safe to trace the evolution of well-designed human-computer interfaces to some early work in places such as the University of Illinois, the Massachusetts
Institute of Technology (in what was then the Architecture Machine Group, now the Media Lab), Xerox's Palo Alto Research Center (Xerox/PARC), and, of course, Apple Computer, Inc. The "desk-top" metaphor, icon-based navigational aids, direct manipulation interfaces, and user guided/controlled interactive graphics, among other innovations, can all be traced to these and other organizations.

Where did all these ideas come from? The field of cognitive science, and now "cognitive engineering," is now, justifiably, taking credit for the progress in UCI technology, since its proponents were the (only) ones asking why the user-computer interaction process could not be modeled after some validated cognitive information processing processes. UCI models were built and tested, and concepts like "spatial database management" (from MIT's Architecture Machine Group (Bolt, 1984)), hierarchical data storage, and hypertext were developed. It is no accident that much UCI progress can be traced to findings in behavioral psychology and cognitive science; it is indeed amazing that the cross-fertilization took so long.

UCI progress has had a profound impact upon the design, development and use of C2 information and decision systems. Because many of the newer tools and techniques are now affordable (because computing costs have dramatically declined generally), it is now possible to satisfy complex UCI requirements even on personal-computer-based systems. Early data-oriented systems displayed rows and rows (and columns and columns) of numbers to users; modern systems now project graphic relationships among data in high-resolution color. Information systems engineers are now capable of satisfying many more substantive and interface requirements because of what we have learned about cognitive information processing and the affordability of modern computing technology.
The most recent progress in UCI technology is multimedia, or the ability to store, display, manipulate and integrate sound, graphics, video and good-old-fashioned alphanumeric data (Ragland, 1989; Ambron and Hooper, 1987; Aiken, 1989). It is now possible to display photographic, textual, numerical, and video data on the same screen, as Fig. 33, from Aiken (1989), suggests. It is possible to permit users to select (and de-select) different displays of the same data. It is possible to animate and simulate in real-time, and cost-effectively. Many of these capabilities were just too expensive a decade ago and much too computationally intensive for the hardware architectures of the 1970s and early 1980s. Progress has been made in the design and execution of applications software and in the use of storage devices (such as videodisks and compact disks (CDs)). Apple Computer's Hypercard software actually provides drivers for CD players through a common UCI (the now famous "stack"). Designers can exploit this progress to fabricate systems that are consistent with the way their users think about problems. There is no question
FIG. 33. Multimedia technology (Aiken, 1989).
that multimedia technology will affect the way future systems are designed and used. The gap between the way humans "see" and structure problems will be narrowed considerably via the application of multimedia technology.

Direct manipulation interfaces (DMIs) such as trackballs, mice and touch screens have also matured in recent years and show every likelihood of playing important roles in next-generation information and decision systems UCI design and development. While there is some growing evidence that use of the mouse can actually degrade human performance in certain situations, there are countless other situations where the payoff is empirically clear (Ramsey and Atwood, 1979; Ledgard et al., 1981; Bice and Lewis, 1989). Touch screens are growing in popularity when keyboard entry is inappropriate and for rapid template-based problem-solving (Smith and Mosier, 1984).

The use of graphical displays of all kinds will dominate future UCI applications. Growing evidence in visual cognition research (Pinker, 1985) suggests how powerful the visual mind is. It is interesting that many problem-solvers, professionals who might otherwise use information or decision systems, are trained graphically, not alphanumerically. Military planners receive map-based training; corporate strategists use graphical trend data to extrapolate and devise graphic scenarios; and a variety of educators have taken to using case studies laden with pictures, icons, and graphics of all kinds. Complicated concepts are often easily communicated graphically, and it is possible to convert complex problems from alphanumeric to graphic form. There is no question that future C2 systems will exploit hypermedia, multimedia, and interactive graphics of all kinds.

Speech input and output should also emerge over the next five to ten years as a viable UCI technology.
While predictions about the arrival of "voice activated text processors" have been optimistic to date, progress toward even continuous speech input and output should be steady. Once the technology is perfected there are a number of special-purpose applications that will benefit greatly from keyboard- and mouse-less interaction.

The use of advanced UCI technology will foster a wider distribution of information technology. Early information and decision systems were used most productively by those familiar with the method or model driving the system as well as interactive computing itself. In other words, in order to exploit information technology one had to have considerable computing expertise. Advanced UCI technology reduces the level of necessary computing expertise. Evidence suggests that training costs on the Apple Macintosh, for example, are lower because of the common user interface. Pull-down and pop-up menus, windows, icons, and direct manipulation via a mouse or trackball are all standard interface equipment regardless of the application program (and vendor). If you know how to use one Macintosh program, chances are you can use them all to some extent. Such interface uniformity is unheard of in other
than Macintosh-based software systems, yet illustrates the enormous leverage that lies with the creative application of advanced UCI technology.

UCI technology will also permit the use of more methods and models, especially those driven by complex, yet often inexplicable, analytical procedures. For example, the concept of optimization as manifest in a simplex program is difficult to communicate to the typical user. Advanced UCI technology can be used to illustrate the optimization calculus graphically and permit users to understand the relationships among variables in an optimization equation. Similarly, probabilistic forecasting methods and models anchored in Bayes' Theorem of conditional probabilities, while computationally quite simple, are conceptually convoluted to the average user. Log odds and other graphic charts can be used to illustrate how new evidence affects prior probabilities. In fact, a creative cognitive engineer might use any number of impact metaphors (such as thermometers and graphical weights) to present the impact of new evidence on the likelihood of events.

Finally, advanced UCI technology will also permit the range of information and decision support to expand. Any time the communications bandwidth between system and user is increased, the range of applied opportunities grows. UCI technology permits designers to attempt more complex system designs due to the natural transparency of complexity that good UCI design fosters. Some argue that the interface may actually become "the system" for many users. The innards of the system, like the innards of the internal combustion engine, will become irrelevant to the operator. The UCI will orchestrate the process, organize system contents and capabilities, and otherwise shield users from unfriendly interaction with complex data, knowledge, and algorithmic structures.
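The log-odds presentation of Bayes' Theorem mentioned above is easy to make concrete. In odds form, each piece of evidence simply adds the logarithm of its likelihood ratio to the running total, which is why a log-odds chart can show evidence impact as additive movement along a scale. The numbers below are a hypothetical illustration, not drawn from any system described in this chapter.

```python
import math

def update_log_odds(prior_prob, likelihood_ratios):
    """Bayes' rule in log-odds form: start from the prior odds, add the log
    of each likelihood ratio P(E|H)/P(E|not H), and convert back to a
    probability."""
    log_odds = math.log(prior_prob / (1 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    odds = math.exp(log_odds)
    return odds / (1 + odds)

# Hypothetical example: prior P(attack) = 0.2, then two indicators arrive,
# each three times as likely under "attack" as under "no attack".
posterior = update_log_odds(0.2, [3.0, 3.0])
print(round(posterior, 3))  # prior odds 0.25 * 9 = 2.25, so 2.25/3.25 ~ 0.692
```

The additive structure is the point: a display designer can render each likelihood ratio as a fixed-length push on a thermometer-style scale, exactly the kind of impact metaphor the text describes.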
6.3.3 Hardware

The hardware that supports the application of information technology and the information systems engineering process today is "conventional." There are turnkey systems as well as generic hardware configurations that support the use of numerous information and decision systems. CPUs, disk drives, keyboards, light pens, touch screens, and the like can be found in a variety of DSSs. There are also microcomputer systems, as well as systems that require larger (minicomputer) hardware configurations.

Next-generation C2 information and decision systems will be smaller and cheaper, and therefore more widely distributed. They will be networked, and capable of up-loading and down-loading to larger and smaller systems. Input devices will vary from application to application as well as the preferences of
the user. As suggested above, voice input will dramatically change the way a small set of systems are used in the future; voice activated text processing will expand system capabilities by linkages to word processing and report preparation in a "natural," unobtrusive way, though it is likely that robust voice activated systems will not appear until the late 1990s.

Many systems will have embedded communications links to databases and knowledge bases, other systems on networks, and the outside world via conventional communication systems. IBM's acquisition of Rolm suggests that the merger between computing and voice systems is well underway. Future systems will have (some selected) voice input capabilities, conventional headset communications, deep database linkages, and a "place" on a much larger information and decision support system network.

Briefcase- and smaller-sized computers will become widespread. The embedding of spreadsheets in popular portable microcomputers suggests that information and decision support chips will be developed and embedded in future hardware configurations. In fact, not unlike some of the more powerful calculators of the 1970s, future systems will permit users to mix and match chips within a single processor.

Future C2 information and decision systems will also be integrated with video display systems of several genres. There will be video-disk-based systems as well as packaged systems that integrate powerful computer-generated imagery capabilities. The cost of both video options is falling rapidly, and the military consumer of the future will be able to select the one that best serves his or her needs. It is safe to say that video will become integral to future information and decision support.
Behavioral scientists have just about convinced system architects, via the amassing of tons of evidence, that information, concepts, and many ideas can be communicated much more effectively via graphic, symbolic, and iconic displays (Smith and Mosier, 1984; Shneiderman, 1987). Systems that do not have these and related capabilities will fail. The revolution in high-resolution display technology will exert a profound impact upon next-generation systems design and use. Many UCI technologies will exploit high-resolution displays, thereby accelerating the movement toward graphic computing.

Processor technology is also evolving rapidly. Just a decade ago, most of us computed on Intel 8088 microprocessors, while today everyone is waiting for the 486. Processors such as the Motorola 68030 (and next-generation 68040) have placed enormous power not only in the hands of users, but, perhaps more important, system designers as well. It is safe to say that applications software is today lagging the capabilities of such chips; at the same time, even
assuming a consistent lag, systems in the 1990s and beyond will benefit from applications software that exploits the revolution in microprocessor design.

The issue of power, however, does beg the question of larger requirements. In other words, it is safe to assume that raw computing power will be ready for next-generation system concepts. The challenge, as always, will lie in the application of the power to validated user requirements. If the truth be told, there are many successful systems that today require less than 20% of available computational power; many future systems may well find themselves with abundant power, and nowhere to go! Regardless of available computing power, information and decision systems engineers will have to adhere to sound information systems engineering principles well into the 1990s and through the foreseeable future (Andriole, 1990).

We are witnessing the demise of the distinction among mainframe, mini- and microcomputers. Tomorrow there will be "workstations." Some will be more powerful than others, but nearly all will be available to individuals at reasonable prices. The balance between capability and price will continue to perplex vendors, since users will demand more and more capabilities for less and less money. Pricing strategies will determine how much power becomes "affordable." Future systems design and use will work within this changing marketplace and, because of some new usage strategies (see Section 3.4), will remain largely unaffected by the instability of the workstation marketplace.

6.4 Integrated C2 Information and Decision Support
Information and decision systems will be used very differently in the future than they are today. They may well function as clearinghouses for professional problems. They may prioritize problems for military commanders, and they may automatically go ahead and solve some of them. They will become problem-solving partners, helping us in much the same way colleagues now do. The notions of systems as software or hardware, and users as operators, will give way to a cooperative sense of function which will direct the design, development, and application of the best C2 information and decision systems.

They will also be deployed at all levels in the military organization. The distribution of DSSs will permit decision support networking, the sharing of decision support data, and the propagation of decision support problem-solving experience (through the development of a computer-based institutional memory of useful "cases" that might be called upon to help structure especially recalcitrant problems). Efficient organizations will actually develop an inventory of problem/solution combinations that will be plugged into their larger computer-based problem-solving systems architectures.
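An inventory of problem/solution combinations of the kind just described might be sketched, in greatly simplified form, as a small case library matched on shared problem features. The cases, feature labels, and matching rule below are all hypothetical illustrations, not taken from any fielded system.

```python
# A toy "institutional memory" of problem/solution cases. Each case pairs a
# set of problem features with a stored solution; retrieval picks the case
# whose problem description overlaps the new problem most.

CASES = [
    ({"terrain:mountain", "mission:defense"}, "anchor flanks on high ground"),
    ({"terrain:urban", "mission:offense"}, "isolate before assaulting"),
    ({"terrain:mountain", "mission:offense"}, "seize passes early"),
]

def best_case(problem_features):
    """Return the solution of the case sharing the most features with the
    new problem (a crude nearest-neighbor match on set intersection)."""
    return max(CASES, key=lambda case: len(case[0] & problem_features))[1]

print(best_case({"terrain:mountain", "mission:defense", "season:winter"}))
# -> anchor flanks on high ground
```

A real institutional memory would need richer similarity measures and case adaptation, but even this crude overlap count conveys how stored experience can be "plugged into" a larger problem-solving architecture.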
Next-generation systems will also communicate with systems in other organizations in other parts of the world. Falling costs of satellite communications will permit global linkages and contact with databases, expert systems, inventories, and the like, thereby multiplying the capabilities of in-house systems by orders of magnitude. This global networking is not decades away, but only five to ten years away. The military's Worldwide Military Command and Control System (WWMCCS) and its WWMCCS Information System (WIS) represent the most ambitious attempts to coordinate and network information and decision systems. Unfortunately, WWMCCS and WIS do not support users in ways that approach technological capabilities or operator expectations. Next-generation information and decision systems engineering will solve many of the most serious problems with networks like WWMCCS and WIS. Advanced technology will permit linkages and coordination that were not possible ten years ago.

The most important change will occur in the way next-generation information and decision systems interface with other information systems. Most contemporary systems are "disembodied," that is, distinct from larger corporate, government or military information systems. Actual use of many systems involves leaving one system to activate another. It is common in the military for users to work alternately with mini- and microcomputers, manually feeding the output from one system into the other. A good deal of this can be explained by acquisition and procurement craziness, but just as much can be traced to obsolete concepts of how computer-based problem-solving systems should be used. As the range of target problems and capabilities increases, fewer and fewer systems will be disembodied; on the contrary, the most successful systems will be embedded in larger organizational and executive information systems. Future C2 information and decision systems will provide "portals" for users to explore.
It will be possible to perform all sorts of tasks via myriad application programs (that ideally will have common user-computer interfaces). The whole concept of "decision support" will evolve to accommodate changes in the larger corporate, governmental, and military information systems structure. Networking and advanced communications technology will permit linkages to databases and knowledge bases, and the routines to exercise them. Not only will distinctions among mainframe, mini- and microcomputing fade, but distinctions among management information, executive information, and decision support systems will also cloud.

Ironically, the concept of centralization may reappear, not with reference to central computing facilities but with regard to enormous systems conceived functionally as hierarchies of capabilities. Users may well find themselves within huge computing spaces capable of supporting all kinds of problem-solving. Advanced communications technology will make all this possible;
users will be able to travel within what will feel like the world's largest mainframe, which conceptually is precisely what a global network of data, knowledge, and algorithms is. The same users will be able to disengage the network and go off-line to solve specific problems. This freedom will expand the realm of analytical computing in much the same way microcomputing expanded the general user community.

Finally, all this technology will permit designers to fulfill user requirements in some new and creative ways. Until quite recently, technology was incapable of satisfying a variety of user requirements simply because it was too immature or too expensive. We have crossed the capability/cost threshold; now designers can dig into a growing toolbag for just the right methods, models, and interfaces. By the year 2000, this toolbag will have grown considerably. Talented C2 information systems engineers should be able to match the right tools with the right requirements to produce systems that are user-oriented and cost-effective.

The future of C2 information and decision systems design, development and use is bright. While some major changes in technology and application concepts are in the wind, next-generation systems will provide enormous analytical support to their users. We can expect the range of decision support to grow in concert with advances in information technology.
7. Summary and Conclusions
This chapter has covered a lot of ground. Its goal has been the description and analysis of the generic information systems engineering (ISE) process, the domain of military command and control (C2), and the application of the principles of multidisciplinary information systems engineering to C2 information and decision systems engineering.

Several key arguments have been made. One suggests that the range of tractable problems is growing as our information technology (and design strategies) grow. We are now in a position to satisfy more user and system requirements than we were able to approach just five years ago. New opportunities for the application of advanced information technology are rising dramatically. Next-generation C2 information and decision systems will look and feel very different to users; they will be far more powerful, much easier to use, and able to communicate with problem-solving cousins distributed across large, secure and reconstitutable networks.

The generic ISE process will also grow over time. Its multidisciplinary flavor will expand to embrace more and more disciplines and fields of inquiry. The need for cross-fertilization will become self-evident as our understanding of
substantive and user interface requirements deepens. It is likely that the need for multidisciplinary ISE will be addressed by the industrial, military, larger governmental, and academic communities via the development of thoroughly integrated research and development, and educational and training programs. ISE represents a relatively new way to think about systems design and development; C2 represents an expanding applications domain; the marriage between ISE and C2 is likely to yield some creative system solutions to "old" and "new" requirements.

But perhaps most important, the central theme of this chapter, and its essential argument, is that without structure the design and development process will almost always fail. The ISE state of mind calls for the consistent application of a set of tools and techniques that together constitute a structured design methodology.

The chapter also recognizes the importance of perennial information technology assessment. System solutions are not found only in structured design methodology; there is considerable leverage in the application of advanced and emerging technologies. ISE is structured, yet flexible enough to exploit new technological opportunities.

Finally, there is an educational and training challenge assumed by ISE, a challenge that calls for multidisciplinary education and training. If we are unable to produce competent information and decision systems engineers, then our design philosophy and methodology will barely affect the systems landscape.
Appendix A: Group (Army Theater Level) Tactical Planning Substantive and User-Computer Interface Tasks and Requirements

This appendix contains the lists of substantive and user-computer interface (UCI) requirements that were distilled from interview and simulation data. The substantive requirements list the functions and tasks that planners must perform to generate optimal courses of action, while the UCI requirements reflect the kinds of displays and interaction routines necessary to support relatively inexperienced computer users. The lists are in two forms. They are organized as graphical hierarchies and in narrative form. The narratives provide some detail about precisely what the (substantive or UCI) requirement actually calls for. The requirements were, in turn, converted into "storyboards" of how the group decision support system might actually operate. These storyboards were organized into a working prototype; several storyboards from the prototype appear in Appendix B.
Substantive Planning Requirements

R  Planning Requirements
   1  Mission Statement
      2  Military Objectives
         3  Specific Objectives
         3  Objective Rank-Ordering
   1  Area Characteristics
      2  Geographic
         3  Topographic
         3  Hydrographic
         3  Climatic/Weather
      2  Transportation
      2  Telecommunications
   1  Combat Capabilities
      2  Red Capabilities
         3  Strength/Reinforcements
         3  Composition
         3  Location/Disposition
         3  Time/Space Factors
         3  "Efficiency"
      2  Blue Capabilities
         3  Strength/Reinforcements
         3  Composition
         3  Location/Disposition
         3  Time/Space Factors
         3  "Efficiency"
      2  Relative Assessments
         3  Strengths
            4  Red Strengths
            4  Blue Strengths
         3  Vulnerabilities
            4  Red Vulnerabilities
            4  Blue Vulnerabilities
   1  Operational Concepts
      2  COAs
         3  Objectives
         3  Area Assumptions
         3  Strawman COAs
            4  Suitability
            4  Acceptability
            4  Success Probability
      2  Pertinent Red Capabilities
         3  Red Military Objectives
         3  Red COAs
         3  Red Vulnerabilities
   1  Operations Concept
      2  Red Capabilities
         3  Operational Capabilities
         3  Distilled Red Capabilities
      2  Blue COAs
         3  Advantages/Disadvantages
         3  "Sensitivity" Analysis
         3  COA Vulnerabilities
      2  COA Selection
         3  Alternative COAs
         3  Relative COA Comparisons
         3  COA Rank-Ordering
      2  COA-to-Concept of Operations
         3  Force Allocation & Timing
         3  Supporting Operations
            4  Logistics Operations
            4  Other Operations
         3  Command Relations
         3  Deployment Summary
         3  Employment Summary
Substantive Planning Requirements Description

R> Planning Requirements: List of functional planning requirements
1> Mission Statement: Requirement to understand mission
2> Military Objectives: Requirement to understand military objectives
3> Specific Objectives: Requirement to understand specific objectives
3> Objective Rank-Ordering: Requirement to understand rank-ordering of objectives
1> Area Characteristics: Requirement to understand area
2> Geographic: Need to understand geographic features
3> Topographic: Topographic information requirements
3> Hydrographic: Hydrographic information requirements
3> Climatic/Weather: Climatic/weather information requirements
2> Transportation: Transportation information requirements
2> Telecommunications: Telecommunications information requirements
1> Combat Capabilities: Relative combat capabilities information requirements
2> Red Capabilities: Need to understand Red combat capabilities
3> Strength/Reinforcements: Overall and reinforcements strength information requirements
3> Composition: Need to understand Red composition
3> Location/Disposition: Need to identify location and understand disposition
3> Time/Space Factors: Need to understand Red time/space factors
3> "Efficiency": Need to assess "efficiency"
2> Blue Capabilities: Need to understand Blue combat capabilities
3> Strength/Reinforcements: Need to assess Blue strength and reinforcements
3> Composition: Need to understand Blue composition
3> Location/Disposition: Need to identify Blue location and understand disposition
3> Time/Space Factors: Need to assess time/space factors
3> "Efficiency": Need to assess Blue "efficiency"
2> Relative Assessments: Need to infer "net" effects
3> Strengths: Need to assess relative strengths
4> Red Strengths: Need to determine relative Red strengths
4> Blue Strengths: Need to determine relative Blue strengths
3> Vulnerabilities: Need to assess relative vulnerabilities
4> Red Vulnerabilities: Need to determine Red vulnerabilities
4> Blue Vulnerabilities: Need to determine Blue vulnerabilities
1> Operational Concepts: Need to formulate initial COAs
2> COAs: Need to develop strawman courses of action
3> Objectives: Need to re-assess military objectives
3> Area Assumptions: Need to identify area assumptions vis-a-vis COAs
3> Strawman COAs: Need to develop strawman COAs
4> Suitability: Feasibility vis-a-vis "suitability"
4> Acceptability: Feasibility as to "acceptability"
4> Success Probability: Need to determine success probability
2> Pertinent Red Capabilities: Need to determine pertinent Red capabilities vis-a-vis COAs
3> Red Military Objectives: Need to revisit Red military objectives
3> Red COAs: Need to revisit likely Red COAs
3> Red Vulnerabilities: Need to revisit Red vulnerabilities
1> Operations Concept: Need to develop concept of operations
2> Red Capabilities: Need to revisit Red capabilities
3> Operational Capabilities: Re-determination of Red capabilities
3> Distilled Red Capabilities: Distillation of Red capabilities vis-a-vis Blue COAs
2> Blue COAs: Re-analysis of Blue courses of action
3> Advantages/Disadvantages: Determine advantages and disadvantages of each COA
3> "Sensitivity" Analysis: Need for sensitivity analysis via variation of assumptions
3> COA Vulnerabilities: Determine vulnerabilities of each Blue COA
2> COA Selection: Need to analyze and select among alternative COAs
3> Alternative COAs: Revisitation of alternative Blue COAs
3> Relative COA Comparisons: Need to compare and contrast alternative COAs
3> COA Rank-Ordering: Final rank-ordering of Blue COAs
2> COA-to-Concept of Operations: Need to translate Blue COA into concept of operations
3> Force Allocation & Timing: Need to determine force allocations and timing
3> Supporting Operations: Need to identify and describe supporting operations
4> Logistics Operations: Need to determine logistics operations information requirements
4> Other Operations: Need to identify other supporting operations
3> Command Relations: Need to determine command relations
3> Deployment Summary: Need to develop deployment summary
3> Employment Summary: Need to develop employment summary (operation concept)
User-Computer Interface (UCI) Requirements
1 Area Display Requirements
2 Mobility Displays
3 OPFOR Mobility
3 Blue
2 Key Terrain Displays
3 Major Obstacle Displays
4 River Displays
4 Mountains
4 Cities
4 Swamp Areas
4 Other
3 "Feature" Displays
4 Contours/Relief/Topography
4 Major Elevation Displays
4 Man-Made Objects Displays
2 Planning Displays
3 OPFOR
4 Avenues of Approach
4 Assembly Areas/Attack Positions
4 Major Communication Lines
4 Major Supply Points
STEPHEN J. ANDRIOLE
3 Blue
4 Avenues of Approach
4 Assembly Areas/Attack Positions
4 Major Communication Lines
4 Major Supply Points
2 Weather Displays
2 Other Displays
1 OPFOR Displays
2 Disposition Displays
2 Condition/Strength
3 Conventional Forces
3 Nuclear Forces
2 Air Support Displays
2 Major Logistics Displays
2 COAs Displays
1 Blue Displays
2 Disposition Displays
2 Condition/Strength
3 Conventional Forces
3 Nuclear Forces
2 Air Support Displays
2 Major Logistics Displays
2 COAs Displays
1 Interpretive Displays
2 "Qualitative" Displays
3 Risk Displays
3 Constraints Displays
3 Vulnerability Displays
3 Opportunity Displays
3 Other Qualitative Displays
2 "Quantitative" Displays
3 Relative OPFOR Capabilities
3 Relative Blue Capabilities
2 "Cognitive" Displays
3 Cognitive Consistency
4 Conceptual Equivalence
4 Transition Displays
3 Option Generation
4 Analogical Displays
5 Current Analog Displays
5 "Old" Analog Displays
4 Doctrinal Displays
5 Definitional Displays
5 Doctrinal Options
1 Interaction Displays
2 Navigational Displays
3 "Fly-Around" Capabilities
3 "Hold & Wait" Capabilities
3 Process Model Displays
4 Primary Processes
4 Sub-Process Displays
3 Adaptive Help Displays
4 "Active" Help Displays
4 "Passive" Help Displays
3 Adaptive Training
4 "Active" Training
4 "Passive" Training
2 Manipulation Displays
3 Graphic Equivalence Displays
4 Summary Data Displays
4 Explanations
3 Map-Based Displays
4 Overlays
4 Explanations
2 Dialogue Displays
3 Alphanumeric Dialogue
3 Graphic Dialogue Displays
4 Iconic
4 Other
User-Computer Interface (UCI) Requirements Descriptions

UCI Requirements
1> Area Display Requirements
Display requirements for general area of interest
2, Mobility Displays
Displays of Red and Blue mobility corridors
3, OPFOR Mobility / Blue
Requirements for OPFOR mobility options displays
Requirements for Blue mobility options displays
2, Key Terrain Displays
Requirements for key terrain displays
3, Major Obstacle Displays
Major obstacles displays
4, River Displays / Mountains / Cities / Swamp Areas / Other
Displays of river obstacles
Displays of mountain obstacles
Displays of urban obstacles
Displays of major swamp areas
Other displays of major obstacles
3, "Feature" Displays
Key terrain "features" displays
4, Contours/Relief/Topography / Major Elevation Displays / Man-Made Objects Displays
Contours/relief/topography displays
Major elevation displays
Displays of man-made objects/features
2, Planning Displays
Displays for general planning
3, OPFOR
Displays for OPFOR planning
4, Avenues of Approach / Assembly Areas/Attack Positions / Major Communication Lines / Major Supply Points
Displays of possible avenues of approach (Red)
Displays of assembly areas and attack positions (Red)
Displays of major communication lines (Red)
Displays of major supply points (Red)
3, Blue
Displays for Blue planning
4, Avenues of Approach / Assembly Areas/Attack Positions / Major Communication Lines / Major Supply Points
Displays of possible avenues of approach (Blue)
Displays of assembly areas and attack positions (Blue)
Displays of major communication lines (Blue)
Displays of major supply points (Blue)
2, Weather Displays / Other Displays
Displays of seasonal/current weather
Displays of other area characteristics
1> OPFOR Displays
Displays of OPFOR characteristics and capabilities
2, Disposition Displays / Condition/Strength
Displays of Red disposition
Displays of condition and strength of Red
3, Conventional Forces / Nuclear Forces
Displays of conventional forces and readiness
Displays of nuclear forces and readiness
2, Air Support Displays / Major Logistics Displays / COAs Displays
Displays of Red air support
Displays of major logistical capabilities
Displays of likely Red courses of action (COAs)
1> Blue Displays
Displays of Blue characteristics and capabilities
2, Disposition Displays / Condition/Strength
Displays of Blue location and disposition
Displays of Blue condition and strength
3, Conventional Forces / Nuclear Forces
Displays of conventional forces and readiness
Displays of nuclear capabilities and readiness
2, Air Support Displays / Major Logistics Displays / COAs Displays
Displays of Blue air support capabilities
Displays of major logistics capabilities
Displays of feasible Blue COAs
1> Interpretive Displays
Displays that support interpretation of substance
2, "Qualitative" Displays
Displays of "qualitative" phenomena
3, Risk Displays / Constraints Displays / Vulnerability Displays / Opportunity Displays / Other Qualitative Displays
Displays that convey risk
Displays that communicate operational constraints
Displays that communicate vulnerabilities (Red and Blue)
Displays that communicate opportunities (Red and Blue)
Displays of other qualitative aspects of the situation
2, "Quantitative" Displays
Displays of "quantitative" information
3, Relative OPFOR Capabilities / Relative Blue Capabilities
Displays of relative OPFOR combat capabilities
Displays of relative Blue combat capabilities
2, "Cognitive" Displays
Displays that support specific cognitive functions
3, Cognitive Consistency
Displays that support doctrinal models of planning
4, Conceptual Equivalence / Transition Displays
Displays that support conceptual equivalence
Displays that support easy cognitive transition
3, Option Generation
Displays that support option generation
4, Analogical Displays
Displays that present analogical information
5, Current Analog Displays / "Old" Analog Displays
Displays of current relevant analogs (cases)
Displays that present "old" but pertinent cases
4, Doctrinal Displays
Displays that present information on doctrine
5, Definitional Displays / Doctrinal Options
Displays of current doctrinal explanations
Displays that present doctrinal planning options
1> Interaction Displays
Displays that support smooth user interaction
2, Navigational Displays
Displays that support efficient system navigation
3, "Fly-Around" Capabilities / "Hold & Wait" Capabilities / Process Model Displays
Capability to "fly around" system options and data
Capability to "hold" the system or have the system "wait"
Displays that present the problem-solving process
4, Primary Processes / Sub-Process Displays
Displays of the primary (overall) problem-solving process
Displays that present sub-process problem-solving models
3, Adaptive Help Displays
Displays that present help
4, "Active" Help Displays / "Passive" Help Displays
Displays that present system-controlled help
Displays that respond to user queries for help
3, Adaptive Training
Displays that support adaptive training
4, "Active" Training / "Passive" Training
Displays that support system-managed training
Displays that support training by user request
2, Manipulation Displays
Displays for data/process manipulations
3, Graphic Equivalence Displays
Graphic/alphanumeric equivalence displays
4, Summary Data Displays / Explanations
Displays of all data and information
Explanation displays of system-generated options
3, Map-Based Displays
Displays that support map manipulations
4, Overlays / Explanations
Displays that permit "mix and match" overlays
Displays that support graphic/map-based explanations
2, Dialogue Displays
Displays that support appropriate dialogue
3, Alphanumeric Dialogue / Graphic Dialogue Displays
Displays that support alphanumeric dialogue options
Displays that support graphic interaction
4, Iconic / Other
Displays that support the use of on-line icons
Other displays that support graphic dialogue
Appendix B: Storyboards from the Group Planning Prototype
This appendix contains a number of storyboards (screen displays) extracted from the group decision support system prototype. These storyboards describe the overall "master menu" structure of the system, the sub-menu structure, and some of the actual functional displays and routines in the prototype. Storyboards represent integrated collections of screens that suggest to users what the system will do (as well as what it will not do). The menu options are "active" in storyboard prototypes: users can select menu options, and the prototype will respond immediately to the query.
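The "active menu" behavior described above can be sketched in a few lines. This is a minimal illustration only: the menu labels are drawn from the prototype's master menu, but the canned responses are hypothetical placeholders standing in for the actual storyboard screens.

```python
# Minimal sketch of a storyboard prototype with "active" menu options.
# Selecting a known option returns its canned storyboard "screen"
# immediately; anything else falls outside the prototype's scope.

STORYBOARDS = {
    "MISSION": "Storyboard: mission statement and planning objectives.",
    "ENEMY CAPABILITIES": "Storyboard: Red force strengths and dispositions.",
    "ALLIED CAPABILITIES": "Storyboard: Blue force strengths and dispositions.",
    "ENEMY COAs": "Storyboard: likely/possible/unlikely Red courses of action.",
}

def select(option: str) -> str:
    """Respond immediately to a menu selection, as a storyboard should."""
    return STORYBOARDS.get(option, "This option is outside the prototype's scope.")
```

A storyboard built this way demonstrates scope as much as function: unsupported selections answer explicitly rather than failing silently.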
[Storyboard screens from the theater-level planning prototype (NATO vs. Warsaw Pact). The master menu offers MISSION, ENEMY CAPABILITIES, AREA CHARACTERISTICS, ALLIED CAPABILITIES, ENEMY COAs, ALLIED COAs, and OPERATIONAL CONCEPTS, surrounded by the control options SHOW, EXPLAIN, SEND, SIMULATE, SHARE, COMPARE, UPDATE, OVERLAY, ENTER, FLY, ZOOM, PAUSE, and STOP. Sub-menu screens break ENEMY and ALLIED CAPABILITIES into Infantry, Air, Armored, Artillery, and Nuclear Strength plus Location/Disposition; AREA CHARACTERISTICS into Terrain, Weather, Transportation, Telecommunications, Social, Economic, and Political; and ENEMY COAs into Likely, Possible, and Unlikely.]
Perceptual Models for Automatic Speech Recognition Systems

RENATO DEMORI
McGill University, School of Computer Science, Montreal, Quebec, Canada

MATHEW J. PALAKAL
Purdue University School of Science at Indianapolis, Department of Computer Science, Indianapolis, Indiana

PIERO COSI
Centro di Studio per le Ricerche di Fonetica, C.N.R., Padova, Italy
1. Introduction
2. Speech and Speech Knowledge
   2.1 Acoustic Characteristics of Phonemes
   2.2 Speech Recognition Systems
3. A Multi-Layer Network Model for ASR Systems
4. The Ear Model: An Approach Based on Speech Perception
   4.1 Speaker-Independent Recognition of Ten Vowels in Fixed Contexts
   4.2 The Recognition of Phonetic Features
   4.3 Recognition of New Vowels and Diphthongs
   4.4 Word Models
5. The Vocal Tract Model: An Approach Based on Speech Production
   5.1 The Skeletonization Algorithm
   5.2 Line Description
   5.3 Description of Frequency Relations Among Spectral Lines
6. Conclusions
Acknowledgments
References
1. Introduction
Speaker-independent Automatic Speech Recognition (ASR) of large or difficult vocabularies by computers is still an unsolved task, especially if words are pronounced connectedly. Efforts and progress toward the solution of this problem are reported in the recent literature (Bahl et al., 1983; Levinson, 1985; Kopec and Bush, 1985; Kimball et al., 1987). During the past two decades, there has been substantial progress toward the goal of constructing machines capable of understanding and/or recognizing human speech. One of the key improvements has been the development and application of mathematical methods that permit modeling the speech signal as a complex code with several coexisting levels of structure. The ultimate goal of research on automatic speech recognition is to give machines capabilities, similar to those of humans, for communicating in natural spoken languages. Such research is of great interest from both the application point of view and the research point of view. Since speech is our most natural mode of communication, we should have the potential of machines that more fully accommodate the human user, rather than perpetuating the trend of our mechanical slaves actually enslaving us in unwanted diversions, such as learning keypunching, typewriting, and complex programming methods (Lea, 1979). From the application point of view, there are several advantages of voice input to machines: voice input leaves eyes and hands free, it needs little or no user training, and it permits fast, multimodal communication and freedom of movement. Speech-recognizing machines have possible applications in many areas, such as office automation, assembly-line inspection, airline reservations, and aids for the handicapped. From the research point of view, automatic speech recognition is a difficult problem that has extended over the past four decades. Even though significant progress has been made, the ultimate goal, a perfect listening machine, is yet to be achieved.
Several areas of human perception of voice have yet to be explored, and the findings of such research must be exploited for building listening machines. What has been done so far is based mostly on analytical methods, and only very recently have researchers incorporated detailed speech knowledge in their recognition models. Another key improvement has been the development of various recognition models for ASR systems, such as syntactic models, probabilistic parsing models, expert system models, and network models. Under network models, the most successful ones are stochastic models, procedure network models, and artificial neural network models. In Section 2 of this chapter we will discuss the fundamentals of speech production and speech knowledge, various techniques that are used in
speech recognition systems, and, briefly, some of the most successful speech recognition systems. In Section 3, we will discuss some of the recent advances in speech recognition research, such as the use of artificial neural network models and a special case of hidden Markov models. In this section we will also compare and contrast speech-perception-based ASR systems (the ear model) and the conventional speech-production-based ASR systems (the vocal tract model). Some preliminary results are also presented for the ear-model-based system that uses multi-layer networks, and also for the more conventional system that uses Markov models with line parameters based on the fast Fourier transform (FFT).
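As a rough illustration of "line parameters based on the FFT," spectral lines can be taken as local peaks of an FFT magnitude spectrum. The sketch below is a simplification for illustration, not the authors' algorithm; the windowing and the peak threshold are assumptions.

```python
import numpy as np

def spectral_lines(frame, sample_rate, threshold=0.1):
    """Return the frequencies (Hz) of local peaks in the FFT magnitude spectrum.

    A peak must exceed both its neighbors and a floor set at `threshold`
    times the spectrum maximum (an illustrative choice).
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    floor = threshold * spectrum.max()
    return [float(freqs[i]) for i in range(1, len(spectrum) - 1)
            if spectrum[i] > floor
            and spectrum[i] > spectrum[i - 1]
            and spectrum[i] > spectrum[i + 1]]

# Synthetic frame with components at 500 Hz and 1500 Hz.
sr = 8000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * t)
lines = spectral_lines(frame, sr)
```

For this synthetic frame, the extracted lines fall at the two component frequencies; real speech frames would of course yield many more, noisier, peaks.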
2. Speech and Speech Knowledge
In this section we present a review of the various components of the human speech production system. A brief discussion of the characteristic features of various sound classes is also presented. Finally, we review some of the basic and advanced methods for automatic speech recognition and describe some of the popular ASR systems.
Figure 1 shows the organs of our speech production system. Speech sound is produced when air flows through and resonates in the vocal tract. Different sounds are produced because of different vocal tract configurations. For a class of speech sounds, such as vowels, there is a set of resonant frequencies characterizing each sound in the class. Also, different sounds are produced depending upon the source of excitation. During speech production, the articulators move continuously, rather than discretely, resulting in a continuum of acoustic characteristics.
There are two basic types of sound sources in speech: periodic vibration of the vocal folds, and turbulent noise. When the speaker exhales, air passes through the larynx. If the glottis is partially closed, then the air passing through the constriction causes the vocal cords to open and close quasi-periodically, producing voiced sound. The rate of vibration, which is controlled by vocal cord tension, is called the fundamental frequency or pitch. When excitation is at the glottis, the vocal folds remain open and cause a weak turbulence, producing aspiration sounds. A constriction in the vocal tract causes turbulent noise, which has a flat spectrum, and is called a frication sound. The peaks in the spectrum of voiced sound are called formants and are labeled F1, F2, ..., Fi, where F1 is called the first formant, and so on.
The lips, tongue, jaw, and velum can be moved to change the shape of the vocal tract. The resultant vocal tract acts as a cascade of resonators, which filter the
FIG. 1. Organs of speech production. (1. Lips 2. Teeth 3. Teeth-ridge 4. Hard palate 5. Velum 6. Uvula 7. Blade of tongue 8. Front of tongue 9. Back of tongue 10. Pharynx 11. Epiglottis 12. Vocal cords 13. Tip of tongue 14. Glottis)
source. The poles of the vocal tract transfer function generate spectral peaks called the formants. In the case of nasals, sound passes through the nasal cavity, but the mouth cavity, which is closed, acts as a side branch and introduces zeros in the spectrum. The interaction of the poles and zeros can change the frequency of the formants (Schwartz, 1982). Early speech scientists described speech sounds in terms of particular characteristics of speech, such as voiced, unvoiced, front, back, etc. (Oppenheim and Schafer, 1968; Jakobson et al., 1952).
During speech production, the articulators move relatively slowly from one position to another. The articulators often do not reach their "target" positions due to contextual effects of neighboring phones: this is called coarticulation (Heffner, 1950). Therefore, the spectral sequence associated with a particular phone can vary widely depending on the adjacent phones. Different speakers' vocal apparatus can vary in terms of the source spectrum, the length of the vocal tract, and the relative shape of the vocal tract. For this reason, the speech of adult males and females differs; typically, the
pitch period of female speech is about 20% shorter, causing an average 20% increase in the formant frequencies (Fant, 1966). In addition to the differences due to speaker, dialect, and phonetic context, there is also random variation in the pronunciation of speech sounds. Even for the speech of a single speaker, the spectral properties present cannot be converted back to a phonetic string without the use of higher-level knowledge.
The articulatory movements vary for different speech sounds: for some, the vocal tract configurations are stable, while for others they are not. For example, the configuration for the sound /m/ is more stable than that for an /r/. In the vowel-to-nasal transition for /m/, the velum is lowered, while the sound /r/ is produced by retroflexing the tongue. The tongue cannot move as fast as the velum, and this causes the difference in the configuration dynamics.

2.1 Acoustic Characteristics of Phonemes
A brief discussion of the acoustic nature of each phonetic group follows. More details on these materials can be found in Schwartz (1982), Zue and Schwartz (1979), Rabiner and Schafer (1978), and Skinner (1977).
Vowels

Vowels are produced by exciting the open vocal tract with a periodic (voiced) source. Vowels are often characterized by substantial energy in the low- and mid-frequency regions; the energy in the high frequencies above 3500 Hz is less important for vowel characterization. The characteristics of different vowels depend on the location of the tongue, the position of the jaw, and the degree of lip rounding. The resulting shape of the vocal tract determines the formant frequencies. The three classes of vowels (back, central, and front) occur as a result of the tongue position. In general, when the tongue moves forward, the second formant rises; as the tongue moves higher or the jaw rises, the first formant decreases. Lip rounding lowers all of the first three formants. Figure 2 shows acoustic waveforms (Rabiner and Schafer, 1978), and Fig. 3 shows the resonance characteristics of vowels (Wakita, 1977). Many vowel recognition methods measure the first three formants in the middle portion of the vowel and compare those values against stored targets. These values are called vowel loci. Variances of the formant frequency distributions for each vowel around the vowel loci are speaker independent.

Consonants

The consonants are divided into several groups depending on the manner in which they are articulated. The five such groups in English are the plosives, fricatives, nasals, glides, and affricates. Consonants from different "manner-of-articulation" groups often have different acoustic correlates.
FIG. 2. Acoustic waveforms of vowels (Rabiner and Schafer, 1978).
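Waveforms like those in Fig. 2 can be imitated with the source-filter view described earlier: a periodic impulse source passed through a cascade of formant resonators. The sketch below is illustrative only; the formant and bandwidth values are textbook-style approximations for an /a/-like vowel, not measurements from this chapter.

```python
import math

def resonator(signal, freq, bandwidth, sample_rate):
    """Two-pole digital resonator approximating one vocal-tract resonance (formant)."""
    r = math.exp(-math.pi * bandwidth / sample_rate)
    theta = 2 * math.pi * freq / sample_rate
    a1, a2 = 2 * r * math.cos(theta), -r * r
    gain = 1 - a1 - a2                     # rough normalization (unity gain at DC)
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2   # y[n] = g*x[n] + a1*y[n-1] + a2*y[n-2]
        out.append(y)
        y1, y2 = y, y1
    return out

sr = 8000
pitch = 100                                # 100 Hz source: one impulse every 80 samples
source = [1.0 if n % (sr // pitch) == 0 else 0.0 for n in range(800)]
speech = source
for f, bw in [(730, 60), (1090, 90), (2440, 120)]:   # illustrative /a/-like formants
    speech = resonator(speech, f, bw, sr)
```

Each pass through `resonator` imposes one spectral peak on the flat impulse spectrum, which is exactly the "cascade of resonators" picture in the text.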
FIG. 3. Resonance characteristics of vowels. Reprinted with permission from "Normalization of Vowels by Vocal Tract Length and Its Application to Vowel Identification," H. Wakita, IEEE Transactions on Acoustics, Speech, and Signal Processing, April 1977. © 1977 IEEE.
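The vowel-recognition method just described (measure the first three formants at the vowel's midpoint and compare them against stored targets, the "vowel loci") reduces to a nearest-target search. A sketch follows; the loci are illustrative average values for adult male speakers, not data from this chapter, and a real system would also model the per-vowel variances mentioned in the text.

```python
# Nearest-target vowel classification from the first three formants (Hz).
# The loci below are illustrative adult-male averages, not the chapter's data.

VOWEL_LOCI = {
    "iy": (270, 2290, 3010),   # as in "beet"
    "ae": (660, 1720, 2410),   # as in "bat"
    "aa": (730, 1090, 2440),   # as in "father"
    "uw": (300,  870, 2240),   # as in "boot"
}

def classify_vowel(f1, f2, f3):
    """Return the vowel whose stored loci are closest (squared Euclidean) to (F1, F2, F3)."""
    def dist(loci):
        return sum((m - t) ** 2 for m, t in zip((f1, f2, f3), loci))
    return min(VOWEL_LOCI, key=lambda v: dist(VOWEL_LOCI[v]))
```

Weighting each formant axis by its measured variance (per vowel) would turn this plain Euclidean search into the speaker-independent comparison the text alludes to.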
Consonants within a "manner-of-articulation" group differ in their voicing characteristics and the position of constriction. The acoustic properties of consonants differ both within the consonants themselves and in the adjacent vowels, in the form of formant transitions. This must be considered in order to recognize consonants.

Nasals

The nasals (/m/, /n/, /ng/) are always adjacent to a vowel and are marked by a sharp change in intensity and spectrum, corresponding to the closing of the oral cavity and the opening of the velum. A nasal sound is produced by glottal excitation with the vocal tract constricted at some point; by lowering the velum, the air flow is forced through the nasal cavity. Nasals are very difficult to recognize, since the nasal murmur differs significantly from speaker to speaker because of differences in the size and shape of the nasal and sinus cavities. Nasal murmur is also heavily affected by phonetic environment. Characteristic nasal cues include a prominent low-frequency spectral peak at around 300 Hz, little energy above 3 kHz, and a sharp spectral discontinuity between the nasal murmur and the adjacent vowel. Figure 4 shows acoustic waveforms of the nasal sounds /m/ and /n/.
FIG. 4. Acoustic waveforms of nasal sounds (Rabiner and Schafer, 1978).
Liquids and Glides

This group is also sometimes known as the semi-vowels. Such consonants often appear next to vowels, as in the case of nasals. These sounds are produced by a constriction in the vocal tract that is smaller than that of vowels but still large enough that no turbulence is generated. Each phoneme in this group has a close association with certain vowels:

/w/ => /u/,    /r/ => /3/,
/y/ => /i/,    /l/ => /o/.

These consonants are distinguished from other consonant groups in that the rate of articulatory movement is considerably slower, which implies slower formant transitions. The formants of these sounds have the following qualitative relation to the formants of adjacent vowels:

/w/ has lower F1 and F2
/l/ has lower F1 and F2 with higher F3
/r/ has lower F3
/y/ has very low F1 with higher F2.

The formant patterns within these phonemes are similar to some vowels, and their distinguishing characteristics are often detected by comparison with those of the adjacent vowels.

Plosives

Plosives are also known as the "stop" consonants. Plosive sounds are classified into two groups: voiced or lax (/b/, /d/, /g/) and unvoiced or tense (/p/, /t/, /k/). Voiced plosive sounds are produced by building up pressure behind a total constriction somewhere in the oral tract. During the total constriction no sound is radiated through the lips; however, a small amount of low-frequency energy, called the voice bar, is radiated through the walls of the throat. Unvoiced plosive sounds are produced in the same way as voiced sounds, except that during the total constriction the vocal cords do not vibrate. Plosive consonants are considered to be the most difficult consonants to recognize, for the following reasons:
The production of a stop is dynamic, involving a closure and a release period. The complicated nature of this production results in many diverse acoustic cues, and the acoustic events during the production of the sound can be omitted or severely distorted.
/UH-P-A/    /UH-B-A/

FIG. 5. Acoustic waveforms of plosive sounds (Rabiner and Schafer, 1978).
Some of the characteristics of voiced and voiceless stops are the following: (a) The plosives are characterized acoustically by a period of prolonged silence, followed by an abrupt increase in amplitude at the consonantal release; the release is accompanied by a burst of frication. (b) For voiceless stops, aspiration noise is generated at the glottis. (c) The voice onset time (VOT), which is the duration between the release and the onset of normal voicing for the following vowel, is longer for unvoiced (30 to 60 ms) than for voiced (10 to 30 ms) stops. (d) Voiced stops are often prevoiced, creating the voice bar in the low-frequency region. (e) The amplitude of the burst differs significantly between voiced and voiceless plosive sounds. Figure 5 shows acoustic signals for samples of voiced and voiceless plosives.

Fricatives

Like the plosives, the fricatives comprise a voiced group (/v/, /dh/, /z/, /zh/) and a voiceless group (/f/, /th/, /s/, /sh/). Unvoiced fricatives are produced by exciting the vocal tract with a steady air flow that becomes turbulent in the region of a constriction in the vocal tract. Voiced fricatives, in contrast, are produced with vocal cord vibration and excitation at the glottis; since the vocal tract is constricted at some point, the air flow again becomes turbulent. Voiced fricatives thus often have simultaneous noise and periodic excitations, which cause a great amount of low-frequency energy at the beginning of frication. Voiced fricatives are also shorter than unvoiced fricatives. Acoustic signals for some of the fricative sounds are shown in Fig. 6.

Affricates

The affricates (/c/, /j/) are often considered as a plosive followed by a fricative, and are often modeled as a sequence of two phonemes (/c/ as /t-s/ and /j/ as /d-z/). The duration of frication is often very short compared to other fricatives.
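The VOT cue in item (c) above lends itself to a trivial decision rule. The single 30-ms boundary used here is an assumed simplification of the 10-30 ms and 30-60 ms ranges quoted above; real VOT distributions overlap, so an actual recognizer would combine VOT with the other cues listed.

```python
def classify_stop_by_vot(vot_ms):
    """Classify a plosive as voiced or unvoiced from its voice onset time.

    The chapter gives roughly 10-30 ms for voiced and 30-60 ms for
    unvoiced stops; the single 30 ms boundary is an assumed
    simplification for illustration only.
    """
    return "voiced" if vot_ms < 30.0 else "unvoiced"

print(classify_stop_by_vot(15))  # within the voiced range
print(classify_stop_by_vot(45))  # within the unvoiced range
```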
The properties of each class of phonemes can be considered as acoustic knowledge about the speech sounds. However, these properties are not independent, and they may be distorted or omitted. Because the cues for a phoneme are so redundant, the human speaker tends to be rather careless about producing the prototypical features for a given phoneme. Distortions may vary from speaker to speaker, or even over time for the same speaker. Despite these problems on the speaker's side, the human listener somehow has no trouble discarding the "bad" features and
/UH-V-A/
/UH-ZH-A/
FIG. 6. Acoustic waveforms of fricative sounds (Rabiner and Schafer, 1978).
accepting only the "good" ones. This is possible because "higher-level context" is available, and also because humans use phonotactic constraints in decoding distorted syllables. This is a clear indication that enough information to decode the phonemes is present in the acoustic signal. Therefore, phonetic recognition algorithms must consider several features jointly, rather than any particular feature in isolation. Given several features that each contribute toward making phonetic distinctions, the Acoustic Phonetic Recognizer must enlist the aid of a multi-dimensional feature selection and pattern recognition algorithm to design the optimum classifier (Schwartz, 1982). Many approaches have been used to incorporate the various features present in speech sound. Some of the important techniques are discussed in the next section.

2.2 Speech Recognition Systems
During the past two decades, there has been substantial progress toward the goal of constructing machines capable of understanding and/or recognizing human speech. One of the key improvements has been the development and application of mathematical methods that permit modeling the speech signal as a complex code with several coexisting levels of structure. In a speech recognition system, the spectrum is usually represented by Fourier coefficients, the zero crossing rate, or the parameters of some local model of the signal, such as linear prediction coefficients. Temporal information can be obtained directly, as in the case of voice onset time. Prosodic information is often extracted by estimating the fundamental frequency to represent pitch, and the logarithm of energy integrated over 45-ms intervals to measure intensity (Levinson, 1985). Presently, features obtained this way are neither robust nor invariant with respect to the speaker. As a result of psychophysical experiments (Miller et al., 1951), it has been asserted that speech is a composite signal, hierarchically organized so that simpler patterns at one level are combined in a well-defined manner to form more complex patterns at the succeeding level. Such an organization strategy is easily explained in terms of information-theoretic principles. The structures at each level of the hierarchy serve to constrain the ways in which the individual patterns associated with that level can be combined. The constraints build redundancy into the code, thereby making it robust to errors or variations caused by a speaker. In this way relatively few primitive patterns can be combined in a multilevel hierarchy according to a complex code to form a rich, robust information-bearing structure (Levinson, 1985). Spectra and prosodics, the primitive patterns according to linguistic theories, can be combined in several ways to form phonemes (Cohen and
112
RENATO DEMORI et al.
Mercier, 1975; Nakatsu and Kohda, 1978; Woods, 1975), broad phonetic categories (Shipman and Zue, 1982; Chen, 1980), diphones (Scaglioda, 1983), demisyllables (Rosenberg et al., 1983; Ruske and Schotola, 1982), syllables (DeMori et al., 1985), supra-segmental phrases (Lea et al., 1975; Mercier et al., 1979), and sentences (Perennou, 1982; Walker, 1975). For the implementation of these theories, data structures such as templates (Bourlard et al., 1984; Haton and Pierrel, 1980; Aldefeld et al., 1980), formal grammars (Myers and Levinson, 1982; Bahl et al., 1979; Levinson, 1979), Markov chains (Baker, 1975b; Jelinek and Mercier, 1980; Rabiner et al., 1983), fuzzy sets (DeMori, 1973), and hash tables (Kohonen et al., 1980) have been used. In nonparametric methods the primitive measurements of the speech signal can be compared without regard for their temporal location. However, sequences of these measurements, such as those required to represent speech signals of greater temporal extent, must take account of time to be meaningfully compared, owing to the nonstationarity of the speech signal (Levinson, 1985). A comparison of the two methods, according to Levinson, is shown in Fig. 7.

FIG. 7. Comparison of parametric and nonparametric methods (Levinson, 1985).

It has been argued that nonparametric methods are easy to train, whereas parametric methods, as classifiers, behave in just the opposite way in terms of complexity. Template matching, stochastic modeling, and probabilistic parsing have been the most successful models. Some of the benchmark systems developed using these approaches are described briefly in the following sections. Most of the following discussion is summarized from Moore (1984), DeMori and Probst (1986), and Haton (1984).

2.2.1 Template Matching
Template matching is based upon principles of nonparametric estimation of likelihoods by means of invariant metrics. In template matching, each recognition unit (a unit could be a word, a phoneme, etc.) is represented by at least one template, created from a set of training utterances. Each template contains a sequence of patterns extracted in time; the patterns are spectral and/or prosodic features. During matching, the target pattern is compared against the stored templates, and the matching template with minimum distance is the selected candidate. The success of the pattern matching approach lies entirely in the comparison process.

Absolute Pattern Match

The most basic comparison process is simply to correlate the time-frequency word patterns produced by a pre-processor in order to determine the distance between an unknown word and each template. This may not be possible directly, because words are often of different durations and their corresponding patterns are of different sizes. A potential solution is to align the beginnings of all the patterns and to correlate only over the areas of overlap. This simple technique, experimented with by White and Fong (1975), requires N vector comparisons per pattern match, where N is the number of vectors in the smallest pattern.

Best Absolute Time Alignment

An alternative to aligning the beginnings of words is to adjust their relative timing to maximize the correlation of the overlap. That is, starting with the beginnings aligned, the patterns are shifted with respect to each other until the ends align; the similarity of the pattern overlap is calculated at each shift, and the highest similarity is the result of the comparison. Computationally this scheme is expensive, and recognition experiments carried out using this method by Moore (1984) found no significant improvements in recognition.
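The absolute pattern match above can be sketched in a few lines. The toy feature vectors are invented, and a squared Euclidean distance over the overlap stands in for the correlation measure; as in White and Fong (1975), only N frames are compared, where N is the length of the shorter pattern.

```python
def absolute_match(unknown, template):
    """Distance between two patterns (lists of feature vectors) with their
    beginnings aligned, comparing only over the area of overlap.
    Requires N vector comparisons, N = length of the shorter pattern."""
    n = min(len(unknown), len(template))
    return sum(
        sum((a - b) ** 2 for a, b in zip(unknown[t], template[t]))
        for t in range(n)
    )

# Two toy 2-dimensional feature sequences of different lengths.
u = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
t = [[1.0, 2.0], [2.5, 3.0]]
print(absolute_match(u, t))  # compares only the first 2 frames
```

Best absolute time alignment would simply repeat this computation at every relative shift of the two patterns and keep the best score, which is why it is so much more expensive.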
Linear Time-Normalization

The previous techniques do not consider the fact that the same word is very rarely of the same duration on different occasions. In order to handle this
problem, the patterns are uniformly time-normalized to make them the same size. This is known as linear time-normalization. For practical applications, either the template patterns are time-normalized to the unknown pattern, or all patterns are time-normalized to a pre-set duration. If a very small vocabulary (10 to 30 words) is used, such techniques perform well. A commercial system made available by Interstate Electronics, called VRM, used a linear time-normalization approach, and several other commercial systems used similar techniques. The performance also depends on the inherent confusability of the words, the consistency of the speakers, the type of features used, and the number of training samples allowed.

Nonlinear Time-Normalization

Linear time-normalization does not perform well for larger vocabularies. The reason is that making the patterns a fixed length is not an adequate model of what actually happens when people make words longer or shorter. A model of time-scale distortion that allows different sounds to be distorted differentially would align the patterns more meaningfully. One approach to computer recognition of speech requires that we compare two sequences of elements and compute the distance between them by finding an optimal alignment or correspondence between the elements of one sequence and those of the other. In speech research, these sequence comparison methods use dynamic programming to perform nonuniform dynamic time warping (DTW). The name refers to allowing nonlinear distortions of time scales in computing the acoustic similarity between a reference prototype and features extracted from the input utterance, thereby taking into account speaking-rate variability as well as substitution, insertion, and deletion errors. Dynamic programming is an efficient algorithm by which both the optimal alignment and the resulting distance are computed at the same time.
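The linear time-normalization step described at the start of this passage amounts to resampling each pattern to a fixed number of frames. A minimal sketch, with invented one-dimensional features and linear interpolation between neighboring frames:

```python
def time_normalize(pattern, target_len):
    """Linearly resample a sequence of feature vectors to a fixed number
    of frames, so two utterances of different durations can be compared
    frame by frame."""
    src_len = len(pattern)
    out = []
    for i in range(target_len):
        # Map output frame i to a (possibly fractional) source position.
        pos = i * (src_len - 1) / (target_len - 1) if target_len > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(pattern[lo], pattern[hi])])
    return out

short = [[0.0], [2.0], [4.0]]   # a 3-frame pattern
print(time_normalize(short, 5))  # stretched uniformly to 5 frames
```

Note that every frame is stretched by the same factor; this uniformity is exactly the weakness that nonlinear time-normalization (DTW) addresses.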
Data and prototypes to be matched are represented by discrete sequences produced after either synchronous or asynchronous sampling of the continuous speech signal; due to normal speech variability, two sequences arising from two utterances of the same word may exhibit a number of local differences. These local differences may be that one element has been substituted for another, that an element has been inserted, or that an element has been deleted. Other local difference models are conceivable, such as one that allows expansion of a single element into several elements, or compression of several elements into a single element, as independent types of local difference. Given two sequences and costs (or weights) of the local differences, an alignment is assigned a cost equal to the sum of the costs of the local differences in it; the distance between the two sequences is the least cost of any alignment.

DTW is essentially a two-stage process. Figure 8 illustrates the first stage. Two abstract speech-like patterns are shown, one vertically and one horizontally. Each pattern has time frames consisting of three-element vectors; the vertical pattern has four frames, and the horizontal has five. The matrix in the center is known as the distance matrix. It contains numbers that correspond to the distances between the frames in one pattern and the frames in the other pattern. For example, the number 20 in the top right-hand corner indicates that the first frame of the vertical pattern is quite different from the last frame of the horizontal pattern; similarly, the 1 indicates that the second frames of each pattern are very similar. The distance is actually calculated by taking the sum of the squares of the differences between each pair of frames. The second stage is to find the path through the distance matrix, from the top left-hand corner to the bottom right-hand corner, that has the minimum accumulated sum of distances along its length. This path is the required nonlinear relationship between the timescales of the two patterns, and it is found by dynamic programming. Dynamic programming involves the regular application of a local optimization procedure which ultimately leads to an overall global solution. In this case a local decision function is used, together with the distance matrix, to construct a second matrix called the cumulative distance matrix. Figure 9 illustrates the process. The local decision function is shown in Fig. 9a. It defines that a path may arrive at any particular point either vertically, horizontally, or diagonally, and is applied as follows.
FIG. 8. Distance matrix obtained after comparing two abstract patterns. Reprinted with permission from "Systems for Isolated and Connected Word Recognition," R. K. Moore, New Systems and Architectures for Automatic Speech Recognition and Synthesis, Springer-Verlag, 1985.
FIG. 9. Demonstration of dynamic time warping (Moore, 1984). (a) Local decision function. (b) Partially completed cumulative distance matrix. (c) Completed cumulative distance matrix. (d) Record of local decisions. Reprinted with permission from "Systems for Isolated and Connected Word Recognition," R. K. Moore, New Systems and Architectures for Automatic Speech Recognition and Synthesis, Springer-Verlag, 1985.
For each point in the cumulative distance matrix, add the cheapest cost of getting to that point to the cost of being at that point, and enter it in the matrix. The cheapest cost of getting to a point is the smallest of the values in the previous entries (as defined by the local decision function), and the cost of being at a point is simply the value taken from the corresponding position in the distance matrix. Hence, if this process is applied iteratively, starting at the top left-hand corner of the matrix, it is possible to complete all the entries in the cumulative distance matrix. Figure 9b shows the cumulative distance matrix in the process of being filled in. The "?" indicates the point being considered, and the three previous points are highlighted. The cost of getting to the point is the minimum of 19, 13, and 21, and the cost of being at that point is 12 (from the distance matrix in Fig. 8). Hence the cumulative distance entered at that point is 25 (13 + 12).
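The iterative fill procedure just described can be sketched directly from the local decision function. The 4x5 distance matrix below is invented for illustration (it is not the matrix of Fig. 8).

```python
def dtw(dist):
    """Fill the cumulative distance matrix for a precomputed distance
    matrix `dist` (rows = frames of one pattern, columns = frames of the
    other) and return the overall distance.  The local decision function:
    a path may arrive at a point vertically, horizontally, or diagonally."""
    rows, cols = len(dist), len(dist[0])
    INF = float("inf")
    cum = [[INF] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                cheapest = 0                      # the calculation starts here
            else:
                cheapest = min(
                    cum[i - 1][j] if i > 0 else INF,            # vertical
                    cum[i][j - 1] if j > 0 else INF,            # horizontal
                    cum[i - 1][j - 1] if i > 0 and j > 0 else INF,  # diagonal
                )
            cum[i][j] = cheapest + dist[i][j]
    return cum[rows - 1][cols - 1]   # bottom right-hand corner

# A made-up 4x5 distance matrix.
d = [
    [ 2,  9, 8, 12, 20],
    [ 7,  1, 6, 10, 15],
    [ 9,  8, 5,  4, 11],
    [14, 13, 9, 12,  2],
]
print(dtw(d))  # least-cost path 2 + 1 + 5 + 4 + 2 = 14
```

Recording which of the three local decisions won at each point would allow the minimum-cost path itself to be recovered by backtracing, as in Fig. 9d.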
Figure 9c shows the cumulative distance matrix completely filled in. The number in the bottom right-hand corner is highlighted because this is the overall distance between the two patterns; it is the sum of distances along the least-cost path through the distance matrix. To find the path it is necessary to remember, at each point in the calculation, exactly which local decision was made (horizontal, vertical, or diagonal). Figure 9d shows all of these decisions. It can be seen that they form a tree radiating from the top left-hand corner (where the calculation started). The actual minimum-cost path is found by tracing back along the local decisions, starting at the bottom right-hand corner (where the calculation ended). Referring back to the distance matrix (Fig. 8), the calculation shows that the least-cost path takes the route 7 + 1 + 5 + 12 + 2; no other path has a cumulative sum less than 27. The formulation for this dynamic programming is the following recursive expression:

    D(i, j) = d(i, j) + min[D(i - 1, j), D(i - 1, j - 1), D(i, j - 1)],

where 1 ≤ i ≤ I and 1 ≤ j ≤ J (I and J are the numbers of frames in the two patterns being compared), d is a distance measure between two frames, and the initial condition is D(0, 0) = 0. The overall distance between the two patterns is D(I, J).

Dynamic programming techniques originally developed for isolated word recognition have also been applied to the problem of recognizing connected words. Here, the spoken input is a sequence of words from a specified vocabulary, and matching occurs against isolated word reference templates. We are given an input pattern with some number of time frames. We also possess a set of reference templates, where each template has a length equal to the number of frames in that template. The goal is to find the sequence of templates that best matches the input pattern for a particular match criterion. The concatenation of templates is referred to as a super reference pattern. Two proposed solutions to this problem can be found in the two-level algorithm of Sakoe (1979) and the level-building algorithm of Myers and Rabiner (1981). Also worth mentioning in this context is the one-stage dynamic programming algorithm of Ney (1984). A brief description of Myers and Rabiner's level-building dynamic time-warping (DTW) algorithm for connected word recognition is as follows. The underlying idea is that the matching of all possible word sequences can be performed by successive concatenation of reference patterns. At the beginning, the time registration of the test pattern against a given super reference pattern is considered; it is observed that the algorithm can be implemented in levels, that is, one reference (of the super reference pattern) at a time. The computation matches test frames only against frames within a particular reference; the set of accumulated distances between different segments of the
test pattern and that reference is saved and used as a set of initial distances for the next level. This idea is then extended to a level-building algorithm with multiple reference patterns, that is, one in which each reference of the super reference pattern is one of a set of reference patterns. The recognition performance of isolated word recognizers based on DTW techniques is significantly better than that obtainable from linear time-normalization. This is because DTW provides a far more realistic timescale compensation process; greater variability can be accommodated, hence larger vocabularies may be used. Also, by using relaxed endpoint constraints (on the positions where the timescale registration path is allowed to start and end), DTW does not suffer from the same dependency on endpoint detection as linear time-normalization. Hence the segmentor can be much simpler, and it is left to the DTW process to decide precisely where the words begin and end.
2.2.2 Network-Based Systems

In the previous section we studied dynamic time-warping systems originally developed for isolated word recognition and later extended to the recognition of strings of connected words. In this section we look at two representative network-based systems, Carnegie-Mellon University's Harpy system and IBM's Markov modeling system, which are directed toward the more difficult problem of continuous speech recognition. In the general form of this problem we are interested in large-vocabulary, speaker-independent recognition; the two systems under consideration restrict the problem considerably by introducing grammatical and/or task constraints so that a simple finite-state model may be built of the entire language to be recognized. Both systems compile knowledge at different levels of the language model into an integrated network. In the Harpy system, phonetic, phonological, lexical, and syntactic constraints have been combined into a single model which generates all acceptable pronunciations of all recognizable sentences; in the IBM system, each word of the top-level language model is replaced by a phonetic subsource, and then each phone is replaced by an acoustic subsource, yielding a model of all acoustical realizations of sentences in the language. An important difference between the two networks is that, in the IBM system, all sources and subsources are Markov models, while in Harpy, Markov networks have given way to transition networks with no a priori probabilities associated with the symbols that label transitions; as already mentioned, in both cases the integrated language models are finite-state models. Another important difference is that Harpy uses segmentation while the IBM system does not. In Harpy, the acoustic signal is divided into variable-length segments that represent "stable" portions of the acoustic signal;
spectral characteristics of each segment are then determined for use in phone template matching. The assumption here is that, given enough allophone templates, it is reasonable to attempt labeling of segments using pattern matching techniques. Asynchronous segmentation is performed top-down, and then the network is used to select prototypes to be matched with the data. In the IBM system, no attempt is made to segment the speech into phoneme-like units: instead, a time-synchronous acoustic processor produces parameter vectors computed from successive fixed-length intervals of the speech waveform. A parameter vector coming from a 10-ms frame is matched against a set of prototypes; the parameter vector is then labeled by giving it the name of the prototype to which it is closest. Another possibility is that of using the input vector to retrieve a priori probabilities of different labels.
2.2.3 The Harpy System

The Harpy system is an attempt to combine the best features of the Hearsay I system and the Dragon system (Baker, 1975a). The most significant aspects of the system design are an integrated network language model (knowledge representation) and the use of beam search through the network during recognition. Segmentation is attempted, phonetic classification depends on unique templates, and word juncture knowledge is an integral part of the network. A word network exists such that any path through the network gives an acceptable sentence. Each word is replaced by a pronunciation network which represents expected pronunciations of the word. After words have been replaced by their subnetworks, word juncture rules are applied to the network to model phone string variations due to the influence of neighboring words. During compilation into the composite network, various optimization heuristics are applied to yield an efficient phone network, that is, a network of acceptable pronunciations. During the recognition process, Harpy attempts to find an optimal sequence of phones satisfying two criteria: (a) the sequence must represent a legal path through the network, and (b) the sequence should consist of phones with high acoustic match probabilities. It is possible that the best fit to a particular segment in the left-to-right search does not correspond to the correct interpretation; to compensate for this, a beam-search strategy is used in which a group of near-miss alternatives around the best path is examined. When the end of the sentence is reached, the phone sequence with the lowest total distance is selected; backtracing through the globally best sequence obtained at the end of the forward search yields the desired phone and word assignments. A pronunciation dictionary and phone characteristics allow us to replace words with their subnetworks. A simplified subnetwork for the word "was" is
shown in Fig. 10. As before, redundant paths are removed; phonetic symbols are taken from the ARPAbet (Lesser et al., 1975). So far we have seen illustrations of syntactic knowledge and lexical knowledge, although information about phone duration has been deliberately omitted from the latter. The phonetic network attempts to capture intraword phonological phenomena; word boundary phonological phenomena, on the other hand, are represented by word juncture rules, which contain examples of insertion, deletion, and substitution of phones at word boundaries. The word juncture rules are then applied to the network, and finally, as before, redundant paths are removed. Harpy's front end performs rough segmentation based on zero-crossing rates and peaks in smoothed and differenced waveform parameters called the zapdash parameters. Quasi-stationary segments derived from the zapdash parameters are matched against phone templates. These phone templates are linear prediction spectral templates, and comparison is based on Itakura's minimum prediction residual error measure, which computes similarity in spectral shape. The spectral templates are talker specific, but new templates may be learned automatically, for example, by adapting speaker-independent templates. The beam-search strategy for searching the finite-state graph prunes from further consideration paths scoring less than a variable threshold, rather than using a priori probabilities to find the most likely path through the network. In systems like Harpy and Hearsay II (Erman et al., 1976), segments are detected asynchronously and then labeled. Labeling a variable-length segment consists of recording, for each allophonic template, the probability that the segment represents an occurrence of that particular template. In contrast, synchronous nonsegmenting systems consider successive fixed-length frames of the speech signal. For each frame, we obtain a vector x of parameters representing the spectrum of that frame of speech.
In vector quantization, our problem, for each such x, is to find the codeword x_i in a codebook of stored prototypes whose spectral distance from x is a minimum. In this speech coding technique, we have a collection of possible reproduction vectors
FIG. 10. Nonredundant phonetic network for the word “was” (DeMori and Probst, 1986).
Find the x̂ with minimal d(x, x̂).
FIG. 11. Vector quantizer encoder (DeMori and Probst, 1986).
x1, x2, . . . , xn, which is stored in the reproduction codebook, or simply the codebook, of the quantizer; the reproduction vectors are called codewords (or templates). Moreover, we have a distance classifier which allows us to compare vectors according to a spectral distance. The encoding is illustrated in Fig. 11. Problems in constructing a good vector quantizer include choosing a good set of spectral representatives for the codebook, usually through training. More details about vector quantization can be found in Gray (1984).
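The encoding step of Fig. 11 reduces to a nearest-neighbor search over the codebook. In this sketch the two-dimensional codewords are invented stand-ins for spectral prototypes, and squared Euclidean distance stands in for the spectral distance measure.

```python
def vq_encode(x, codebook):
    """Return the index of the codeword nearest to vector x under squared
    Euclidean distance, i.e. the vector quantizer encoding step."""
    def d2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda i: d2(x, codebook[i]))

# A toy codebook of three 2-dimensional "spectral" prototypes.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 2.0]]
print(vq_encode([0.9, 1.2], codebook))  # index of the nearest codeword
```

Training the codebook itself (choosing a good set of spectral representatives) is the harder problem, typically addressed with clustering methods as surveyed in Gray (1984).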
2.2.4 The IBM System
IBM has developed two benchmark systems. The first is a speaker-trained continuous speech recognition system with a recognition accuracy of 91% on words contained in sentences from a 1000-word vocabulary (Jelinek et al., 1975; Jelinek, 1976). The second is an isolated word recognition system with an accuracy of 95% on an 8000-word office correspondence vocabulary (Bahl et al., 1983); this system has recently been enhanced by expanding the vocabulary to 20,000 words (Averbuch et al., 1987). The system is based on Markov models of language and has been implemented using two control strategies: (a) a Viterbi algorithm, and (b) a left-to-right stack decoder algorithm that estimates the probability that a given partial hypothesis can be extended to yield the actual sentence. Important aspects of the system design include the presence of a priori transition probabilities in the finite-state language model and the formulation of speech
RENATO DEMORI e t a / .
recognition as a problem of maximum-likelihood decoding. As such, statistical models of the speech production process are required. The choice between these two control strategies and the decoding methods mentioned earlier is a function of the degree of task constraint, i.e., the size of the state space. In the IBM approach, the allowed sentences are either described a priori by an artificial grammar or else limited by a vocabulary and a task domain in which models may be constructed from observed data.

The distinctive feature of the IBM approach is that speech recognition is formulated as a problem in communication theory. The speaker and the acoustic processor are conceptually combined into a single unit, the acoustic channel. Figure 12 shows the relation between the text generator, the acoustic channel, and the linguistic decoder. In Fig. 12, w is a string of words generated by the text generator, y is a string of acoustic processor output symbols (more specifically, a string of prototype identifiers, one for each 10 ms of speech), and w' is the word string produced by the linguistic decoder as an estimate of the word string w. The acoustic channel provides the linguistic decoder with a noisy string from which it must attempt to recover the original message. The linguistic decoder searches for a word string w that maximizes the probability P(w,y) of the joint observation of (w,y) at the two ends of the channel. A stochastic model of the acoustic channel will account for both the speaker's phonological and acoustic-phonetic variations and for the performance of the acoustic processor. Given models that specify both P(w) and P(y|w), the linguistic decoder may determine w using some algorithm that is appropriate to the size of the language.

The model of text generation is a language model. Both the language model and the acoustic channels are Markov sources consisting of states connected by transitions; with each transition there is an associated output word.
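The decoding criterion above, maximize P(w,y) = P(w) P(y|w), can be sketched with a toy example. The candidate word strings and probability tables below are invented for illustration; a real system sums over Markov-source paths rather than using lookup tables.

```python
# Toy noisy-channel decoder sketch (hypothetical candidates and tables).
def decode(y, candidates, lm_prob, acoustic_prob):
    """Return the word string w maximizing P(w) * P(y | w)."""
    return max(candidates, key=lambda w: lm_prob[w] * acoustic_prob[(w, y)])
```

Note how the language model can overrule the acoustic evidence: a string that fits the observations slightly worse may still win if its prior probability is high enough.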
A probability is attached to each transition. In the IBM system, the language model assigns probabilities to strings of words. For the acoustic channel model for single words, a phonetic Markov subsource is associated to each word. The possible output strings, drawn from an alphabet of phones, are all the different phonetic pronunciations of the word. An example is shown in Fig. 13. For each word there is a set of phonetic subsources, and for each phone there is a set of acoustic subsources. An acoustic subsource for a phone is a Markov source whose output alphabet
FIG. 12. Speech recognition as a communication problem (DeMori and Probst, 1986).
FIG. 13. Phonetic subsource for the word “two” (DeMori and Probst, 1986).
contains the output symbols of the acoustic processor and which specifies both the possible acoustic processor outputs for each phone and their probabilities. More details on stochastic decoding and its performance can be found in Bahl et al. (1983) and Schafer and Rabiner (1975).

Results obtained on stochastic-model-based network systems show that there is no significant difference in recognition accuracy from that of the DTW approach. However, from a computational point of view the Markov models require an order of magnitude less storage and execution time; whereas the DTW-based techniques have a very simple training phase (only data collection) and a very complicated recognition phase, Markov models are just the reverse. It has been widely agreed that Markov models provide the correct balance for any practical system (Moore, 1984).

2.2.5 Knowledge-Based Systems

The purpose of a "good" recognition model is to take knowledge and generalize it appropriately to assess new events. This is possible only by a proper understanding of the variabilities involved. The conclusion drawn after the ARPA project on Speech Understanding Research (SUR) was that there still exists a great need for integrating more speech knowledge into ASR models in order to solve difficult tasks, as well as to achieve better recognition results. The performance of an automatic speech recognizer ultimately depends on the amount and quality of the training material. However, if the dimensionality of the representation is raised, then recognizers are always going to be undertrained. It is therefore vital to know how the knowledge embedded in the training material can best be structured and hence utilized. In theory, it ought to be possible to extract a great deal of structural information from the speech signal itself, since humans can do it. So the question is how to obtain more knowledge from speech data.

The main characteristic of knowledge is that it is highly domain-dependent.
Abstractly speaking, knowledge is made up of descriptions, relationships, and
procedures corresponding to a given domain of activity. In practice, knowledge can take many diverse forms. It roughly consists of "objects" and their relationships in a domain, together with the procedures and heuristics for manipulating the relationships. It is obvious that the error-prone nature of speech data makes it necessary to have an efficient cooperation between highly diversified knowledge sources: knowledge concerning phonetics, phonology, prosody, lexicon, syntax, semantics, and pragmatics. The choice of adequate structures for representing the available knowledge sources is a crucial problem in speech understanding, as well as in any AI system. Several approaches incorporating interesting ideas were taken in the past.

One possible solution is to use a single structure in which all the diverse knowledge sources are integrated. This was the solution chosen in the Harpy system. Harpy integrates knowledge of all levels in a precompiled network which contains the various phonetic transcriptions of all syntactically legal sentences. The only disadvantage of this approach is that the size of the network becomes too large, and storing all possible sentences causes the system to be too rigid.

A second solution is at the other extreme from that of Harpy: a total independence of the various knowledge sources. A method for implementing such a scheme, called the blackboard model, was used in the Hearsay II system (Lesser et al., 1975). Figure 14a shows an example of a blackboard organization. In this approach the knowledge sources are independent processes which, in principle, are not aware of each other and which asynchronously post hypotheses at various levels (phoneme, syllable, word, etc.) to a global data base called the blackboard. This way a sentence is described at different levels. Invoking a given knowledge source is data-directed in the sense that specific preconditions must be fulfilled to access the blackboard.
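The blackboard control regime can be sketched schematically. The level names and the single toy knowledge source below are hypothetical, chosen only to show the data-directed triggering idea:

```python
# Schematic blackboard sketch: independent knowledge sources fire when
# their preconditions hold on the shared data base. Toy example only.
class Blackboard:
    def __init__(self):
        self.levels = {}                    # level name -> hypotheses

    def post(self, level, hypothesis):
        self.levels.setdefault(level, []).append(hypothesis)

    def run(self, knowledge_sources):
        # Data-directed control: fire any source whose precondition is met.
        fired = True
        while fired:
            fired = False
            for precondition, action in knowledge_sources:
                if precondition(self) and action(self):
                    fired = True

# Toy knowledge source: once phonemes exist, propose a word hypothesis.
def word_precondition(bb):
    return "phoneme" in bb.levels and "word" not in bb.levels

def word_action(bb):
    bb.post("word", "".join(bb.levels["phoneme"]))
    return True
```

No source calls another directly; all communication goes through the posted hypotheses, which is what makes the sources independent.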
Blackboard schemes have been successfully applied to various other AI areas such as vision (Prager et al.) and signal interpretation.

A third solution, which is intermediate between the network model and the blackboard model, is the hierarchical model shown in Fig. 14b. In this approach the processing is controlled by some kind of control structure or supervisor. In contrast with the data-driven, asynchronous activation of Hearsay II, the knowledge sources in this model are activated by the supervisor. This way control strategies can be tested by modifying the supervisor. The Hwim system of BBN was based upon such an approach.

Rule-Based Expert Systems

Recent results obtained in AI are largely due to sophisticated problem-solving systems called expert systems. For a well defined and restricted
FIG. 14. (a) Example of a blackboard model. (b) Example of a hierarchical system.
domain, expert systems are able to reach the level of expertise of a human expert. Instead of the classical two-tiered organization of data and program, expert systems introduce a more flexible, three-level structure of data, knowledge base, and control (Haton, 1980). Figure 15 shows the overall organization of an expert system.
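The three-level structure of data, knowledge base, and control can be illustrated with a minimal forward-chaining sketch. The production rules below are invented examples, not rules from an actual speech system:

```python
# Minimal forward-chaining sketch: facts (data), production rules
# (knowledge base), and a control loop that fires any rule whose
# condition is satisfied by the current facts. Toy rules only.
def run_rules(facts, rules):
    """rules: list of (condition_set, conclusion) pairs."""
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)       # data-activated firing
                changed = True
    return facts
```

Splitting the expertise into many small rules like these is what makes the knowledge base easy to extend and inspect.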
FIG. 15. Overall organization of an expert system: a man-machine interface (language processing, data and knowledge acquisition), a control strategy (inference, explanations about reasoning), and a knowledge base (knowledge sources, rules, data and facts).
The knowledge base is used by the system for analyzing the problem deductively. It typically incorporates some kind of data-activated operators that fire whenever specific preconditions are met during the problem-solving process. Expert systems make it possible to split a complex expertise into a large number of relatively simple rules. A human expert often seems to use a production rule scheme while reasoning. Therefore, these systems can be successfully applied to various aspects of speech recognition that require solving specific and limited problems. Some of the attempts made so far using the expert systems approach are in speech spectrogram interpretation and in multi-expert structures for accessing a large lexicon. In a multi-expert system, different experts in the society execute in parallel various algorithms derived from a task decomposition of the speech recognition algorithm.

The major difference between Hearsay II's blackboard model and the expert system lies at the level of the control structure. In Hearsay II the only way the various knowledge sources (the experts) can communicate is by asynchronously posting hypotheses in a data base. The knowledge sources are triggered when specific preconditions in the data base are satisfied. In the expert society each expert is provided with a specific control strategy and communicates directly with other experts. This strategy makes use of planning algorithms and is related to the AI concept of frames, which provides an interesting framework for the predictive use of knowledge. An example of this can be found in DeMori et al. (1987a).

We have seen several approaches to speech recognition, each using varying amounts of speech knowledge and different ways of knowledge representation. Template matching techniques use constraints to define a manageable task and are easy to develop, but they use very little speech knowledge. The
Harpy system used more speech knowledge integrated into its network model. The IBM system showed techniques for combining speech knowledge with well defined mathematical models. Such techniques did manage to take into consideration speaker variations and coarticulation effects to a certain extent. The Hearsay II and Hwim systems showed the importance of using independent knowledge sources, even though the two used different types of knowledge representation. Expert systems were shown to be promising for knowledge representation in a natural way, and they are especially attractive if the domain is specific, small, and well defined.

Most of the past work on ASR clearly demonstrates that the solution of difficult speech recognition tasks involving speaker independence, connected speech, or large vocabularies needs much more speech knowledge, knowledge from all levels, in their models. Faster and larger computing systems may help in solving this problem. It is also understood that speech signals contain sufficient knowledge, since humans seem to process it very easily. In order to integrate knowledge from different sources that are of different nature and that are available at different levels, one should adopt different types of knowledge integration techniques, applying the most appropriate ones at each level, rather than adopting just one specific model.

In the next section of this chapter we will discuss the use of artificial neural network models and a special case of Hidden Markov models that can be used in ASR systems. We will also compare and contrast speech-perception-based systems (the ear model) and more conventional speech recognition systems. The ear-model-based speech recognition system uses multi-layer networks, whereas the more conventional system uses Markov models with FFT-based line parameters.
3. A Multi-Layer Network Model for ASR Systems

Coding speech for automatic speech recognition (ASR) can be performed with multi-layer networks (MLNs). This approach is interesting because it allows one to capture relevant speech properties useful for ASR at the coding stage. MLNs are networks with an input layer of nodes, one or more hidden layers, and an output layer whose nodes represent a coded version of the input. Nodes are connected by links, and weights are associated with links. All the links bringing a signal into a node contribute to the calculation of the excitation of that node: the excitation is the sum, over the incoming links, of the product of the link weight and the output value of the node from which the link carries its signal. The output of a node is a function of the node excitation. By choosing the link weights, a large variety of coders can be designed having specific properties. Link weights can be obtained by a learning process. Learning can
be supervised or unsupervised. When learning is supervised, the network input is fed by sets of patterns. Each set corresponds to a class of patterns that have to be coded with the same values appearing at the output nodes. The output nodes are clamped with the desired values, and algorithms exist for computing the values of the link weights in such a way that the network codes the sets of input patterns as desired. These learning algorithms have a relevant generalization capability.

Many scientists are currently investigating and applying learning systems based on MLNs. Definitions of MLNs, and motivations and algorithms for their use, can be found in Rumelhart et al. (1986), Plaut and Hinton (1987), and Hinton and Sejnowski (1986). Theoretical results have shown that MLNs can perform a variety of complex functions (Plaut and Hinton, 1987). Applications have also shown that MLNs have interesting generalization performances, capable of capturing information related to pattern structures as well as characterizations of parameter variation (Bourlard and Wellekens, 1987; Watrous and Shastri, 1987). Algorithms exist for MLNs, with proven mathematical properties, that allow learning to be discriminant and to focus on the properties that make different patterns belong to different classes. Furthermore, in MLNs the knowledge about a set of competing classes (in our case speech units or phonemes) is distributed in the weights associated with the links between nodes. If we interpret each output of the coder as representing a phonetic property, then an output value can be seen as a degree of evidence with which that property has been observed in the data. Two important research problems can be studied with such an approach.
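The excitation and output computation described above, a weighted sum of incoming signals passed through a node function, can be sketched directly (the sigmoid is used here, as in the EBPA networks discussed later in this section):

```python
import math

# Sketch of a single node's computation: excitation is the weighted sum
# of the incoming signals; the output is a sigmoid of the excitation.
def node_output(inputs, weights):
    excitation = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-excitation))
```

With zero weights the excitation is zero and the output sits at 0.5, the midpoint of the sigmoid's (0, 1) range.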
The first problem investigates the possibility of learning the features of each phoneme only in some phonetic contexts and relying on the generalization capability of a network for generating correct hypotheses about phonemes in contexts that have not been used for learning. The second problem is similar to the first, but deals with the possibility of learning all the required features and using them for correctly hypothesizing phonemes that have not been used for learning. In order to study the second problem mentioned above, it is necessary to code the output with some features in order to learn features and to represent each class (phoneme or speech unit) as a combination of features. We have chosen as main features the place of articulation and the manner of articulation related to tongue position. The reason is that these features are well characterized by physical parameters that can be measured or estimated. Phoneticians have characterized vowels and other sounds by discretizing place of articulation and manner of articulation related to tongue position, which in nature are continuous acoustic parameters. We have inferred an MLN for each feature, and we have discretized each feature with five
qualitative values, namely PL1, ..., PLi, ..., PL5 for the place and MN1, ..., MNj, ..., MN5 for the manner. We have used 10 vowels pronounced by many speakers in a fixed context for training the two networks, each vowel being represented by one of the PLi and one of the MNj. In order to describe all the vowels of American English with enough redundancy, we have introduced another network with two outputs, namely T = tense and L = lax. We have also inferred the weights of a network with 10 outputs, one for each vowel. The performances of this network have shown that it is possible to obtain an excellent generalization of the parameters when training is performed on a limited number of male and female speakers, using data that make evident acoustic properties having little variance across speakers when the same vocalic sound is pronounced. The performances of this network have also been used as reference. Tests have always been performed with new speakers.

The first test consists of pronouncing the same vowels in the same context as in the data used for learning. This test is useful for comparing the results obtained from a mathematical model of the ear with those obtained from the more popular Fast Fourier Transform (FFT). This test is also useful for assessing the capabilities of the network learning method in generalizing knowledge about acoustic properties of speakers pronouncing vowels. The second test has the objective of recognizing vowels through features. This test has been useful for investigating the power of the networks with respect to possible confusions with vowels not used for learning. The third experiment consists of attempting to recognize new vowels pronounced by new speakers in order to investigate the capability of the networks to detect the same features used for learning but integrated into sounds that were not used for learning.
This generalization capability was verified with eight new sounds pronounced by 20 new speakers. Without any learning on the new sounds, but just using expectations based on phonetic knowledge of the composing features and their time evolutions, an error rate of 7.5% was found. In the next section we describe in detail the mathematical model of the ear and a multi-layer network model that is used in speaker-independent recognition of 10 vowels in fixed contexts.

4. The Ear Model: An Approach Based on Speech Perception
Cochlear transformations of speech signals result in an auditory neural firing pattern significantly different from the spectrogram, a popular time-frequency-energy representation of speech. In recent years basilar membrane, inner hair cell, and nerve fiber behaviour have been extensively studied by auditory physiologists and neurophysiologists,
and knowledge about the human auditory pathway has become more accurate. A considerable amount of data has been gathered in order to characterize the responses of nerve fibers in the eighth nerve of the mammalian auditory system using tones, tone complexes, and synthetic speech stimuli (Seneff, 1984, 1985, 1986, 1988; Delgutte, 1980; Delgutte and Kiang, 1984a-d). Phonetic features probably correspond in a rather straightforward manner to the neural discharge pattern with which speech is coded by the auditory nerve. For these reasons, even an ear model that is just an approximation of physical reality appears to be a suitable system for identifying those aspects of the speech signal that are relevant for recognition. The computational scheme proposed in this chapter for modeling the human auditory system is derived from the one proposed by Seneff (1984). The overall system structure, which is illustrated in Fig. 16, includes three blocks.
FIG. 16. Block diagram of the ear model.
The first two of them deal with peripheral transformations occurring in the early stages of the hearing process, while the third one attempts to extract information relevant to perception. The first two blocks represent the periphery of the hearing system. They are designed using knowledge of the rather well known responses of the corresponding human auditory stages (Sinex and Geisler, 1983). The third unit attempts to apply a useful processing strategy for the extraction of important speech properties, such as spectral lines related to formants.

The speech signal, band-limited and sampled at 16 kHz, is first pre-filtered through a set of four complex zero pairs to eliminate the very high and very low frequency components. The signal is then analyzed by the first block, a 40-channel critical-band linear filter bank. The transfer functions of the filters are depicted in Fig. 17. Filters were designed to optimally fit physiological data (Sinex and Geisler, 1983) and are implemented as a cascade of complex high frequency zero pairs with taps after each zero pair to individual tuned resonators. Figure 18 shows the block diagram of the filter bank. The transfer functions of the filters are

H_i(z) = PF(z) SC_i(z),   for i = 1, ..., 40,   (1)

with

PF(z) = (1 - Z_1 z^-1)(1 - Z_1* z^-1)(1 - Z_2 z^-1)^2 (1 - Z_3 z^-1)^2 (1 - Z_4 z^-1)^2,   (2)

and

SC_i(z) = { [(1 - ZP_i z^-1)(1 - ZP_i* z^-1)]^2 / [(1 - PP_i z^-1)(1 - PP_i* z^-1)]^2 } × ∏_{j=40,...,i} (1 - ZS_j z^-1)(1 - ZS_j* z^-1),   (3)

where i = 1, ..., 40, the asterisk denotes complex conjugation, the ZS_j are the zeros of the serial branch, the ZP_i are the zeros of the parallel branch, and the PP_i are the poles of the parallel branch. Filter resonators consist of a double complex pole pair corresponding to the filter center frequency (CF) and a double complex zero at half its CF. Frequencies and bandwidths for zeros and poles were designed almost automatically by an interactive technique developed by S. Seneff and described in her thesis (Seneff, 1985).

The second block of the model, whose block diagram is shown in Fig. 19, is called the hair cell synapse model. It is nonlinear and is intended to capture prominent features of the transformation from basilar membrane vibration, represented by the outputs of the filter bank, to probabilistic response properties of auditory nerve fibers. The outputs of this stage, in accordance with Seneff (1985), represent the probability of firing as a function of time for a set of similar fibers acting as a group. Four different neural mechanisms are
FIG. 17. Frequency responses of filters introduced to simulate the basilar membrane.
modeled in this nonlinear stage. A transduction module which half-wave rectifies its input has the transfer function y(n) shown in Fig. 20, corresponding to the following relation:

y(n) = G_HW {1 + A tan^-1 [B x(n)]}   for x(n) > 0,
y(n) = G_HW exp(A B x(n))             for x(n) ≤ 0.   (4)
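Equation (4) can be sketched directly. The gains G_HW, A, and B are model parameters; the default values below are placeholders for illustration, not Seneff's calibrated constants.

```python
import math

# Sketch of the half-wave-rectifying transduction module of Equation (4).
# g_hw, a, and b are placeholder parameter values, not calibrated ones.
def transduction(x, g_hw=1.0, a=1.0, b=1.0):
    if x > 0:
        return g_hw * (1.0 + a * math.atan(b * x))   # compressive branch
    return g_hw * math.exp(a * b * x)                # decaying branch
```

Both branches meet at the value G_HW when x = 0, so the nonlinearity is continuous; positive inputs are compressed by the arctangent while negative inputs decay exponentially.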
FIG. 18. Structure of the filter bank.
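The cascade-of-conjugate-pairs structure used throughout the filter bank of Fig. 18 can be evaluated numerically on the unit circle. This is a hedged sketch: the actual zero and pole locations come from Seneff's interactive design procedure and are not reproduced here, so the roots passed in below are arbitrary examples.

```python
import cmath

# Evaluate a cascade of conjugate zero/pole pairs of the form
# (1 - Z z^-1)(1 - Z* z^-1) at z = exp(j * omega). Root locations are
# caller-supplied; the design values from Seneff's thesis are not used.
def conjugate_pair(z, root):
    return (1 - root / z) * (1 - root.conjugate() / z)

def filter_response(omega, zero_pairs, pole_pairs):
    """Magnitude response at angular frequency omega (radians/sample)."""
    z = cmath.exp(1j * omega)
    num = 1.0 + 0j
    for r in zero_pairs:
        num *= conjugate_pair(z, r)
    den = 1.0 + 0j
    for r in pole_pairs:
        den *= conjugate_pair(z, r)
    return abs(num / den)
```

A zero pair placed on the unit circle nulls the response at its frequency, while a pole pair near the circle produces the tuned resonance of each channel.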
The rectifier is applied to the signal to simulate the highly distinct directional sensitivity present in the inner hair cell current response. The short-term adaptation, which seems due to the neurotransmitter release in the synaptic region between the inner hair cell and its connected nerve fibers, is simulated by the so-called "membrane model." The mathematical equations
FIG. 19. Block diagram for obtaining the firing probability.
FIG. 20. Transfer function of the transduction module.
describing the mechanism that influences the evolution of the neurotransmitter concentration inside the cell membrane are given by the following:
The meaning of the signals in Equation (5) is defined in Fig. 21. The third unit in Fig. 19 represents the observed gradual loss of synchrony in nerve fiber behaviour as stimulus frequency is increased. It is implemented by a simple low-pass filter with the following transfer function:

H(z) = [(1 - G_LSR) / (1 - G_LSR z^-1)]^N,   N = 4.   (6)
The last unit is called “Rapid Adaptation.” It performs “Automatic Gain Control” and implements a model of the refractory phenomenon of nerve fibers. It is based on the following equation:
where ⟨x(n)⟩ is the expected value of x(n), obtained by sending x(n) through a first-order low-pass filter having the transfer function
FIG. 21. Signals involved in the adaptation module.
The third and last block of the ear model in Fig. 16 is the synchrony detector, which implements the known "phase locking" property of the nerve fibers. It enhances spectral peaks due to vocal tract resonances. Auditory nerve fibers tend to fire in a "phase-locked" way responding to low-frequency periodic stimuli, which means that the intervals between nerve firings tend to be integral multiples of the stimulus period. Consequently, if there is a "dominant periodicity" (a prominent peak in the frequency domain) in the signal, then with the so-called Generalized Synchrony Detector (GSD) processing technique (Young and Sachs, 1979; Sachs and Young, 1980), only those channels whose central frequencies are closest to that periodicity will have a more prominent response.

The block diagram of the GSD, as applied to each channel, is shown in Fig. 22. The linear smoothing low-pass filter used as envelope extractor is based on the following relations:
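The phase-locking idea behind the synchrony detector can be illustrated with a small sketch. This is a hypothetical illustration of the principle, not the exact GSD formulation used in the system: a channel output u is compared with itself delayed by the period T associated with the channel's center frequency, so a stimulus periodic at that frequency makes u(n) and u(n - T) nearly equal and drives the ratio to large values.

```python
# Illustrative synchrony measure for one channel (hypothetical form):
# inputs periodic at the channel period make the denominator vanish.
def synchrony(u, period):
    num = sum(abs(u[n] + u[n - period]) for n in range(period, len(u)))
    den = sum(abs(u[n] - u[n - period]) for n in range(period, len(u)))
    return num / den if den else float("inf")
```

Channels whose characteristic period matches the dominant periodicity of the input respond strongly; mismatched channels give small values.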
4.1 Speaker-Independent Recognition of Ten Vowels in Fixed Contexts
A first experiment was performed for speaker-independent vowel recognition. The purpose was that of training an MLN capable of discriminating among 10 different American-English vowels represented with the
FIG. 22. Scheme of the Generalized Synchrony Detector (GSD).
ARPABET by the following VSET:

VSET = {iy, ih, eh, ae, ah, uw, uh, ao, aa, er}.

The interest was to investigate the generalization capability of the network with respect to inter-speaker variability. Some vowels (ix, ax, ey, ay, oy, aw, ow)
were not used in this experiment because we attempted to recognize them through features learned by using only VSET. Speech material consisted of five pronunciations of 10 monosyllabic words containing the vowels of VSET. The words used are those belonging to the following WSET:

WSET = {BEEP, PIT, BED, BAT, BUT, BOOT, PUT, SAW, FAR, FUR}.   (12)

The signal processing method used for this experiment is the one described in the previous section. The output of the Generalized Synchrony Detector (GSD) was collected every 5 msec and represented by a 40-coefficient vector. This type of output is supposed to retain most of the relevant speech spectral information. The GSD output of the stationary part of the signal was sent to an MLN.

The performances of an MLN depend on its architecture, on the method used for learning and for producing an output, and on the type of input and the way the output is coded. In order to capture the essential information of each vowel, it was decided to use 10 equally spaced frames per vowel, for a total of 400 network input nodes. A single hidden layer was used with a total of 20 nodes. Ten output nodes were introduced, one for each vowel, as shown in Fig. 23. Vowels were automatically singled out by an algorithm proposed by DeMori et al. (1985), and a linear interpolation procedure was used to reduce to 10 the variable number of frames per vowel (the first and the last msec of the vowel segment were not considered in the interpolation procedure). The resulting 400 (40 spectral coefficients per frame × 10 frames) spectral coefficients became the inputs of the MLN.

The Error Back Propagation Algorithm (EBPA) was used for training. EBPA was recently introduced (Rumelhart et al., 1986) for a class of nonlinear MLNs. These networks are made of connected units. The networks used in the experiments described in this chapter are feedforward (non-recurrent) and organized in layers.
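The interpolation step that maps a variable number of frames onto 10 fixed positions can be sketched as follows. For brevity each frame is a single number rather than a 40-coefficient vector, and the endpoint handling is an assumption of this sketch:

```python
# Sketch of reducing a variable number of frames to a fixed count by
# linear interpolation. Frames are scalars here for brevity; applying
# the same formula per coefficient handles 40-dimensional frames.
def resample_frames(frames, target=10):
    n = len(frames)
    if n == 1:
        return [frames[0]] * target
    out = []
    for k in range(target):
        pos = k * (n - 1) / (target - 1)   # fractional source position
        i = min(int(pos), n - 2)
        t = pos - i
        out.append(frames[i] * (1 - t) + frames[i + 1] * t)
    return out
```

Whatever the original vowel length, the network always receives 10 frames spanning the segment from first to last.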
A weight is associated to each (unidirectional) connection between two nodes. Input nodes are on layer 0 and have no input connections. Output nodes have no output connections and are on the last layer. Nodes that are neither input nor output units are called hidden units. The network computes a nonlinear function from the input units to the output units. The architecture of the network determines which functions it can compute. A typical architecture used in the experiments described in this chapter is shown in Fig. 23. The nodes of the network compute a sigmoid function of the weighted sum of their inputs. Any output value takes values between 0 and 1 according to the following function:
FIG. 23. Structure of the neural network used for vowel recognition.
x_j = f( Σ_{i=1}^{J} w_ji x_i ),   (13)

with

f(x) = 1 / (1 + exp(-x)).   (14)

The sum in Equation (13) is over the J units with an outgoing connection to unit j; w_ji is the weight associated with the link between the output of unit i and the input of unit j, and x_i is the output value of unit i. With EBPA the weights are computed iteratively in such a way that the network minimizes a square error measure defined over a set of training input/output examples. These examples belong to a training set defined as follows:

training set = {(IN_k, OUT_k)},   k = 1, ..., K,

where IN_k is an input pattern and OUT_k is a desired output pattern that can be represented by the following vector of values: (OUT_k1, OUT_k2, ..., OUT_kM).
The minimized square error measure is

E = Σ_k Σ_m [OUT_km − Y_m(IN_k)]²,

where k varies over the training set of examples and m varies over the M nodes on the output layer; Y_m(IN_k) is the value of the mth output node computed by the MLN when IN_k is applied at the input layer. EBPA uses gradient descent in the space of weights to minimize E (Rumelhart et al., 1986). The basic rule for updating link weights is

ΔW = −learning-rate · ∂E/∂W,
where ∂E/∂W can be computed by back-propagating the error from the output units as described by Rumelhart et al. (1986).

In order to reduce the training time and accelerate learning, various techniques can be used. The classical gradient descent procedure modifies the weights after all the examples have been presented to the network. This is called batch learning. However, it was experimentally found, at least for pattern recognition applications, that it is much more convenient to perform on-line learning, i.e., updating the weights after the presentation of each example. Batch learning provides an accurate measure of the performance of the network as well as of the gradient ∂E/∂W. These two parameters can be used to adapt the learning rate during training in order to minimize the number of training iterations. In our experiments we used various types of acceleration techniques. The most effective one consisted of switching from on-line learning to batch learning and vice-versa, depending on the behaviour of the gradient and the evolution of performances. In contrast with Hidden Markov models, MLNs can learn from presentations of examples from all the classes that have to be recognized, with the possibility of emphasizing what makes classes different and different examples of the same class similar.

The voices of 13 speakers (seven male, six female) were used for learning with five samples per vowel per speaker. The voices of seven new speakers (three male, four female) were used for recognition with five samples per vowel per speaker. Data acquisition was performed with a 12-bit A/D converter at 16 kHz sampling frequency. Learning was accomplished with 62 iterations with zero error rate on the training set. As for the test set, the network produces degrees of evidence varying between zero and one, so candidate hypotheses could be ranked according to the corresponding degree of evidence. The confusion matrix represented in Table I was obtained. In 95.7% of the
PERCEPTUAL MODELS FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
TABLE I. PERFORMANCES OF THE VOWEL RECOGNITION SYSTEM USING THE ENTIRE VOCALIC SEGMENT. [Confusion matrix: rows are the pronounced words BAT, BED, BEEP, BOOT, BUT, FAR, FUR, PIT, PUT and SAW; columns are the hypothesized vowels /ae/, /eh/, /iy/, /uw/, /ah/, /aa/, /er/, /ih/, /uh/, /ao/ and /ax/ (THE). The cell values, mostly 31-35 on the diagonal, did not survive extraction.]
cases, correct hypotheses were generated with the highest evidence; in 98.5% of the cases correct hypotheses were found in the top two candidates, and in 99.4% of the cases in the top three candidates. The same experiment with FFT spectra instead of data from the ear model gave an 87% recognition rate under similar experimental conditions. The use of the ear model made it possible to produce spectra with a limited number of well-defined spectral lines. This represents a good use of speech knowledge, according to which formants are vowel parameters with low variance. The use of male and female voices allowed the network to perform an excellent generalization with samples from a limited number of speakers.

Encouraged by the results of this first experiment, other problems appeared worth investigating with the proposed approach. The problems are all related to the possibility of extending what has been learned for 10 vowels to recognize new vowels. An appealing generalization possibility relies on the recognition of vowel features. By learning a set of features in a set of vowels, new vowels can be characterized just by different combinations of the learned features. Features like the place of articulation and the manner of articulation related to tongue position are good descriptors of the vowel generation system. It can be expected that their values have low variance when different speakers pronounce the same vowel.
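The 95.7%/98.5%/99.4% figures quoted above are top-1/2/3 accuracies over hypotheses ranked by degree of evidence. A minimal sketch of that scoring; the sample labels and evidence values below are invented for illustration, not the network's actual outputs:

```python
# Rank candidate hypotheses by degree of evidence and score top-k inclusion.

def topk_accuracy(samples, k):
    # samples: list of (true_label, {label: evidence in [0, 1]})
    hits = 0
    for true, evidence in samples:
        # sort labels by descending evidence
        ranked = sorted(evidence, key=evidence.get, reverse=True)
        if true in ranked[:k]:
            hits += 1
    return hits / len(samples)

samples = [
    ("iy", {"iy": 0.9, "ih": 0.4, "eh": 0.1}),
    ("ih", {"iy": 0.6, "ih": 0.5, "eh": 0.2}),  # correct only in the top two
    ("eh", {"eh": 0.8, "ih": 0.3, "iy": 0.1}),
]
print(topk_accuracy(samples, 1))  # 2 of 3 correct at top 1
print(topk_accuracy(samples, 2))  # all 3 correct within the top 2
```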
4.2 The Recognition of Phonetic Features
The same procedure introduced in the previous section was used for learning three networks, namely MLNV1, MLNV2 and MLNV3. These networks have the same structure as the one introduced in the previous
RENATO DEMORI et al.
section, the only difference being that they have more outputs. MLNV1 has five additional outputs corresponding to the five places of articulation PL1, ..., PLi, ..., PL5. MLNV2 has five new outputs, namely MN1, ..., MNj, ..., MN5. MLNV3 has two additional outputs, namely T = tense and L = lax. The ten vowels used for this experiment have the features defined in Table II.

After having learned the weights of the three networks, the outputs corresponding to the individual vowels were ignored and confusion matrices were derived only for the outputs corresponding to the phonetic features. An error corresponds to the case in which an output has a degree of evidence higher than the degree of the output corresponding to the feature possessed by the vowel whose pattern has been applied at the input. The confusion matrix for the features is shown in Table III. The overall error rates are 4.57%, 5.71% and 5.43% respectively for the three sets of features. Error rates were always zero after a number of training cycles (between 60 and 70) of the three networks.

Several rules can be conceived for recognizing vowels through their features. The most severe rule is that a vowel is recognized only if all three features have been scored with the highest evidence. With such a rule, 313 out of 350 vowels are correctly recognized, corresponding to an 89.43% recognition rate. In 28 cases, combinations of features having the highest score did not
TABLE II. VOWEL REPRESENTATION USING PHONETIC FEATURES. [Binary feature assignments for /ae/ (BAT), /eh/ (BED), /iy/ (BEEP), /uw/ (BOOT), /ah/ (BUT), /aa/ (FAR), /er/ (FUR), /ih/ (PIT), /uh/ (PUT), /ao/ (SAW) and /ax/ (THE) over the place-of-articulation outputs PL1-PL5 (grouped as back, central, front), the manner-of-articulation outputs MN1-MN5 (grouped as low, mid, high), and lax (L) versus tense (T). The 0/1 entries did not survive extraction.]
TABLE III. PERFORMANCES IN THE RECOGNITION OF FEATURES. [Confusion matrix: rows are the pronounced words BAT through SAW; columns are the recognized features PL1-PL5 (place: back, central, front), MN1-MN5 (manner: low, mid, high), lax (L) and tense (T). The cell values did not survive extraction.]
correspond to any vowel, so a decision criterion had to be introduced in order to generate the best vocalic hypothesis. It is important to consider as an error the case in which the features of a vowel not contained in the set defined by Equation (11) receive the highest score. Considering these vowels as well as the vowels in Equation (11), an error rate of 2.57% was found. This leads to the conclusion that an error rate between 2.57% and 10.57% can be obtained, depending on the decision criterion used for those cases in which the set of features having the highest membership in each network does not correspond to any vowel. An appealing criterion consists of computing the centers of gravity of the place and manner of articulation using the following relation:
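The relation itself did not survive extraction. Given the description (a membership-weighted average over the five place and five manner outputs), it presumably has the form of a center of gravity over the feature indices; the following is a reconstruction, not necessarily the authors' exact Equation (17):

```latex
\mathrm{CGP} \;=\; \frac{\sum_{i=1}^{5} i\,\mu(PL_i)}{\sum_{i=1}^{5} \mu(PL_i)},
\qquad
\mathrm{CGM} \;=\; \frac{\sum_{j=1}^{5} j\,\mu(MN_j)}{\sum_{j=1}^{5} \mu(MN_j)},
```

where \(\mu(\cdot)\) denotes the membership (degree of evidence) produced by the corresponding network output.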
Let CGP and CGM be respectively the centers of gravity of the place and manner of articulation. A degree of "tenseness" has been computed by dividing the membership of "tense" by the sum of the memberships of "tense" and "lax." Each sample can now be represented as a point in a three-dimensional space having CGP, CGM and the degree of tenseness as dimensions. Euclidean distances are computed, for those sets of features not corresponding to any vowel, with respect to the points representing the theoretical values for each vowel. With centers of gravity and Euclidean distance, an error rate of 7.24% was obtained.
Another interesting criterion consists of introducing a subjective probability for a feature, defined as the ratio of the feature membership over the sum of the memberships of the other features. For example, for feature PLi a probability Ri is defined as follows:
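The formula did not survive extraction; from the wording (a feature membership divided by a sum of memberships), it is presumably the normalization

```latex
R_i \;=\; \frac{\mu(PL_i)}{\sum_{k=1}^{5} \mu(PL_k)},
```

offered here as a reconstruction rather than the original equation.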
The probability of a vowel is then defined as the product of the subjective probabilities of the features of the vowel. As the denominator of the probability of a vowel is the same for all the vowels, the vowel with the highest probability is the one with the highest product of the evidences of its features. By smoothing each membership with its neighbors and multiplying the memberships of the features of each vowel, an error rate of 8.8% was obtained.

The error rate obtained with centers of gravity is not far from the one obtained in the previous section with 10 vowels. In this case the possibility of error was higher because the system was allowed to recognize feature combinations for all the vowels of American English. For those cases in which the features that reached the maximum evidence did not define a set corresponding to any vowel of American English, an error analysis was made. The conclusions of this analysis are shown by the error tree in Fig. 24, where the number of errors is indicated in parentheses. They suggest that most of the errors were systematic (PL2 confused with PL4, and MN2 confused with MN4). Based on the tree in Fig. 24, the features with maximum evidence can be used as a code for describing an unknown vowel. When this code does not correspond to any acceptable vowel, it can be mapped into the right one, corresponding to the true features of the vowel, whenever the wrong code always corresponded to the same vowel. When the wrong code corresponds to more than one vowel, a procedure is executed that computes Euclidean distances on centers of gravity. With this criterion, which is derived from the test data, an error rate of 3.24% can be obtained. This error rate cannot be used for establishing the performance of the feature networks because it corrects some errors by recoding the memberships using a function that has been learned by analyzing the test data.
Nevertheless, it suggests that feature-based MLNs may outperform a straightforward phoneme-based MLN if successive refinements are performed using more than one training set. In fact, after a few experiments, interpretations for the codes PL = 00001, MN = 00001 and PL = 01000, MN = 10000 can be inferred and applied to successive experiments, leading to a correct recognition rate close to 96%.
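The Euclidean fallback described above (each sample a point (CGP, CGM, tenseness), matched to the theoretical point of each vowel) can be sketched as follows; the prototype coordinates are invented for illustration, not the chapter's measured values:

```python
import math

# Nearest-prototype decision in the (CGP, CGM, tenseness) space.

def tenseness(t, l):
    # membership of "tense" divided by the sum of "tense" and "lax"
    return t / (t + l)

def nearest_vowel(sample, prototypes):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # pick the vowel whose theoretical point is closest to the sample
    return min(prototypes, key=lambda v: dist(sample, prototypes[v]))

prototypes = {            # hypothetical theoretical values
    "iy": (5.0, 5.0, 0.9),   # front, high, tense
    "ih": (4.0, 4.0, 0.2),   # front, high, lax
    "aa": (1.0, 1.0, 0.8),   # back, low, tense
}
sample = (4.6, 4.7, tenseness(0.8, 0.2))
print(nearest_vowel(sample, prototypes))  # "iy"
```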
FIG. 24. Error tree for the vowels classified with a code that does not correspond to any vowel. [The tree, with feature codes such as 00010 and 00001 and leaves such as FUR (1), BUT (1) /ah/, FAR (1) /aa/ and BUT (2), did not survive extraction.]
4.3 Recognition of New Vowels and Diphthongs
In order to test the generalization power of the networks for feature hypothesis formulation, a new experiment was performed involving 20 new speakers of six different mother tongues (English, French, Spanish, Italian, German and Vietnamese) pronouncing letters in English. According to other experimental work on vowel recognition (Leung and Zue, 1988), there are 13 vowels and three diphthongs in American English. The vowels and diphthongs that were not used in the previous experiments belong to the set NSET:

NSET: {/ax/ (the), /ey/ (A), /ay/ (I), /oy/ (boy), /aw/ (bough), /ow/ (O)}

The vowel /ax/ does not exhibit transitions in time of the parameters CGM and CGP, so its recognition was based on the recognition of the expected features as defined in Table II. The other five elements of NSET exhibit evolutions of CGP and CGM in the time domain. For this reason, it was decided to use such evolutions as a basis for recognition. Furthermore, the sequences /yu/ and /way/ (corresponding to the pronunciation of letters U and Y) were added to NSET in order to have a larger set of classes for testing the generalization capabilities of the system.

Although Hidden Markov models could be and will be conceived for modeling the time evolution of centers of gravity, as will be introduced in the next section, a crude classification criterion was applied in this experiment. Recognition was based purely on time evolutions of place and manner of articulation according to descriptions predictable from theory or past experience and not learned from actual examples. The centers of gravity CGP and CGM were computed every five msec and vector-quantized using five symbols for CGP according to the following alphabet:

Σ1 = {F, f, c, b, B},   (19)

where F represents "strong front." Analogously, the following alphabet was used for quantizing the manner of articulation:

Σ2 = {H, h, M, l, L},   (20)

where H represents "strong high." Coding of CGP and CGM is based on values computed on the data of the 10 vowels used for training the network. Transitions of CGP and CGM were simply identified by sequences of pairs of symbols from Σ1 and Σ2. Figure 25 gives an example of the time evolutions of CGP and CGM for letters A (/ey/) and Y (/way/) together with their codes. The following regular expressions were used to characterize the words
FIG. 25. Time evolution of CGM and CGP. [Two panels, (a) letter A and (b) letter Y, plotting the quantized place of articulation (B, b, c, f, F) and manner of articulation (L, l, M, h, H) against time from 0 to 600 ms; the plots did not survive extraction.]
containing the new vowels and diphthongs:

A: (f, h)*(F, H)*
I: (b + c, l)*(f + F, h + H)*
O: (b + B, l)*(b + B, h + H)*
/oy/: (B, l)*(f + F, h + H)*
/aw/: (c, l)*(b + B, h + H)*
U: (f + F, h + H)*(b + B, h + H)*
Y: (b + B, h + H)*(c, l + L)*(f + F, h + H)*
The asterisk means, in theory, "any repetition," but in our case a minimum of two repetitions was required. The symbol "+" here means logical disjunction, while a concatenation of terms between parentheses means a
sequence in time. A short sequence with intermediate symbols was tolerated in the transitions B-F, L-H and vice versa.

For each new word, 20 samples were available, based on the idea that speaker-independent recognition has to be tested with data from new speakers and that repetition of data from the same speaker is not essential. The errors observed were quite systematic. For /ax/, one case was confused with /ah/. For /ey/ (letter A), three errors were observed, all corresponding to a sequence (f, h)*, meaning that the transition from /eh/ was not detected. For /ow/ (letter O), three errors were observed, corresponding to the sequence (b, l)*, meaning that the transition from /oh/ was not detected, which may correspond to an intention of the speaker. Three errors were found for /oy/ confused with /ay/ and two errors for /aw/ confused with /ow/. For the other transitions, the expectations were always met. The repeatability of the describing strings was remarkable. A total of 12 errors out of 160 test data was found, corresponding to an error rate of 7.5%. This provides evidence that a system made of an ear model followed by MLNs trained to recognize normalized values of the place and manner of articulation reliably generates feature hypotheses about vowels and diphthongs not used for training.
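The crude criterion of this section, matching quantized (CGP, CGM) symbol sequences against regular expressions with at least two repetitions per term, can be sketched as follows. Only the pattern for letter A is shown, and the frame data are invented; the real system also tolerates short intermediate-symbol runs, which this sketch omits:

```python
import re

# Each 5-ms frame's (CGP, CGM) pair is quantized to a two-letter symbol,
# e.g. ("f", "h") -> "fh"; a word is accepted when the symbol string
# matches its regular expression. The "*" of the text is tightened to
# {2,}, i.e. a minimum of two repetitions.

PATTERNS = {
    # A: (f, h)*(F, H)* -- front/high moving to strong-front/strong-high
    "A": re.compile(r"^(fh){2,}(FH){2,}$"),
}

def recognize(frames, patterns=PATTERNS):
    s = "".join(p + m for p, m in frames)  # [("f","h"), ...] -> "fhfh..."
    return [w for w, rx in patterns.items() if rx.match(s)]

frames = [("f", "h")] * 3 + [("F", "H")] * 4
print(recognize(frames))  # ["A"]
```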
4.4 Word Models
An acoustic situation can be defined using descriptions of suprasegmental acoustic properties (DeMori et al., 1985). For situations in which spectrograms exhibit few narrow-band resonances, the recognition paradigm proposed in this chapter can be applied. Other types of networks, using different types of input parameters, can be introduced and executed in different acoustic situations. The outputs of these networks should also be memberships of features.

Word models can be conceived and represented by finite-state diagrams, each transition of which corresponds to a phoneme or a transition between two phonemes. Figure 26 shows a word model for the letter K. The first transition corresponds to the consonant /k/; other state transitions correspond to the vocalic segment /eh/, the speech transition /ey/, and the vowel /ih/. More sophisticated diagrams could be conceived for the same word involving more speech transition segments. For example, the transition /aei/ could be represented by several units, namely /ey1/, /ey2/, etc., where the segment /ey1/ represents the beginning of the transition of the diphthong /ey/ just after the vocalic segment, /ey2/ represents the continuation of the transition, and so on.

Phonemes can be represented by conjunctions of features; transition
FIG. 26. Example of a word model.
segments can be represented by evidences of transitions in the place or manner of articulation, defined by their centers of gravity computed with Equation (17). Word models can be compared with real data using dynamic programming or Markov models with probabilities derived from memberships. Another possibility is to use dynamic programming with manually derived prototypes just for alignment. Once the alignment has been performed, probability densities of features for state transitions can be learned together with probabilities of state transitions in word models. In such a way, Continuous Density Hidden Markov Models can be obtained for Speech Units (SU) and for a large speaker population.
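Comparing a word model with real data by dynamic programming, as suggested above, can be sketched with a left-to-right DTW-style recursion; the two-state model and the feature templates below are invented for illustration, not the chapter's actual models:

```python
# DTW-style alignment of observed feature-membership frames against a
# left-to-right word model whose states carry feature templates.

def align_cost(states, frames):
    # states[i], frames[t]: vectors of feature memberships
    def cost(s, f):
        return sum(abs(a - b) for a, b in zip(s, f))
    INF = float("inf")
    n, m = len(states), len(frames)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for t in range(1, m + 1):
            # either stay in state i or advance from state i-1
            D[i][t] = cost(states[i - 1], frames[t - 1]) + \
                      min(D[i][t - 1], D[i - 1][t - 1])
    return D[n][m]  # cost of the cheapest monotone alignment

k_model = [(1.0, 0.0), (0.0, 1.0)]              # e.g. /k/ then /ey/
frames = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9)]   # observed memberships
print(align_cost(k_model, frames))
```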
5. The Vocal Tract Model: An Approach Based on Speech Production
In this section we discuss another method used for the recognition of English vowels that is based on a vocal tract model. This method uses conventional FFT spectrograms of speech signals as input. From the speech spectrograms, a less conventional type of property, a set of morphological properties, is extracted. Morphological properties are derived from biological concepts about pattern creation and perception rather than from the traditional "number crunching" approach. In the following sections, we will show that such methods are most effective for characterizing speaker-independent properties of certain sound classes.

The motivation for using such an approach is twofold. First, if large or difficult vocabularies have to be recognized when words are pronounced by many speakers, it is advisable to consider a (possibly small) set of Speech Units (SU) with which all the words and word concatenations can be represented by compilation. A relation between a word W and its SUs can be represented by a limited number of basic prototypes and a description of their distortions observed when W is pronounced by a large population of speakers in different contexts. Distortions introduce ambiguities in the relation R1(W, SU) between W and its SUs. In order to make ambiguous relations more useful, for example for recognition purposes, their statistics can be taken into account.

Second, the knowledge we have about the production and perception of phonemes, diphones and syllables can be useful for conceiving prototypes of Speech Units. SU prototypes can be characterized by a redundant set of Acoustic Properties (AP). A relation R2(SU, AP) between a Speech Unit and its APs is ambiguous because acoustic properties can be distorted, missed or inserted in a particular instantiation of an SU. This is due to context and to inter- and intra-speaker variability. A performance model of such alterations can be built using statistical methods.
Whether knowledge about speech analysis, synthesis and perception should be taken into account in ASR is still the object of discussion among researchers in the field. Investigating the possibility of using acoustic property descriptors for ASR is attractive. Nevertheless, an ASR system based on acoustic property descriptors is not very efficient if the set of properties used and the algorithms for their extraction are not well chosen and conceived. Notice that property descriptors describe the speech data and do not interpret them. Descriptors cannot be false or ambiguous; rather, they can be insufficient or much too redundant for interpreting speech. For this reason it is important to start an investigation of property descriptors based on those properties that are expected to be robust speaker-independent cues of fundamental
phonetic events. Nevertheless, these properties and the algorithms that extract them may have different performances and degrees of success in different cases. For this reason, a certain redundancy in the number of properties used for characterizing a phonetic event may be useful.

Remarkable work has been done so far on spectrogram reading (Zue and Lamel, 1986). A number of APs for SUs have been identified through this effort. Attempts have also been made to extract some properties automatically and use them for ASR (DeMori et al., 1987b). Knowledge about spectrograms is incomplete. We know that some properties that can be detected are relevant for perception. The same property may appear in slightly different patterns corresponding to different pronunciations of the same word because of inter- and intra-speaker variations. It is important to characterize knowledge about such variations. This characterization has to be statistical because we do not have other types of knowledge about how basic word pattern prototypes are distorted when different speakers pronounce the same word. On the other hand, it is very important to characterize word prototypes in terms of properties that are relevant for speech production and perception. Property-based prototypes of words or SUs may describe a wide variety of patterns not only because properties are distorted, but also because some properties are missed or some unexpected properties have been inserted. Insertions and deletions can often be characterized by deterministic rules reflecting basic coarticulation knowledge, but in many cases they cannot be fully explained and are better characterized by statistical methods. Based on the above considerations, the system proposed in this chapter represents an attempt to integrate knowledge-based extraction of relevant speech properties and statistical modelling of their distortions.
Furthermore, the choice of APs is such that the essential information for reconstructing understandable speech is preserved. For spectrogram segments exhibiting narrow-band resonances, spectral lines are extracted from a time-frequency-energy representation of a speech unit using skeletonization techniques already used for image analysis (Naccache and Shinghal, 1984). These techniques have been adapted to spectrogram lines. Skeletonization can detect a variable number of lines with different durations inside an acoustic segment, thus avoiding the errors and the difficulties of tracking formants. Each spectral line is described by a vector of triplets (time, frequency, energy) that represents the lowest level (level-0) of a time-frequency morphology taxonomy. It is worth mentioning that spectral lines extracted with skeletonization always contain formants when they are detectable with peak-picking
FIG. 27. Relevant spectral peaks in a pattern of the diphthong /aei/. [Time frames 42-79 along the horizontal axis, frequency from 0.5 to 3.0 kHz along the vertical axis. Intervals correspond to spectral peaks. Energy of peaks is coded by letters and digits: A represents the lowest energy; B represents an energy that is double that represented by A; 0 represents an energy that is twice the energy represented by Z, and so on. The character-coded pattern itself did not survive extraction.]
FIG. 28. The pattern of Fig. 27 after thinning. [Pattern not reproduced.]
techniques, but very often contain other lines. The system of lines obtained in this way is richer than the system of formants that can be tracked interactively on spectrograms and used for reconstructing understandable speech. A recent paper by Kopec (1985) attempts to track formants using Markov models. In the approach proposed in this chapter, a set of lines is tracked that is redundant with respect to a set of formants. Distortions, insertions and deletions of spectral lines are taken into account in each SU model. The motivation for such an approach is that we know that spectral lines are significant acoustic properties, but we do not know exactly which of them, if any, are not essential. We know that different speakers produce similar lines when they pronounce, for example, the same vowel. Relative frequencies and amplitudes between lines may vary from speaker to speaker in a limited range, and bigger variations can be characterized as insertions or deletions. Distortions of relative line frequencies and amplitudes, as well as insertions and deletions, reflect inter- and intra-speaker variabilities and are described by knowledge we can systematically acquire and generalize.

The above discussion is incomplete because spectral lines as extracted in our system cannot completely describe every type of speech unit. In this section of the chapter, we will limit our attention to vowels and diphthongs, considered as SUs that can be described by spectral lines. The following section describes the details of an MLN trained for the speaker-independent recognition of 10 English vowels. This section also introduces the extraction and description of spectral lines that can be used with a continuous-parameter and frequency-domain-based Markov model for vowel recognition. An algorithm for extracting spectral lines from speech spectrograms is presented in this section. This algorithm has been tested on spectra with interesting results.
Work is in progress on applying it to the ear model spectra with the purpose of describing, through spectral lines, the dynamic properties of speech over large intervals. Spectral lines are extracted with a skeletonization algorithm from the time-frequency-energy patterns obtained by considering the 0-4 kHz portions of spectra computed with the Fast Fourier Transform (FFT) algorithm applied to the pre-emphasized speech signal.
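As a rough sketch of the first step toward such patterns, per-frame peak picking on an FFT magnitude spectrum with the 6 dB cut used for the patterns of Figs. 27 and 28 could look as follows; the spectrum values are synthetic, and the real system of course operates on pre-emphasized speech:

```python
import math

# Keep local spectral maxima whose level is within 6 dB of the frame maximum.

def peaks_within_6db(spectrum):
    # spectrum: list of positive magnitudes for one 0-4 kHz frame
    top_db = 20.0 * math.log10(max(spectrum))
    out = []
    for i in range(1, len(spectrum) - 1):
        # a local maximum ...
        if spectrum[i] >= spectrum[i - 1] and spectrum[i] > spectrum[i + 1]:
            # ... cut 6 dB below the frame maximum
            if 20.0 * math.log10(spectrum[i]) >= top_db - 6.0:
                out.append(i)  # bin index of a retained peak
    return out

frame = [1, 2, 8, 3, 1, 6, 2, 1, 0.5]
print(peaks_within_6db(frame))  # bins of the peaks 8 and 6
```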
5.1 The Skeletonization Algorithm
The time-frequency-energy pattern for a given speech segment (see DeMori et al. (1985) for the segmentation algorithm) generated by the FFT algorithm goes through two stages, namely thinning and preprocessing, before description. The pattern is thinned using the Safe-Point Thinning Algorithm (SPTA) described by Naccache and Shinghal (1984). There are two important restrictions imposed on the choice of the skeletonization algorithm for our application, namely

1. connectivity of lines should be maintained by keeping the points at junctions, and
2. excess erosion should not be allowed.
The SPTA was chosen because it meets the above conditions. Figure 27 shows an example of such a pattern for the diphthong /aei/ of the letter K before it is thinned, and Fig. 28 shows the thinned pattern. In Figs. 27 and 28, time increases along the horizontal axis and each printed line corresponds to a centisecond interval. Frequency is shown along the vertical axis. Intervals correspond to spectral peaks cut 6 dB below the maximum. Energy of spectral peaks is coded by letters and digits. Letter B represents twice the energy represented by letter A; digit 0 represents an energy that is twice the energy represented by Z, etc.

Preprocessing of skeletonized patterns is performed to discard all isolated, weak, and scattered points in the pattern. Preprocessing is carried out by applying an algorithm based on the strategy of tracing continuity. The Line Tracing Algorithm (LTA) retains properties like collinearity, curvilinearity, continuity, etc. present in the pattern. The significant lines in speech patterns are usually surrounded by lines that are less significant. Thinning and preprocessing surface all significant and non-significant lines in the pattern and discard all scattered points. The LTA accepts the skeletonized pattern and applies an algorithm for smoothing. The skeletonized pattern is a binary image which contains only dark and white points. The five neighbours of a point pi are defined to be the five points adjacent to pi. A continuous line l exists between points P1 and Pn if there exists a path p1 ... pi-1 pi ... Pn such that pi-1 is a neighbour of pi for 1 < i <= n. A path between points pi and pi-1 exists if there exists at least one dark point among its neighbours. If more than one dark point exists among the neighbours ni1, ..., ni5, then the point nj with the maximum energy is considered. If there exist several equally strong points, then the algorithm to find line l is recursively applied to find the line that is the longest from point pi.
The algorithm, written in Pascal-like notation, is given in Table IV, and Fig. 29 shows the pattern of Fig. 27 after smoothing. The number of lines that appear in a pattern depends on thresholds that can be varied in order to obtain a desired effect. Our objective is to keep small the probability of losing formant lines. On the contrary, the methods for handling
TABLE IV. THE LINE TRACING ALGORITHM.

line-tracing-algorithm (pattern: spectrogram; var vector: lines)
{ pattern is a binary image of the speech pattern }
{ vector will have all detected lines in the pattern }
begin
  set line counter, k = 0;
  for each row in pattern do
    for each column in pattern do
    begin
      set line-end = false;
      while not line-end do
      begin
        look for dark point, p, in pattern;
        compute neighbours (1), n, of point p;
        { end of line found }
        if n = 0 then
          if rule 1 (2) then
          begin
            increment line counter, k;
            accept current line as k;
            set line-end = true
          end;
        if n = 1 then  { 1 neighbour for p }
        begin
          accept point p for line k;
          set point p in pattern as white;
          set new neighbour as point p;
          continue tracing
        end;
        { junction found }
        if n > 1 then
        begin
          accept point p for line k;
          set point p in pattern as white;
          set point p as strongest new neighbour;
          continue tracing
        end
      end { while }
    end { do }
end

(1): the number of neighbours, n, is computed as
     n = (i - 1, j) + (i - 1, j + 1) + (i, j + 1) + (i + 1, j + 1) + (i + 1, j),
     where i and j point to the location of p in pattern.
(2): rule 1 = true if p(k) > t1 and h(k) > t2, where k is the kth line, p(k) is the number of points in line k, h(k) is the height of line k, and t1 and t2 were empirically determined constants.
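A runnable sketch of the tracing idea in Table IV, simplified in three ways: only the three forward (next-column) neighbours are followed, ties are broken by energy rather than by the recursive longest-line rule, and the rule-1 length/height filter is omitted:

```python
# Follow dark points column by column, preferring the highest-energy
# neighbour and erasing points as they are consumed.

def trace_lines(energy):
    # energy: 2-D list; 0 for white points, > 0 for dark points
    rows, cols = len(energy), len(energy[0])
    lines = []
    for i in range(rows):
        for j in range(cols):
            if energy[i][j] <= 0:
                continue
            line, r, c = [], i, j
            while True:
                line.append((r, c))
                energy[r][c] = 0  # set point as white
                # the three forward neighbours in the next column
                nbrs = [(r + dr, c + 1) for dr in (-1, 0, 1)]
                nbrs = [(a, b) for a, b in nbrs
                        if 0 <= a < rows and b < cols and energy[a][b] > 0]
                if not nbrs:  # end of line
                    break
                r, c = max(nbrs, key=lambda p: energy[p[0]][p[1]])
            lines.append(line)
    return lines

grid = [[0, 0, 0, 0],
        [5, 4, 0, 0],
        [0, 0, 3, 2]]
print(trace_lines(grid))  # one line following the energy ridge
```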
FIG. 29. The pattern of Fig. 27 after smoothing. [Pattern not reproduced.]
spectral lines that will be proposed in the following are well suited to taking redundant lines into account. Various solutions can be investigated for reducing the number of redundant lines due to pitch effects. They include the possibility of using pitch-synchronous FFT or cepstral analysis in selected time intervals and using their results in a filter for lines generated with asynchronous FFT. Such filters are applied in such a way that a sufficient number of lines is kept in at least three frequency bands in which a set of formants may be present. Line filters are still under investigation.
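One way such a band filter could look, keeping the strongest lines in each of three formant-candidate bands, is sketched below; the band edges, the record fields and the `keep` limit are illustrative assumptions, not the filter under investigation:

```python
# Keep up to `keep` lines per frequency band, preferring higher energy.

BANDS = [(0, 1000), (1000, 2500), (2500, 4000)]  # Hz, hypothetical edges

def filter_lines(lines, keep=2, bands=BANDS):
    # lines: list of dicts with mean frequency "fm" (Hz) and energy "e" (dB)
    kept = []
    for lo, hi in bands:
        in_band = [l for l in lines if lo <= l["fm"] < hi]
        in_band.sort(key=lambda l: l["e"], reverse=True)
        kept.extend(in_band[:keep])
    return kept

lines = [{"fm": 500, "e": 8.1}, {"fm": 700, "e": 5.0}, {"fm": 800, "e": 7.0},
         {"fm": 2100, "e": 7.9}, {"fm": 3200, "e": 8.4}]
print([l["fm"] for l in filter_lines(lines)])  # the weakest low-band line is dropped
```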
5.2 Line Description

Spectral lines can be described at several levels. At the lowest level, each line is described as an independent object whose relations with other objects (lines) are not considered. Higher-level descriptions involve relations between objects (lines) both in the time and in the frequency domain. A spectral line is described by a vector Vj of triplets (tji, fji, eji) (j = 1, ..., J; i = 1, ..., Ij), where tji is a time reference in centiseconds, fji is a frequency value in Hz and eji is an energy value in dB. J is the total number of spectral lines in a pattern; Ij is the number of time frames (a time frame usually has a 10 msec duration) corresponding to the duration of the jth line. The ith sample of the jth line is represented by its time value tji, its frequency fji and
k    tbk   tek   fbk    fek    fmk    fMk    eak
1.   41    65    2754   2700   2511   2754   8.1
2.   41    49    3078   2016   2016   3078   8.3
3.   42    78     513    540    450    540   8.1
4.   42    78     540    783    540    783   7.8
5.   41    81    2100   2727   2100   2727   7.9
6.   46    57    1800   2187   1800   2187   8.1
7.   45    56    3024   3210   3024   3240   8.8
8.   51    74    2910   3375   2016   3375   8.3
9.   58    63    3207   3348   3267   3348   8.4
A.   65    81    2205   2565   2305   2505   6.8
B.   74    83     207    207    207    324   7.1
C.   74    83    2016   3213   2016   3213   5.6

FIG. 30. Description of the pattern of Fig. 14.
its energy eji. The line bandwidth is not considered because it is in principle redundant and in practice difficult to estimate. Figure 30 shows a description of the spectral lines represented in Fig. 14.

5.3 Description of Frequency Relations Among Spectral Lines

5.3.1 Generalities
Frequency relations among Spectral Lines (SL) can be expressed in many ways. A particularly interesting set of descriptors is the class of Places of Articulation (PA) defined by the following vocabulary:
C_PA : {FP : front-place, CP : central-place, BP : back-place}.     (22)

The literature on acoustics and phonetics is rich in work relating place of articulation to spectral morphologies. From this knowledge we can expect different relations between SLs and PAs depending on the nature of speech segments. For some sounds, such as plosives, the relations involve SL transitions; for other sounds, such as non-nasalized sonorants, interesting relations can be established between PAs and spectral lines that are quasi-stationary in time. The inference of the latter type of relations will be discussed in this section.

Any speech interval containing only horizontal lines can be assumed to be quasi-stationary. The same assumption can be made for other intervals obtained by segmenting larger segments into smaller ones in which line-parameter variations are modest. Large portions of a speech signal can be characterized in terms of quasi-stationary intervals of variable length. These intervals can be further segmented in order to obtain fixed-length intervals, each of which can be described by PA hypotheses using relations with SLs.

Place of Articulation is a very useful, although often not sufficient, feature for describing speech patterns. For some vocabularies, like the one consisting of letters and digits, PAs differ for most of the vowels and diphthongs. Different speakers produce different spectral lines for the same PA, but such variations are constrained in ways that can be expressed statistically in terms of distortions, insertions and deletions of spectral lines. In order to obtain more adequate descriptions of speech patterns, other features have to be considered. In this section the possibility of using spectral lines for the recognition of the manner of articulation together with the place of articulation will also be considered. This will make it possible to generate hypotheses about all the vowels.
We will concentrate, in the rest of this section, on the recognition of vowels and PAS in quasi-stationary, non-nasalized speech intervals.
RENATO DEMORI et al.

5.3.2 Parameter Characterization and Structure of Statistical Relations Between SLs and Phoneme Classes
Speech segments corresponding to vowels extracted from the pronunciation of letters, digits and words containing 10 English vowels have been used. In order to learn statistical relations of SLs, a learning set was prepared in which vowel labels were assigned to segments using an automatic procedure made possible by the choice of words used for learning. For each labeled interval, SLs were extracted and each spectral line was represented by two parameters corresponding to its frequency and its associated spectral energy. In order to introduce a sort of normalization, rather than using frequencies and energies directly, the differences between the frequency and energy of each line and the frequency and energy of a base line are used. The base line is the line of highest energy in the low-frequency range. The description of a quasi-stationary interval is a string of vectors of the form

Y = y_1, y_2, ..., y_m.     (23)

Each vector of the sequence Y represents a line in the pattern. The first vector y_1 corresponds to the base line. The remaining lines of the pattern are sorted by frequency. Each vector has two components defined as follows:
B_i1 = f_i - B_11,
B_i2 = e_ai - B_12,     (24)

where

B_11 : frequency of the base line,
B_12 : energy of the base line,
f_i : frequency of the ith sorted line in the pattern,
e_ai : energy of the ith sorted line in the pattern.
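The base-line normalization of Equations (23)-(24) can be sketched as follows. This is a minimal illustration under an assumed definition of the "low-frequency range" (here, below 1 kHz); the cutoff and function name are not from the original system.

```python
def make_observation_string(lines, low_freq_limit=1000.0):
    """lines: list of (frequency_Hz, energy_dB) pairs for the SLs of a
    quasi-stationary interval. Returns the string Y = y_1 ... y_m, where
    each y = (B_i1, B_i2) is a difference relative to the base line."""
    low = [l for l in lines if l[0] <= low_freq_limit]
    # Base line: the highest-energy line in the low-frequency range.
    base = max(low if low else lines, key=lambda l: l[1])
    b_f, b_e = base
    # Remaining lines of the pattern are sorted by frequency.
    rest = sorted((l for l in lines if l is not base), key=lambda l: l[0])
    # y_1 corresponds to the base line itself (zero differences).
    return [(0.0, 0.0)] + [(f - b_f, e - b_e) for f, e in rest]

Y = make_observation_string([(500.0, 60.0), (1500.0, 50.0), (2500.0, 45.0)])
print(Y)  # [(0.0, 0.0), (1000.0, -10.0), (2000.0, -15.0)]
```

The point of the subtraction is the "sort of normalization" the text mentions: absolute frequencies and energies vary across speakers, while differences from the base line are more stable.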
Figure 31 shows a speech interval with the corresponding vector Y as defined by Equations (23) and (24). A Markov source is introduced to model a process that generates frequencies and energies of spectral lines. The model includes formants,
spurious lines, and lines corresponding to a split of a formant into two lines. Frequency and amplitude distributions are associated with each transition in the model. The model is conceived in such a way that the variances of the distributions are kept small, so that each distribution really represents the variation, due to inter-speaker differences, of the parameters of a line having specific structural properties. Distortions of frequency and energy differences are assumed to have normal distributions and to be statistically independent. A model without such simplifying assumptions would have been more realistic, but it would have implied practical complications. We decided to avoid them and to build a manageable model, to be eventually compared in the future with more complex ones.

The statistical relations between SLs and the corresponding classes are characterized by a CDHMM (Continuous Density Hidden Markov Model). A CDHMM is a Markov model in which transitions produce vectors of parameters. The probability p(s_n, s_k) is the probability of choosing the transition from state s_n to state s_k when the state s_n is reached, and q(s_n, s_k, y_i) is the probability that the vector y_i = (B_i1, B_i2) is generated in the transition from s_n to s_k. The collection of the probability distributions of frequencies and energies describes a transition. A transition T(s_n, s_k, y_i) is then described by the following matrix:
    | m_i1   σ_i1 |
    | m_i2   σ_i2 |     (25)

where m_il is the mean and σ_il is the standard deviation of parameter B_il. In our case, l can be either 1 (frequency) or 2 (amplitude).

5.3.3 Learning and Recognition Method
The Forward-Backward algorithm (Baum, 1972) has been used both for learning and for recognition purposes. During learning and recognition, a scaling technique similar to the one described by Levinson et al. (1983) has been adopted. In the recognition process, the probability p(Y|M_j) is computed with the Forward-Backward algorithm. Y is an input string of vectors as defined by Equation (23), and M_j is a CDHMM corresponding to a symbol of the vocabulary of features to be recognized. The string Y is assigned to the ith
class if

p(Y|M_i) = max_j p(Y|M_j)     (26)

and

p(Y|M_i) - p(Y|M_k) > c_ik,     (27)

where c_ik is the threshold of confusion between M_i and M_k (the Markov source corresponding to the second highest score). If

p(Y|M_i) - p(Y|M_k) <= c_ik,     (28)

then the class is decided (if a local decision has to be made) according to rules. Two experiments of speaker-independent ASR have been performed. The vocabularies to be recognized were PAs and vowels. In a program for the automatic recognition of the PA of vowels, the following rules can be used in order to improve the decision when the probabilities for "back" and "front" places of articulation are very close:

1) if g_1 > g_2 then "PA = BP"
2) if g_1 < g_2 then "PA = FP."
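The decision criterion of Equations (26)-(28) can be sketched as follows. Here `scores` maps each model to p(Y|M), a single uniform `threshold` stands in for the pairwise thresholds c_ik, and `fallback_rule` stands in for rules such as the g_1/g_2 comparison above; all of these names are illustrative.

```python
def classify(scores, threshold, fallback_rule):
    """scores: dict model_name -> p(Y|M). Returns the best-scoring class
    when its margin over the runner-up exceeds the confusion threshold
    (Eqs. 26-27); otherwise defers to fallback_rule (Eq. 28)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, p_best), (second, p_second) = ranked[0], ranked[1]
    if p_best - p_second > threshold:
        return best                      # clear winner
    return fallback_rule(best, second)   # near-tie: decide by rules

# Example: "back" vs "front" place of articulation nearly tied,
# so the rule-based fallback makes the local decision.
rule = lambda a, b: a  # placeholder for a real disambiguation rule
print(classify({"BP": 0.48, "FP": 0.47, "CP": 0.05}, threshold=0.05, fallback_rule=rule))
```

The confusion threshold is what lets the system avoid committing to a local decision when two Markov sources score almost equally.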
5.3.4 Experiments on the Recognition of Stationary Segments
In order to investigate the possibility of using spectral lines and CDHMMs for ASR, an experiment has been set up for the recognition of English vowels. A signal database has been built by asking 20 speakers (10 male and 10 female) to pronounce the monosyllabic words shown in Table V. Each speaker read a randomly ordered list which included 40 occurrences of each word from Table V. Every pronunciation of every word was then processed using a network of HP workstations including one HP 9000-236 especially equipped for speech processing, an HP 9000-236 and an HP 9000-330. Table V also contains a five-word vocabulary containing vowels that are common to a number of languages other than English.
TABLE V
THE VOCABULARY USED FOR VOWEL RECOGNITION

5 Vowels:   bed, beep, boot, but, saw
10 Vowels:  bat, bed, beep, boot, but, far, fur, pit, put, saw
Task decomposition among units was performed as suggested by DeMori et al. (1985). Fourier transformations, primary acoustic cues as defined by DeMori et al. (1985), and spectral lines were computed for each word in roughly 10 times real-time. For each word pronunciation, three vowel samples were automatically extracted using the PAC description. A vowel sample was extracted in the middle of the vowel in an interval of 60 msec duration. Learning was performed using data from 10 speakers (five male and five female). Recognition was performed using data from the other 10 speakers.

Markov chains were built in the following way. The frequency range from 0.1 to 3.5 kHz was subdivided into intervals. A basic chain was built by considering a linear sequence of states and transitions, each transition corresponding to a frequency interval. Other transitions were then added in order to allow each state to reach any of the states following it. Figure 32 shows the general structure of a CDHMM for the recognition of vowels as it is set up before starting a learning phase. A transition T = (s_n, s_k, y_i) departing from state s_n is associated with the mean of the difference between the frequency and energy of the nth line and the frequency and energy of the base line. Each transition is also associated with a transition probability, not shown in Fig. 32 for the sake of simplicity. At the beginning, all the transitions having the same destination state are associated with the same means and standard deviations. Chains conceived with the criteria mentioned above have been constructed and used for learning and testing vowel models. A tabular description of a Markov chain for a front vowel is given in Table VI.
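The chain construction just described, a linear left-to-right sequence of states with added transitions letting each state reach any of the states following it, can be sketched as follows (the function name and state numbering are illustrative):

```python
def build_chain(n_states):
    """Return the transition list of a left-to-right chain over states
    0..n_states in which every state can reach any later state."""
    transitions = []
    for s in range(n_states):
        for k in range(s + 1, n_states + 1):
            transitions.append((s, k))
    return transitions

# For a 4-interval chain: basic transitions (0,1),(1,2),(2,3),(3,4)
# plus skip transitions such as (0,2), (1,4), ...
t = build_chain(4)
print(len(t), (1, 3) in t)
```

The skip transitions are what allow a vowel model to omit a spectral line in a given frequency interval (e.g., a deleted or merged formant line) without leaving the chain.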
FIG. 32. General structure of CDHMMs for the recognition of vowels.
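Each transition of such a chain carries a pair of independent normal distributions, one for relative frequency and one for relative energy. Evaluating the output density of a transition for an observed vector y_i = (B_i1, B_i2) might be sketched as follows; the numeric parameters are illustrative, in the style of Table VI (which lists variances rather than standard deviations):

```python
import math

def transition_output_density(y, means, variances):
    """Density q(s_n, s_k, y) for y = (B_i1, B_i2), assuming the two
    parameters are statistically independent and normally distributed."""
    p = 1.0
    for x, m, v in zip(y, means, variances):
        p *= math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
    return p

# A transition with relative-frequency mean 1696.8 Hz (variance 12652)
# and relative-energy mean -0.924 dB (variance 0.923):
d_at_mean = transition_output_density((1696.8, -0.924), (1696.8, -0.924), (12652.0, 0.923))
d_off_mean = transition_output_density((2000.0, 0.5), (1696.8, -0.924), (12652.0, 0.923))
print(d_at_mean > d_off_mean)  # the density peaks at the mean vector
```

The independence assumption lets the two-dimensional density factor into a product of one-dimensional Gaussians, which is the simplification the text says was chosen for manageability.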
TABLE VI
TRANSITION PROBABILITIES OF A CDHMM

Start  End  Probability   m_1      v_1       m_2      v_2
0      1    1.0           318.1    675       6.57     0.081
1      2    0.01          117.0    855       0.313    0.006
1      4    0.01          1445.0   130502    -1.669   0.174
1      5    0.406         1696.8   12652     -0.924   0.923
1      6    0.573         1978.5   1781      0.567    0.138
2      6    1.0           1992.0   248004    1.072    0.072
4      6    1.0           1953.0   238388    0.7      0.031
5      6    0.821         2048.1   8774      0.539    0.173
5      7    0.179         2533.4   3228.5    0.325    0.057
6      7    0.966         2592.9   4604      0.811    0.121
6      8    0.034         2890.3   48880.5   0.383    0.663
7      8    0.286         3082.9   6570      0.106    0.409
7      9    0.714         3275.0   18647     0.319    0.118
8      9    1.0           3490.2   14750     0.023    0.220

m_1 = mean of relative frequency, v_1 = variance of relative frequency, m_2 = mean of relative amplitude, v_2 = variance of relative amplitude.
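As a rough consistency check, the outgoing transition probabilities of each state in Table VI should sum to one (up to the rounding of the printed values). A small sketch using the table's start/end/probability columns:

```python
from collections import defaultdict

# (start, end, probability) triples transcribed from Table VI.
transitions = [
    (0, 1, 1.0), (1, 2, 0.01), (1, 4, 0.01), (1, 5, 0.406), (1, 6, 0.573),
    (2, 6, 1.0), (4, 6, 1.0), (5, 6, 0.821), (5, 7, 0.179), (6, 7, 0.966),
    (6, 8, 0.034), (7, 8, 0.286), (7, 9, 0.714), (8, 9, 1.0),
]

out_mass = defaultdict(float)
for s, k, p in transitions:
    out_mass[s] += p

# Tolerance of 0.01 absorbs rounding (state 1 sums to 0.999 as printed).
for state, total in sorted(out_mass.items()):
    assert abs(total - 1.0) < 0.01, (state, total)
print("all states normalized:", dict(sorted(out_mass.items())))
```

Checks like this are useful when transcribing published HMM parameter tables, since a single misread probability breaks the stochastic-matrix property.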
The first experiment concerned learning and recognition of the place of articulation, as defined by Equation (22), for the five-vowel vocabulary of Table V. A second experiment concerned learning and recognition of the five-vowel vocabulary. Two other experiments were performed using half of the data from each speaker for learning and the other half for recognition. The task of the latter experiments was learning and recognition of the place of articulation and of the five vowels in a multispeaker mode, while the task of the first two experiments involved the same classes in a speaker-independent mode. Finally, two other experiments have been conducted in the speaker-independent and the multispeaker mode using the 10-vowel vocabulary. The results of these experiments are summarized in Table VII.

The results in Table VII clearly show that the results obtained with spectral lines and CDHMMs are comparable to those obtained with MLNs for the recognition of the three places of articulation and for vowels having remarkably different place or manner of articulation. This suggests that spectral lines can be useful for the discrimination of vowels not having close place and manner of articulation. Nevertheless, the recognition of the 10 vowels was not performed satisfactorily with the method proposed above. In order to improve the recognition performance on the 10 vowels, attempts were made to introduce discrimination rules for cases characterized by relations like those in Equation (28). In order to avoid the tedious work of manually inferring rules by experiments, another learning and recognition paradigm was tried, based on Multi-Layered Networks (MLNs). The reason for such a choice is that MLNs allow one to perform competitive learning and to discover pattern regularities.
These aspects were found particularly attractive for the case of vowels because some of them are so similar that competitive learning is a more suitable paradigm for discovering regularities that enhance differences among

TABLE VII
RESULTS OF THE RECOGNITION OF VOWELS (percent correct)

Task                    Multispeaker   Speaker-independent   Speaker-independent
                        CDHMM          CDHMM                 MLN
place of articulation   97.1           95.2                  96.9
5 vowels                97             95                    96.6
10 vowels               73.6           69.9                  87
pattern classes. Furthermore, MLNs can perform speaker normalization by learning functions of SLs, in accordance with hypotheses made by other researchers that speaker normalization should involve relations between formant frequencies.

In a first experiment, 64 spectral samples were sent to an MLN with 40 nodes in the first hidden layer, 20 nodes in the second hidden layer and 10 nodes in the output layer. The weights of the connections among nodes were learned using the Error Back Propagation Algorithm (see Rumelhart et al. (1986) for details). An error rate of 20.4% was obtained. A second experiment was executed by using only spectral lines, coded as proposed by Bengio and DeMori (1988), with an MLN with 320 input nodes and 200 nodes in each hidden layer. An error rate of 18.8% was obtained, showing that SLs are a good coding of speech spectrograms. The two MLN outputs were then combined using heuristic rules inferred from the training set, and an error rate of 13% was obtained. The obvious conclusions are that SLs contain enough information for discrimination among vowels and that MLNs show remarkable advantages, especially when the task requires fine discrimination.
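A forward pass through an MLN of the first experiment's shape (64 inputs, hidden layers of 40 and 20 units, 10 outputs) can be sketched as follows. The weights here are random placeholders for illustration, whereas the real ones were learned with error back-propagation; sigmoid units are assumed.

```python
import math, random

def forward(x, layers):
    """One forward pass; each layer is a (weights, biases) pair with
    sigmoid activation units."""
    for W, b in layers:
        x = [1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(row, x)) + bj)))
             for row, bj in zip(W, b)]
    return x

def random_layer(n_in, n_out, rng):
    # Small random weights, zero biases: an untrained placeholder layer.
    return ([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

rng = random.Random(0)
net = [random_layer(64, 40, rng), random_layer(40, 20, rng), random_layer(20, 10, rng)]
out = forward([0.5] * 64, net)  # e.g., 64 spectral samples as input
print(len(out), all(0.0 < o < 1.0 for o in out))
```

In the experiments described above, the 10 output units correspond to the 10 vowel classes, and training adjusts the weights so the unit for the correct vowel dominates.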
6. Conclusions
In this chapter we have presented several past approaches and some current trends in automatic speech recognition research. The problem of speech recognition is approached in two ways: 1) using models based on speech production, and 2) using models based on speech perception. Most of the past work in this area used the first approach. Several systems with certain restrictions have been developed successfully so far. However, one of the most difficult problems in ASR, speaker-independent recognition of human speech, is still unsolved. Recently, much attention has been given to attacking the speech recognition problem using speech perception models rather than speech production models. Recent advances in the understanding of human auditory perception are the major reason for this new trend. At the same time, several new methods have emerged for the machine implementation of the various recognition models. Simple template matching, more reliable DP-matching techniques, sophisticated network models, and expert system models have all contributed towards the building of today's ASR systems. Again, a recent trend is to use artificial neural network models for the machine implementation of ASR systems.

The work reported in this chapter shows that a combination of an ear model and multi-layer networks makes possible an effective generalization among speakers in coding vowels. This work also suggests that the use of speech
knowledge organized as morphological properties is robust enough to handle inter- and intra-speaker variations. Results obtained from various experiments show that ear models combined with MLNs are most desirable in speaker-independent ASR systems. Furthermore, it can be concluded from our investigation that robust speaker-independent properties can be obtained by using "neurograms" instead of spectrograms, as suggested in Section 5.1.

The results obtained in the speaker-independent recognition of 10 vowels add a contribution that justifies the interest in the investigation of the use of MLNs for ASR. Furthermore, training a set of MLNs on a number of well-distinguishable vowels makes possible a very good generalization to new vowels and diphthongs if recognition is based on features. By learning how to assign degrees of evidence to articulatory features, it is possible to estimate normalized values for the place and manner of articulation which appear to be highly consistent with qualitative expectations based on speech knowledge. Effective learning and good generalization can be obtained using a limited number of speakers, in analogy with what humans do.

Performance models of the time evolution of evidence degrees, or of derived parameters like CGP and CGM, can be made using Hidden Markov models. Degrees of evidence can be used as "pseudo-probabilities"; parameters and evidence values can be vector-quantized, or their continuous densities can be estimated for the models. Speech coders that produce degrees of evidence of phonetic features can be used for fast lexical access, for word spotting, for recognizing phonemes in new languages with limited training, or for constraining the search for the interpretation of a sentence.

The Error Back Propagation Algorithm seems to be a suitable one for learning the weights of internode links in MLNs. A better understanding of the problems related to its convergence is a key factor for the success of an application.
The choice of the number of MLNs, their architecture, and the coding of their input and output are also of great importance, especially for generalization. The computation time of the system proposed in this chapter is about 150 times real-time on a SUN 4/280. The system structure is suitable for parallelization with special-purpose architectures and accelerator chips. It is not unrealistic to expect that, with a suitable architecture, such a system could operate in real time.

The results show the effectiveness of the use of spectral lines and of performance models of their distortions in the recognition of sequences of places of vowels. It is likely that a larger number of speakers and the use of MLNs for characterizing transients would allow us to obtain a better characterization of spectral line distortions in quasi-stationary vocalic segments. Different
speaking modes are likely to produce different distortions of expected pattern morphologies. As the system is rather robust, a systematic analysis of its errors should suggest the use of other transition properties, a better implementation of the actions for extracting them, and a better statistical characterization of their distortions.

ACKNOWLEDGMENTS

This research was carried out at the Centre de Recherche en Informatique de Montreal and was supported by the Natural Sciences and Engineering Research Council of Canada under grant No. A2439. The donation of a workstation from the Hewlett Packard Research Laboratories (Palo Alto, California) is acknowledged. Yoshua Bengio implemented the MLN algorithms. Regis Cardin implemented a fast version of the algorithm for extracting spectral lines.
REFERENCES

Aldefeld, B., Levinson, S. E., and Szymanski, T. G. (1980). A minimum distance search technique and its application to automatic directory assistance. Bell System Tech. J. 59, 1343-1356.
Averbuch, A., et al. (1987). Experiments with the Tangora 20,000-word Speech Recognizer. Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Dallas.
Bahl, L. R., Baker, J. K., Cohen, P. S., Cole, A. G., Jelinek, F., Lewis, B. L., and Mercer, R. L. (1979). Automatic recognition of continuously spoken sentences from a finite state grammar. Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Washington, D.C., pp. 418-421.
Bahl, L. R., Jelinek, F., and Mercer, R. L. (1983). A maximum likelihood approach to continuous speech recognition. IEEE Trans. Pattern Anal. Machine Intell. PAMI-5, 179-190.
Baker, J. K. (1975a). The DRAGON system - An overview. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-23, 24-29.
Baker, J. K. (1975b). Stochastic modeling for automatic speech understanding. In "Speech Recognition" (D. R. Reddy, ed.), pp. 521-542. Academic Press, New York.
Baum, L. E. (1972). An inequality and associated maximization technique in the statistical estimation for probabilistic functions of Markov processes. Inequalities 3, 1-8.
Bengio, Y., and DeMori, R. (1988). Use of Neural Networks for the Recognition of the Place of Articulation. Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, New York, pp. 103-106.
Bourlard, H., and Wellekens, C. J. (1987). Multilayer perceptrons and automatic speech recognition. IEEE First Int. Conf. Neural Networks, San Diego, pp. IV407-IV416.
Bourlard, H., Wellekens, J., and Ney, H. (1984). Connected digit recognition using vector quantization. Proc. Int. Conf. Acoustics, Speech, and Signal Processing, San Diego, pp. 26.10.1-26.10.4.
Chen, F. R. (1980). Acoustic-Phonetic Constraints in Continuous Speech Recognition: A Case Study Using the Digit Vocabulary. Ph.D. thesis, MIT.
Cohen, P. S., and Mercer, R. L.
(1975). The phonological component of an automatic speech recognition system. In "Speech Recognition" (D. R. Reddy, ed.), pp. 275-320. Academic Press, New York.
Delgutte, B. (1980). Representation of speech-like sounds in the discharge patterns of auditory-nerve fibers. J. Acoustical Society of America 68, 843-857.
Delgutte, B., and Kiang, N. Y. S. (1984a). Speech coding in the auditory nerve: I. Vowel-like sounds. J. Acoustical Society of America 75, 866-878.
Delgutte, B., and Kiang, N. Y. S. (1984b). Speech coding in the auditory nerve: II. Processing schemes for vowel-like sounds. J. Acoustical Society of America 75, 879-886.
Delgutte, B., and Kiang, N. Y. S. (1984c). Speech coding in the auditory nerve: III. Voiceless fricative consonants. J. Acoustical Society of America 75, 887-896.
Delgutte, B., and Kiang, N. Y. S. (1984d). Speech coding in the auditory nerve: IV. Sounds with consonant-like dynamic characteristics. J. Acoustical Society of America 75, 897-907.
DeMori, R. (1973). A descriptive technique for automatic speech recognition. IEEE Trans. Audio Electroacoust. AU-21, 89-100.
DeMori, R., and Probst, D. (1986). Computer Recognition of Speech. In "Handbook of Pattern Recognition and Image Processing." Academic Press, New York.
DeMori, R., Laface, P., and Mong, Y. (1985). Parallel algorithms for syllable recognition in continuous speech. IEEE Trans. Pattern Anal. Machine Intell. PAMI-7 (1), 56-69.
DeMori, R., Lam, L., and Gilloux, M. (1987a). Learning and plan refinement in a knowledge-based system for automatic speech recognition. IEEE Trans. Pattern Anal. Machine Intell. PAMI-9, 289-305.
DeMori, R., Merlo, E., Palakal, M., and Rouat, J. (1987b). Use of procedural knowledge for automatic speech recognition. Proc. Tenth Int. Joint Conf. Artificial Intelligence, Milan.
Erman, L. D., Fennel, D. R., Neely, R. B., and Reddy, D. R. (1976). The HEARSAY-I speech understanding system: An example of the recognition process. IEEE Trans. Comput. C-25, 422-431.
Fant, G. (1966). A note on vocal tract size factors and non-uniform F-pattern scalings. Quarterly Progress and Status Report 4/66, Speech Transmission Laboratory, Royal Institute of Technology, Stockholm, pp. 22-30.
Kiang, N. Y. S., Watanabe, T., Thomas, E. C., and Clark, L. F. (1965).
"Discharge Patterns of Single Fibers in the Cat's Auditory Nerve." MIT Press, Cambridge, Massachusetts.
Gray, R. M. (1984). Vector Quantization. IEEE ASSP Magazine 1 (2), 4-29.
Haton, J-P. (1980). Present Issues in Continuous Speech Recognition and Understanding. In "Trends in Speech Recognition" (W. A. Lea, ed.), pp. 3-50. Lawrence Erlbaum Assoc., Hillsdale, New Jersey.
Haton, J-P. (1984). Knowledge-Based and Expert Systems in Automatic Speech Recognition. In "New Systems and Architectures for Automatic Speech Recognition and Synthesis" (R. DeMori and C. Y. Suen, eds., NATO Advanced Study Institute). Springer-Verlag.
Haton, J-P., and Pierrel, J. M. (1980). Syntactic-semantic interpretation of sentences in the MYRTILLE-II speech understanding system. Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Denver, pp. 892-895.
Heffner, R-M. S. (1950). "General Phonetics." The University of Wisconsin Press, Madison, Wisconsin.
Hinton, G. E., and Sejnowski, T. J. (1986). Learning and relearning in Boltzmann machines. In "Parallel Distributed Processing: Explorations in the Microstructure of Cognition," Vol. 1, pp. 282-317. MIT Press, Cambridge, Massachusetts.
Jakobson, R., Fant, C. G. M., and Halle, M. (1952). "Preliminaries to Speech Analysis: The Distinctive Features and their Correlates." MIT Press, Cambridge, Massachusetts.
Jelinek, F. (1976). Continuous Speech Recognition by Statistical Methods. Proc. IEEE 64 (4), 532-556.
Jelinek, F., and Mercer, R. L. (1980). Interpolated estimation of Markov source parameters from sparse data. In "Pattern Recognition in Practice" (E. S. Gelsema and L. N. Kanal, eds.), pp. 381-402. North Holland, Amsterdam.
Jelinek, F., Bahl, L. R., and Mercer, R. L. (1975). Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech. IEEE Trans. Infor. Theory IT-21 (3), 250-256.
Kimball, O., Cosell, L., Schwartz, R., and Krasner, M. (1987). Efficient implementation of continuous speech recognition on a large scale processor. Proc. Int. Conf. Acoustics, Speech and Signal Processing, Dallas, pp. 852-855.
Kohonen, T., Riittinen, H., Jalanko, M., Reuhkala, E., and Haltsonen, E. (1980). A thousand word recognition system based on learning subspace method and redundant hash addressing. Proc. Fifth Int. Conf. Pattern Recognition, Miami Beach, Florida, pp. 158-165.
Kopec, G. (1985). Formant tracking using Hidden Markov models. Proc. Int. Conf. Acoustics, Speech and Signal Processing, Tampa, Florida, pp. 1113-1116.
Kopec, G., and Bush, M. (1985). Network-based isolated digit recognition using vector quantization. IEEE Trans. Acoustics, Speech and Signal Processing ASSP-33, 850-856.
Lea, W. A. (1979). The Value of Speech Recognition Systems. In "Trends in Speech Recognition" (W. A. Lea, ed.). Englewood Cliffs, New Jersey.
Lea, W. A., Medress, M. F., and Skinner, T. E. (1975). A prosodically guided speech understanding strategy. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-23, 30-38.
Lesser, V. R., Fennel, R. D., Erman, L. D., and Reddy, D. R. (1975). Organization of the Hearsay II Speech Understanding System. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-23, 11-24.
Leung, H. C., and Zue, V. W. (1988). Some phonetic recognition experiments using artificial neural nets. Proc. Int. Conf. Acoustics, Speech, and Signal Processing, New York, pp. 422-425.
Levinson, S. E. (1978). The effects of syntactic analysis on word recognition accuracy. Bell System Tech. J. 57, 1627-1644.
Levinson, S. E. (1985). Structural Methods in Automatic Speech Recognition. Proc. IEEE 73 (11), 1625-1650.
Levinson, S. E., Rabiner, L. R., and Sondhi, M. M. (1983). An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. Bell System Tech. J. 62, 1035-1074.
Mercier, G., Nouhen, A., Quinton, P., and Siroux, J. (1979). The KEAL Speech Understanding System. In "Spoken Language Generation and Understanding," Proc. NATO Advanced Study Institute, Bonas, France (J. C. Simon, ed.), pp. 525-544. D. Reidel, Dordrecht, The Netherlands.
Miller, G. A., Heise, G. A., and Lichten, W. (1951). The intelligibility of speech as a function of the context of the test materials. J. Experimental Psychology 41, 329-335.
Miller, M. I., and Sachs, M. B. (1983). Representation of stop consonants in the discharge patterns of auditory-nerve fibers. J. Acoustical Society of America 74, 502-517.
Moore, R. K. (1984). Systems for Isolated and Connected Word Recognition. In "New Systems and Architectures for Automatic Speech Recognition and Synthesis" (R. DeMori and C. Y. Suen, eds., NATO Advanced Study Institute). Springer-Verlag.
Myers, C. S., and Levinson, S. E. (1982). Speaker independent connected word recognition using a syntax directed dynamic programming procedure. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-30, 561-565.
Myers, C. S., and Rabiner, L. R. (1981). A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-29, 284-297.
Naccache, N. J., and Shinghal, R. (1984). SPTA: A proposed algorithm for thinning binary patterns. IEEE Trans. Systems, Man and Cybernetics SMC-14 (3), 409-419.
Nakatsu, R., and Kohda, M. (1978). An acoustic processor in a conversational speech recognition system. Rev. ECL 26, 1505-1520.
Ney, H. (1984). The use of a one stage dynamic programming algorithm for connected word recognition. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-32, 263-271.
Oppenheim, A. V., and Schafer, R. W. (1968). Homomorphic Analysis of Speech. IEEE Trans. Audio and Electroacoustics AU-16 (2), 221-226.
Perennou, G. (1982). The ARIAL II speech recognition system. In "Automatic Speech Analysis and Recognition," Proc. NATO Advanced Study Institute (J-P. Haton, ed.), pp. 269-275. D. Reidel, Dordrecht, The Netherlands.
Plaut, D. C., and Hinton, G. E. (1987). Learning sets of filters using back propagation. Computer Speech and Language 2, 35-61.
Prager, J., et al. Segmentation processes in the VISIONS system. Fifth Int. Joint Conf. Artificial Intelligence, Cambridge.
Rabiner, L. R., and Schafer, R. W. (1978). "Digital Processing of Speech Signals." Prentice Hall, Inc., Englewood Cliffs, New Jersey.
Rabiner, L. R., Levinson, S. E., and Sondhi, M. M. (1983). On the application of vector quantization and Hidden Markov models to speaker independent isolated word recognition. Bell System Tech. J. 62, 1075-1105.
Rosenberg, A. E., Rabiner, L. R., Wilpon, J. G., and Kahn, D. (1983). Demisyllable based isolated word recognition system. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-31, 713-726.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In "Parallel Distributed Processing: Explorations in the Microstructure of Cognition," Vol. 1, pp. 318-362. MIT Press, Cambridge, Massachusetts.
Ruske, G., and Schotola, T. (1982). The efficiency of demisyllable segmentation in the recognition of spoken words. In "Automatic Speech Analysis and Recognition," Proc. NATO Advanced Study Institute (J-P. Haton, ed.). D. Reidel, Dordrecht, The Netherlands.
Sachs, M. B., and Young, E. D. (1980). Effects of nonlinearities on speech encoding in the auditory nerve. J. Acoustical Society of America 68, 858-875.
Sakoe, H. (1979). Two-Level DP-Matching - A Dynamic Programming-Based Pattern Matching Algorithm for Connected Word Recognition. IEEE Trans. Acoustics, Speech, and Signal Processing ASSP-27, 588-595.
Scagliola, C. (1983). Continuous speech recognition without segmentation: Two ways of using diphones as basic speech units.
Speech Commun. 2, 199-201.
Schafer, R. W., and Rabiner, L. R. (1975). Digital Representations of Speech Signals. Proc. IEEE 63, 662-667.
Schwartz, R. M. (1982). Acoustic Phonetic Recognition. Sixth Int. Conf. Pattern Recognition, Munich, pp. 925-965.
Seneff, S. (1984). Pitch and spectral estimation of speech based on an auditory synchrony model. Proc. Int. Conf. Acoustics, Speech and Signal Processing, San Diego.
Seneff, S. (1985). Pitch and spectral analysis of speech based on an auditory synchrony model. RLE Technical Report No. 504, MIT.
Seneff, S. (1986). A computational model for the peripheral auditory system: application to speech recognition research. Proc. Int. Conf. Acoustics, Speech and Signal Processing, Tokyo, pp. 37.8.1-37.8.4.
Seneff, S. (1988). A joint synchrony/mean-rate model of auditory speech processing. J. Phonetics, January, 55-76.
Shipman, D. W., and Zue, V. W. (1982). Properties of large lexicons: Implications for advanced isolated word recognition systems. Proc. Int. Conf. Acoustics, Speech, and Signal Processing, Paris, pp. 546-549.
Sinex, D. G., and Geisler, C. D. (1983). Responses of auditory-nerve fibers to consonant-vowel syllables. J. Acoustical Society of America 73, 602-615.
Skinner, T. (1977). Speaker Invariant Characterizations of Vowels, Liquids, and Glides Using Relative Formant Frequencies. J. Acoustical Society of America 62, supplement 1, p. 821.
Stevens, K. N. (1977). Acoustic correlates of some phonetic categories. J. Acoustical Society of America 62, 1345-1366.
PERCEPTUAL MODELS FOR AUTOMATIC SPEECH RECOGNITION SYSTEMS
173
Waibel, A., Hanazawa, T., and Shikano, K. (1988). Phoneme recognition: neural networks vs. hidden Markov models. Proc. Int. Conf. Acoustics, Speech and Signal Processing, New York, paper 8.S3.3. Wakita, H. (1977). Normalization of vowels by vocal tract length and its application to vowel identification. I E E E Trans. Acoustics, Speech, and Signal Processing, ASSP-25, 183-192. Walker, D. E. (1Y75).The SRI Speech Understanding System. I E E E Trans. Acoustics, Speech, and Signal Processing ASSP-23, 397-416. Watrous, R. L., and Shastri, L. (1987). Learning phonetic features using connectionist networks. Proc. Tenth Int. Joint Con$ ArtiJcial Intelligence, pp. 851-854. White, G. M., and Fong, P. J. (1975). k-nearest-neighbour decision rule performance in a Speech Recognition System. I E E E Trans. Systems, Man and Cybernetics, 5, 389. Woods, W. A. (1975). Motivation and overview of SPEECHLIS: An experimental prototype for speech understanding research. I E E E Trans. Acoustics, Speech, and Signal Processing ASSP23,2-10. Young, E. D., and Sachs, M. B. (1979). Representation of steady-state vowels in the temporal aspects of the discharge pattern of populations of auditory nerve fibers. J. Acoustical Society of America 66,1381-1403. Zue, V. W., and Lamel, L. F. (1986). An Expert Spectrogram Reader: A Knowledge-Based Approach to Speech Recognition. I E E E Int. Conf. Acoustics, Speech and Signal Processing, Tokyo, pp. 1197-1200. Zue, V. W., and Schwartz, R. M. (1979).Acoustic Processing and Phonetic Analysis. In “Trends in Speech Recognition” (W. A. Lea, ed.) Englewood Cliffs, New Jersey.
This Page Intentionally Left Blank
Availability and Reliability Modeling for Computer Systems

DAVID I. HEIMANN AND NITIN MITTAL
Digital Equipment Corporation, Andover, Massachusetts

KISHOR S. TRIVEDI
Computer Science Dept., Duke University, Durham, North Carolina
1. Introduction
   1.1 What is Dependability?
   1.2 Why Use Dependability?
   1.3 Where is Dependability Used?
2. Measures of Dependability
   2.1 Classes of Dependability Measures
   2.2 Guidelines for a Choice of Measure
   2.3 The Exponential Distribution
   2.4 An Introductory Example
   2.5 System Availability Measures
   2.6 System Reliability Measures
   2.7 Task Completion Measures
   2.8 Summary of Measures
3. Types of Dependability Analyses
4. The Modeling of Dependability
   4.1 Model Solution Techniques
   4.2 Parameter Determination
   4.3 Model Validation and Verification
5. A Full-System Example
   5.1 System Description
   5.2 Dependability Analysis
   5.3 Evaluations Using Other Measures
6. Conclusions
Acknowledgments
References

ADVANCES IN COMPUTERS, VOL. 31
Copyright © 1990 by Academic Press Inc. All rights of reproduction in any form reserved. ISBN 0-12-012131-X
1. Introduction
This paper addresses computer system dependability analysis, which ties together concepts such as reliability, maintainability, and availability. Dependability serves, along with cost and performance, as a major system selection criterion. Three classes of dependability measures are described: system availability, system reliability, and task completion. Using an introductory example, measures within each class are defined, evaluated, and compared. Four types of dependability analyses are discussed: evaluation, sensitivity analysis, specification determination, and tradeoff analysis. Markov and Markov reward models, commonly used for dependability analysis, are reviewed, and their solution methods are discussed. The determination of model parameters, such as failure rates, coverage probabilities, repair rates, and reward rates, is discussed, as well as model verification and validation. To demonstrate the use of these methods, a detailed dependability analysis is carried out on a full-system example representative of existing computer systems.

1.1 What is Dependability?
All kinds of people involved with computers, whether as designers, manufacturers, software developers, or users, are very much interested in determining how well their computer system is doing its job (or would do the job, if they are considering acquiring such a system). As with most other products and services, the people involved want to know whether their money has been (or would be) well spent and whether what they need is in fact being provided. At first, this assessment naturally takes the form of determining the fault-free performance, or level of service, of the system. People have become aware, often by bitter experience, that not only must they know how much service a computer system can deliver, but also how often it in fact delivers that intended level of service. Similar to other products, a computer system becomes far less attractive if it frequently deviates from its nominal performance. In fact, in many cases people would prefer a system that faithfully delivers its level of service to an alternative system that does not, even if the latter system delivers more service over the long run. There has therefore been a definite need to assess this "faithfulness to the intended level of service." Generally, this assessment first takes the form of determining how frequently the system fails to function, or, similarly, the length of time the system operates until such a failure. This assessment has developed into the field of reliability. However, a complete assessment also requires consideration of the time
needed for the system to recover its level of service once a failure takes place, or, more broadly, what impact the failure has on service. The characterization of repair and recovery is embraced by the topic of maintainability. The concepts of reliability and maintainability have been combined to produce the concept of availability. Since terms such as reliability and availability are used in a precise mathematical sense as well as in a generic sense, the term dependability is used to refer to the generic concept. The International Electrotechnical Vocabulary (IEV 191, 1987) and Laprie (1985) define dependability as the ability of a system or product to deliver its intended level of service to its users, especially in the light of failures or other incidents that impinge on its level of service.

Dependability manifests itself in various ways. In an office word-processing system, it can be the proportion of time that the system is able to deliver service. In a manufacturing process-monitoring system, it can be the frequency of times per year that a control-system failure causes the manufacturing line to be shut down. In a transaction-processing system, it can be the likelihood that a transaction is successfully completed within a specified tolerance time.

Dependability, measuring as it does the ability of a product to deliver its promised level of service, is a key descriptor of product quality. In turn, quality is a key aspect, along with product cost and performance, that customers use not only in making purchases of specific products and services, but in forming reputations of hardware and software producers.

1.2 Why Use Dependability?
Dependability allows comparisons with cost and performance. Dependability covers one of three critical criteria on which decisions are made on what to purchase or use (see Fig. 1). When customers or users make such decisions, they ask three fundamental questions: What level of service can this system deliver to me? How much does the system cost? And how likely is the system to actually deliver its nominal level of service? The first question addresses performance, while the second question addresses cost. The concept of dependability addresses the third question. By doing so, it plays an important role in providing a platform that solidly addresses all three issues, thus allowing the user to make an effective multi-criteria decision.

Dependability provides a proper focus for product-improvement efforts. Taking a dependability point of view in the design process (and in manufacturing and operations planning as well) causes one to consider a broad range of possible influences on product quality. Without such a view, one may focus strongly on a specific area or method, which may result in actions that in fact hurt the overall situation. For example, a focus on improving component
FIG. 1. Product selection criteria: dependability, performance, and cost.
reliability or maintainability alone may miss the following possibilities:

• The processor reliability is improved too much. Further improvement in the reliability of the given component beyond a certain point will not help the overall dependability. It is very important to recognize this point so as not to waste time and resources trying to improve subsystem reliability past this limit.

• Measures other than subsystem reliability improvement may provide better results. For example, consider a system that requires a system reboot after every failure. One may obtain improvement by increasing the processor reliability so that failures do not happen as often. However, it may turn out to be much more effective to change the design so that failures can be isolated from the rest of the system and do not require a total system reboot.

• Failures may not be the main problem. For example, service interruption may occur most often when the system is heavily loaded (Iyer et al., 1986). In this event, rather than trying to improve the processor reliability, it may be far more effective to perform load balancing.
Dependability can take into account safety and risk issues. Safety is an extremely important issue in many situations. This includes not only the safety of people and equipment, but also the safety of data and processes. Unsafe situations generally arise only through a combination of underlying events, often in a complex fashion. To deal with them adequately requires the kind of system-level approach that dependability analysis provides. In addition to safety, dependability can also assess other risks a user faces because of failure-induced non-performance. Risk is a very important part of product selection before purchase and product operation after purchase. Risk can be reduced by identifying the sources of significant and unsafe outages and taking steps to decrease their occurrence and their impact. Dependability analyses can identify the likelihood of potentially large impacts as well as the overall average risk, and also pinpoint the sources of significant risk.
1.3 Where is Dependability Used?
Dependability is used in all stages of the computer life cycle. In the requirements and planning stage, it provides a customer/user orientation in developing the overall requirements on the hardware, software, and information systems. In the specification stage, these generalized assessments of user failure sensitivities are formulated into dependability specifications, from which reliability and maintainability specifications are developed. In this manner, the resulting specifications are focused properly on the users' failure sensitivities. In the design stage, prospective system architectures and operating policies are evaluated with respect to the specifications, as well as with respect to an overall fault-tolerance approach developed during the planning stage and refined during this stage. In the manufacturing stage, dependability provides an overall framework for quality control, so that quality-control efforts can be focused on those potential defects to which the users would be most sensitive. In the sales and deployment stage (which includes sales and sales support, product marketing/positioning, and systems planning and analysis), dependability makes for precise expectations on how well and in what form the product can deliver its intended usage. In the operations stage, dependability can be used to plan operator response to failures and other incidents, including the effective use of measures such as operator-requested shutdowns, load balancing, and the scheduling of preventive and non-urgent corrective maintenance. In the maintenance stage, the frequency of preventive maintenance can be compared against the improvement in user dependability, and its scheduling can be adjusted to minimize the impact on users.
2. Measures of Dependability

2.1 Classes of Dependability Measures
Dependability measures fall into three basic classes: system availability, system reliability, and task completion. Each of these measure classes has its differences; which one is appropriate depends on the specific situation under investigation, the availability of relevant data, and the usage or customer profile.

System Availability. System availability measures show the likelihood that the system is delivering adequate service or, equivalently, the proportion of potential service actually delivered. Commercial computer systems are designed to provide high system availability. Brief interruptions in system operation can be tolerated, but not significant aggregate annual outage. For such highly available systems, a measure of interest might be the probability that the system is up at a given time t, or the expected proportion of time that the system is up in a given time interval.

System Reliability. System reliability measures show the length of time before a system failure occurs or, equivalently, the frequency of such failures. These measures apply to systems that are highly sensitive to interruptions. For example, flight control systems are required to provide interruption-free service during the length of the flight. For these systems, the measure of interest is the probability that the system operates without failure during a specific time interval.

Task Completion. Task completion measures show the likelihood that a particular user will receive adequate service or, equivalently, the proportion of users who receive adequate service. These measures fit best with situations in which specific tasks of importance can be identified, for example, for an on-line transaction processing system, in which a definite unit of service exists: the transaction. The measure "percent of transactions successfully completed" accurately describes the dependability situation.
2.2 Guidelines for a Choice of Measure

The choice of a proper measure is very important for an effective dependability analysis. The key to a proper choice is the user tolerance pattern, i.e., how the system and its users react to failures and interruptions. The tolerance pattern can be depicted by a graph of the impact of an interruption or outage against its length, as shown in Fig. 2. The graph illustrates the situations in which particular measures should be used.
FIG. 2. User tolerance patterns (impact of an interruption or outage vs. its length).
Note the following considerations in the user tolerance patterns:

• There is a tolerance z, at or below which the users are unaffected by the interruption. The tolerance may be equal to zero, in which case all interruptions affect the users.

• The graph may be discontinuous at the tolerance z. A discontinuity implies that a system reliability measure (Curves 2 and 4) be used, while a lack of such discontinuity implies that a system availability measure (Curves 1 and 3) be used.

• The graph may have a positive slope after the tolerance z (vs. a slope of zero). Such a slope implies that a system availability measure be used. A slope of 1 implies that a basic availability be used, whereas other slopes imply that a weighted (or capacity-oriented) availability be used.
The tolerance graph can address task-completion measures as well as system availability and system reliability measures. In these cases the tolerance graph addresses the impact of the interruption on an individual task rather than on the computer system as a whole.

2.3 The Exponential Distribution
The exponential distribution plays an important role in dependability analysis. The exponential distribution function is given by

F(t) = 1 − e^(−λt) if t ≥ 0, and F(t) = 0 otherwise.
1. The likelihood that the event occurs once during a given small interval of time is proportional to the length of the interval, i.e., Pr(event occurs during (t, t
+ h)) = Ah + o(h),
where 1is the constant of proportionality. 2. The likelihood that the event occurs twice or more during a given small interval of time is negligibly small, i.e., Pr(two or more events) = o(h), where o(h) denotes any quantity having an order of magnitude smaller than h, that is,
Then the time between successive occurrences of the event will be exponentially distributed with rate of occurrence λ. The mean time to the occurrence of the event is 1/λ. If successive inter-event times of a stochastic process are independent, identically distributed with an exponential distribution of rate λ, the number of occurrences of the event in an interval of length t has a Poisson distribution with mean λt, i.e.,

Pr(N(t) = k) = e^(−λt) (λt)^k / k!,  k = 0, 1, 2, ....
Notice that the likelihood of the event occurring in the interval (t, t + h) does not depend on the value of t. This is called the memoryless or Markov property (Trivedi, 1982). The memoryless property states that the time we must wait for a new occurrence of the event is statistically independent of how long we have already spent waiting for it. In a reliability context, it implies that a component does not "age," but instead only fails due to some randomly-appearing failure. In a performance context, it implies that the arrival of a new customer does not depend on how long it has been since the arrival of the previous customer. The memoryless property leads to considerable simplification in the analysis of the stochastic processes underlying dependability and performance problems. In many such problems the memoryless property in fact holds, and in many others it represents an excellent approximation. Methods of dealing with non-exponential distributions are beyond the scope of this paper, but the interested reader is referred to Cinlar (1975), Cox (1955), and Trivedi (1982).

If a random variable with distribution function F(t) is used to model the time to failure of a component (or system), then R(t) = 1 − F(t) is called the reliability function. Note that R(0) = 1 and R(∞) = 0. Note also that if the time to failure is an exponentially-distributed random variable with rate parameter λ, then R(t) = e^(−λt).

Related to the reliability function is the hazard function h(t). This function represents the instantaneous rate at which failures occur. In other words, the probability is h(t) dt that a failure will take place in the time interval (t, t + dt), given that no failure has taken place before time t. Furthermore,

h(t) = f(t)/R(t), where f(t) = dF(t)/dt,

and

R(t) = exp(−∫₀ᵗ h(x) dx).

For the exponential reliability function h(t) = λ, i.e., the instantaneous failure rate is constant in time (i.e., no infant mortality or aging takes place).
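The memoryless property and the constant hazard rate can be checked numerically. The following sketch (ours, standard-library Python; the rate value is arbitrary) verifies that P(X > s + t | X > s) = P(X > t) both analytically and by simulation:

```python
import math
import random

lam = 0.5          # failure rate (assumed value, for illustration only)
s, t = 1.0, 2.0

def R(t, lam):
    """Reliability function R(t) = 1 - F(t) = exp(-lambda * t)."""
    return math.exp(-lam * t)

# Memoryless property, computed directly from R:
cond = R(s + t, lam) / R(s, lam)       # P(X > s+t | X > s)
assert abs(cond - R(t, lam)) < 1e-12

# Simulation check: sample inter-event times, condition on X > s, and
# verify that the remaining lifetime still looks exponential.
random.seed(1)
samples = [random.expovariate(lam) for _ in range(200_000)]
survivors = [x - s for x in samples if x > s]
frac = sum(1 for x in survivors if x > t) / len(survivors)
assert abs(frac - R(t, lam)) < 0.01    # remaining life ~ same exponential
```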
2.4 An Introductory Example
Consider a multiprocessor system with two processors. Each processor is subject to failure, with MTTF (mean time to failure) 1/λ. When a processor failure occurs, the system will automatically recover from the failure with probability c (this is called a covered failure), or with probability 1 − c the system needs to be rebooted (this is called an uncovered failure). A covered failure is followed by a brief reconfiguration period, the average reconfiguration time being 1/δ. An uncovered failure requires a reboot, which takes longer, the average reboot time being 1/β (1/β > 1/δ). In either case, the affected processor needs to be repaired, with the Mean Time To Repair (MTTR) being 1/μ. During the repair, the other processor continues to run and provides service normally. Should the other processor fail before the first one is repaired, however, the system is out of service until the repair is completed.

If we assume that times to processor failure, processor repair, system reconfiguration, and system reboot are independent exponentially distributed random variables and that there is only a single repair person, then the multiprocessor system can be modeled by the continuous-time Markov chain shown in Fig. 3. Let Pᵢ(t) be the probability that the system is in state i at the time instant t. Then, the following differential equations completely define the Markov chain of Fig. 3 (Trivedi, 1982):
dP₂(t)/dt = −2λP₂(t) + μP₁(t)

dP₁c(t)/dt = −δP₁c(t) + 2λcP₂(t)

dP₁u(t)/dt = −βP₁u(t) + 2λ(1 − c)P₂(t)

dP₁(t)/dt = −(λ + μ)P₁(t) + δP₁c(t) + βP₁u(t) + μP₀(t)

dP₀(t)/dt = −μP₀(t) + λP₁(t)

FIG. 3. Markov model for a two-processor system.
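These equations can be integrated numerically. The following sketch (ours, not code from the chapter) uses a simple explicit Euler scheme with the example's parameter values (introduced later in this section) and checks that the state probabilities always sum to one:

```python
# Integrate the five differential equations above with an explicit Euler
# scheme to obtain the transient state probabilities (P2, P1c, P1u, P1, P0).
# Parameter values: lambda = 1/5000 per hour, mu = 1/4, c = 0.9,
# delta = 120 (30-second reconfiguration), beta = 6 (10-minute reboot).

lam, mu, c = 1.0 / 5000.0, 0.25, 0.9
delta, beta = 120.0, 6.0

def derivs(p):
    p2, p1c, p1u, p1, p0 = p
    return (
        -2*lam*p2 + mu*p1,                               # dP2/dt
        -delta*p1c + 2*lam*c*p2,                         # dP1c/dt
        -beta*p1u + 2*lam*(1 - c)*p2,                    # dP1u/dt
        -(lam + mu)*p1 + delta*p1c + beta*p1u + mu*p0,   # dP1/dt
        -mu*p0 + lam*p1,                                 # dP0/dt
    )

def transient(t_end, h=1e-3):
    p = [1.0, 0.0, 0.0, 0.0, 0.0]   # P2(0) = 1: both processors up at t = 0
    for _ in range(int(t_end / h)):
        d = derivs(p)
        p = [x + h*dx for x, dx in zip(p, d)]
    return p

p = transient(100.0)                 # probabilities after 100 hours
assert abs(sum(p) - 1.0) < 1e-9      # the derivatives sum to zero, so mass is conserved
assert abs(p[0] - 0.9984) < 1e-3     # P2 is already close to its steady-state value
```

The small step size is needed because the reconfiguration rate δ makes the system mildly stiff for an explicit method.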
with the given initial state probabilities Pᵢ(0). We assume that at time t = 0 the system is in State 2, that is, P₂(0) = 1. Solving this system of coupled linear differential equations will provide the transient solution (P₂(t), P₁c(t), P₁u(t), P₁(t), P₀(t)) of the Markov chain. Often, however, we are merely interested in the long-run or steady-state probabilities πᵢ = lim_(t→∞) Pᵢ(t). The steady-state balance equations are obtained by taking the limit of the system of differential equations above (Trivedi, 1982):

π₂ = (μ / 2λ) π₁

π₁c = (2λc / δ) π₂

π₁u = (2λ(1 − c) / β) π₂

π₀ = (λ / μ) π₁

where Σᵢ πᵢ = 1.

Data for Introductory Example. In the introductory example, we shall use the following numerical parameter values:
Processor mean time to failure (1/λ): 5,000 hours
Mean time to repair (1/μ): 4 hours
Coverage (c): 0.9 (90%)
Mean reconfiguration time (1/δ): 30 seconds
Mean reboot time (1/β): 10 minutes
System size: 2 processors
Note that the data describing the sample system and the results of the availability analysis, while based on engineering designs and observations of machine performance, are hypothetical. They should be used only for general observations about dependability and its modeling and analysis, not to draw specific conclusions about specific products. Note also that the models on which the results are based will continue to evolve as development work and validation proceeds.
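The balance equations above can be solved directly for these parameter values. A sketch (ours, standard-library Python) that expresses every probability in terms of π₂ and then normalizes; the asserted values are the steady-state probabilities quoted in the text:

```python
# Solve the steady-state balance equations for the introductory example.
lam   = 1.0 / 5000.0   # failure rate (MTTF 5,000 hours)
mu    = 1.0 / 4.0      # repair rate (MTTR 4 hours)
c     = 0.9            # coverage
delta = 120.0          # reconfiguration rate (30 seconds = 1/120 hour)
beta  = 6.0            # reboot rate (10 minutes = 1/6 hour)

# Express every probability as a multiple of pi2, then normalize.
r1  = 2*lam / mu            # pi1  = r1  * pi2
r1c = 2*lam*c / delta       # pi1c = r1c * pi2
r1u = 2*lam*(1 - c) / beta  # pi1u = r1u * pi2
r0  = (lam / mu) * r1       # pi0  = (lam/mu) * pi1

pi2 = 1.0 / (1 + r1 + r1c + r1u + r0)
pi1, pi1c, pi1u, pi0 = r1*pi2, r1c*pi2, r1u*pi2, r0*pi2

# These reproduce the values quoted in the text.
assert abs(pi2  - 0.99839164) < 5e-8
assert abs(pi1c - 0.00000300) < 5e-8
assert abs(pi1u - 0.00000665) < 5e-8
assert abs(pi1  - 0.00159743) < 5e-8
assert abs(pi0  - 0.00000128) < 5e-8
```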
Solving the steady-state equations for the Markov model of Fig. 3 with these parameters, we obtain π₂ = 0.99839164, π₁c = 0.00000300, π₁u = 0.00000665, π₁ = 0.00159743, and π₀ = 0.00000128.

2.5 System Availability Measures

System availability measures are what traditionally have been referred to as "availability." To temporarily oversimplify, system availability is the proportion of total time in which the system is in an operational condition. The measure can be expressed either as a percentage or probability or as the amount of system uptime (or downtime) per year. As mentioned before, availability measures are used for systems, such as telephone switching systems and database systems, that are usually operated continuously and for which short down times can be tolerated.

2.5.1 Basic Availability
The most straightforward form of system availability is basic availability. Basic availability follows the dotted curve shown in Fig. 2. The tolerance is zero, so that all outages count. The system is assumed to be either "up" or "down," with no partial or intermediate states. Using the state description shown in Fig. 3, we consider the states labeled 2 and 1 as system "up" states and all other states as system "down" states.

Three Forms of System Availability. System availability measures can be expressed in one of three forms, as follows. The probability that the system is up at a time t, called instantaneous basic availability, is

A(t) = P₂(t) + P₁(t).

If we assume that the system has reached steady state (i.e., time t → ∞), we then have steady-state basic availability, which is

A = π₂ + π₁.
We also have the interval basic availability of the system, the proportion of a given interval of time that the system is up, obtained by taking the time average of the instantaneous availability over the interval, i.e.,

Ā(t) = (1/t) ∫₀ᵗ A(x) dx = (1/t) ∫₀ᵗ [P₂(x) + P₁(x)] dx.
Figure 4 displays the three availabilities A, A(t), and Ā(t) as functions of time t for our example system. Note that Ā(t) > A(t) (since Ā(t) is A(t) averaged over the time interval and the latter is a decreasing function), and that both of these converge to the steady-state availability A.

In the introductory example, the steady-state basic availability A is 0.99998907. This means that the system is up 99.998907% of the time (basic availability) and thus down 0.001093% of the time (basic unavailability). In the course of a year, or 525,600 minutes, the system can be expected to be out of operation an average of 5.74 minutes (basic downtime). Further analysis shows the basic downtime is composed of 0.67 minutes due to lack of required processors (loss of both processors), 1.57 minutes due to reconfigurations, and 3.5 minutes due to uncovered failures.

2.5.2 Tolerance (Nonreconfiguration) Availability

Tolerance availability introduces a tolerance, along the lines of the solid curve of Fig. 2. In this case, all reconfigurations are assumed to result in brief outages that are below the tolerance value (and hence tolerable), while all reboots, and all repairs when the system as a whole is down, are assumed to result in outages above the tolerance. According to the state description shown in Fig. 3, State 1c is now considered an "up" state, in addition to States 2 and 1, while States 1u and 0 are "down" states. Then, at steady state,

Tolerance (Nonreconfiguration) Availability = π₂ + π₁c + π₁.
In the example, the system is either up or undergoing a tolerably brief outage 99.999207% of the time (tolerance availability); thus during 0.000793% of the time the system is undergoing an intolerably long outage (tolerance unavailability). In the course of a year, or 525,600 minutes, the system can be expected to be out of operation an average of 4.17 minutes due to intolerably long outages (tolerance downtime). Further analysis shows that this downtime is composed of 0.67 minutes due to lack of required processors and 3.50 minutes due to uncovered failures.

2.5.3 Capacity-Oriented Availability

Capacity-oriented availability takes into account that in many situations the users are interested not as much in whether the entire system is up or down but rather in how much service the system is delivering. Capacity-oriented availability measures have curves similar to the first curve in Fig. 2 except that the slope, instead of being equal to one, is equal to the relative amount of lost service capacity.

In the example, we assume that if both processors are up, the system is delivering full service, whereas if only one processor is up, the system is delivering only half service. If no processors are up, or if reconfigurations or reboots are taking place, the system is assumed to be down and thus delivering zero service. According to the state description shown in Fig. 3, State 2 delivers full service, State 1 delivers half service, and States 1c, 1u, and 0 deliver zero service. Accordingly, the steady-state capacity-oriented availability is given by

COA = π₂ + 0.5π₁ = 0.99919036.

Thus, in the example, 99.919036% of the 2 × 525,600 processor-minutes potentially available over the course of a year are actually delivered (capacity-oriented availability). Equivalently, 0.080964% (capacity-oriented unavailability) of the 2 × 525,600 processor-minutes, or 851 processor-minutes (capacity-oriented downtime), are not delivered. This downtime consists of 2 × 5.74, or approximately 11, processor-minutes of downtime due to system downtime per year, plus 840 processor-minutes of downtime due to degraded capacity.

2.5.4 Tolerance (Nonreconfiguration) Capacity-Oriented Availability

Tolerance capacity-oriented availability measures take both tolerance and capacity considerations into account. Except for the tolerance value below which outages are not counted, these measures are similar to capacity-oriented measures. They therefore have curves similar to the solid curve in Fig. 2 except that the slope, instead of being equal to one, is equal to the relative amount of lost service capacity. From the state description in Fig. 3, the tolerance (nonreconfiguration) capacity-oriented availability is given by

TCOA = π₂ + π₁c + 0.5π₁ = 0.99919335.

In the example, the tolerance capacity-oriented availability of the system is 99.919335%. Equivalently, the tolerance capacity-oriented unavailability is 0.080665%, and the tolerance capacity-oriented downtime is 848 processor-minutes per year. Note that the difference between this downtime and the 851 processor-minutes for capacity-oriented downtime represents 1.57 minutes of reconfiguration downtime, or approximately 3 processor-minutes. The tolerance capacity-oriented downtime figure is only slightly lower than that for capacity-oriented downtime with reconfiguration losses taken into account. This is because most of the impact on capacity-oriented downtime comes from the degraded-capacity state (and the reboot state, to a lesser extent) rather than from reconfiguration losses.
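The availability measures of Sections 2.5.1 through 2.5.4 follow directly from the steady-state probabilities. A sketch (ours) using the values quoted in the text:

```python
# Compute the four availability measures and the corresponding annual
# downtimes from the example's steady-state probabilities.
MIN_PER_YEAR = 525_600
pi2, pi1c, pi1u, pi1 = 0.99839164, 0.00000300, 0.00000665, 0.00159743

A    = pi2 + pi1               # basic availability
TA   = pi2 + pi1c + pi1        # tolerance (nonreconfiguration) availability
COA  = pi2 + 0.5*pi1           # capacity-oriented availability
TCOA = pi2 + pi1c + 0.5*pi1    # tolerance capacity-oriented availability

basic_downtime = (1 - A) * MIN_PER_YEAR          # minutes per year
tol_downtime   = (1 - TA) * MIN_PER_YEAR         # minutes per year
coa_downtime   = (1 - COA) * 2 * MIN_PER_YEAR    # processor-minutes per year

assert abs(A - 0.99998907) < 1e-8
assert abs(TA - 0.99999207) < 1e-8
assert abs(COA - 0.99919036) < 2e-8
assert abs(TCOA - 0.99919335) < 2e-8
assert abs(basic_downtime - 5.74) < 0.01         # ~5.74 minutes per year
assert abs(tol_downtime - 4.17) < 0.01           # ~4.17 minutes per year
assert abs(coa_downtime - 851) < 1               # ~851 processor-minutes per year
```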
2.5.5 Degraded-Capacity Time The degraded-capacity time is the annual amount of time that the system is functioning but operating at less than full capacity. In the example, out of 525,600 minutes in a year, the system spends approximately 6 minutes actually out of service, but 840 minutes (525,600 * nl)in a degraded-capacity mode due to the loss of one processor. For the remaining time (524,754 minutes), the system is expected to operate at full capacity. The degraded-capacity time of 840 minutes per year compares with 6 minutes per year spent actually out of service, so that the contribution to loss of service from degraded capacity far outweighs the contribution from actual system outages. 2.6 System Reliability Measures System reliability measures emphasize the occurrence of undesirable events in the system. These measures are useful for systems where no downtime can be tolerated, for example, flight control systems. System reliability can be expressed in a number of forms: Reliability Function. This represents the probability that an incident (of sufficient severity, if a tolerance threshold is in effect) has not yet occurred since the beginning of the current uptime epoch. It is denoted by the function R(t) = P(X > t), where X is the (random) time to the next failure and t is the length of the time period of interest. The system unreliability is simply 1 - R(t). In computing the system reliability R(t)for our example system, we consider three different criteria: Case I . Any processor failure is considered a system.failure. In this case we turn States lc and l u into absorbing states so that once the system enters those states, it is destined to stay there (see Fig. 5a). Then
R1(t) = P2(t),

where Pj(t) denotes the transient probability that the system is in State j at time t, given that it started in State 2 at time 0.

Case 2. Any uncovered processor failure, or any failure that leads to exhaustion of all processors, is considered to be a system failure. In this case, States 1u and 0 are absorbing states (see Fig. 5b). Then

R2(t) = P2(t) + P1c(t) + P1(t).
Case 3. Only the failure of all processors is considered a system failure. In this case only State 0 is an absorbing state (see Fig. 5c). Then

R3(t) = P2(t) + P1c(t) + P1u(t) + P1(t).
AVAILABILITY AND RELIABILITY MODELING FOR COMPUTER SYSTEMS
191
FIG. 5. Failure criteria for system reliability analysis (* indicates absorbing states). (a) Any processor failure. (b) Any uncovered processor failure or loss of all processors. (c) Loss of all processors.
For any of the three cases, the reliability R(t) is the probability that, having started in State 2 at time 0, the system has not reached an absorbing state by time t. Likewise, the system unreliability 1 − R(t) is the probability that the system has reached an absorbing failure state at or before time t. Figures 5a, 5b and 5c show the Markov models for each of the three criteria. Note the difference between the graphs of Fig. 5 and that of Fig. 3: all the graphs in Fig. 5 have absorbing states, while the one in Fig. 3 has none. Naturally, the corresponding differential equations will also differ.
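These reliability functions can be computed numerically by zeroing out the outgoing transition rates of the chosen absorbing states and taking a matrix exponential. The sketch below does this for the sample system; the rate values (failure rate λ = 1/5,000 per hour, repair rate μ = 0.25 per hour, coverage c = 0.9, and reconfiguration and reboot rates δ = 120 and β = 6 per hour) are assumptions chosen to be consistent with the numbers reported for the example.

```python
import numpy as np
from scipy.linalg import expm

lam, mu, c = 1/5000.0, 0.25, 0.9   # per-hour failure/repair rates, coverage
delta, beta = 120.0, 6.0           # assumed reconfiguration and reboot rates

# Generator matrix, states ordered 2, 1c, 1u, 1, 0 (as in Fig. 3).
Q = np.array([
    [-2*lam, 2*lam*c, 2*lam*(1-c), 0.0,        0.0],
    [0.0,    -delta,  0.0,         delta,      0.0],
    [0.0,    0.0,     -beta,       beta,       0.0],
    [mu,     0.0,     0.0,         -(lam+mu),  lam],
    [0.0,    0.0,     0.0,         mu,         -mu],
])

def reliability(t, absorbing):
    """R(t): make the given states absorbing, start in State 2, and
    sum the probability of the states that have not been absorbed."""
    Qa = Q.copy()
    Qa[absorbing, :] = 0.0          # absorbing states have no outgoing rates
    P = expm(Qa * t)[0, :]          # row 0 = started in State 2
    keep = [i for i in range(5) if i not in absorbing]
    return P[keep].sum()

t = 1000.0                          # hours
R1 = reliability(t, [1, 2])         # Case 1: any failure (1c, 1u absorb)
R2 = reliability(t, [2, 4])         # Case 2: uncovered failure or exhaustion
R3 = reliability(t, [4])            # Case 3: loss of all processors
```

In Case 1 the model reduces to a single exponential, R1(t) = e^(−2λt), so the mean time to failure is 1/(2λ) = 2,500 hours, matching the text.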
Mean Time To Failure (Incident). This is the average length of time that elapses until the occurrence of an incident, and is denoted by MTTF. It is given by

MTTF = ∫0^∞ R(t) dt.
Frequency of Incidents. This is the average number of occurrences of incidents per unit of time. In order to compute the frequency of a certain incident, we return to the Markov model shown in Fig. 3 and count the average number of visits to the state of interest during the interval of observation. For our sample system under the three criteria given above, the frequencies of incidents per year are therefore

F1 = 8,760 × [δπ1c + βπ1u + μπ0],
F2 = 8,760 × [βπ1u + μπ0],
F3 = 8,760 × μπ0.
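As a sketch, these frequencies can be evaluated from the steady-state probabilities of the Fig. 3 model. The rate values (λ = 1/5,000, μ = 0.25, c = 0.9, with assumed reconfiguration and reboot rates δ = 120 and β = 6 per hour) are assumptions chosen to match the reported results.

```python
import numpy as np

lam, mu, c = 1/5000.0, 0.25, 0.9
delta, beta = 120.0, 6.0          # assumed reconfiguration/reboot rates

# Generator matrix, states ordered 2, 1c, 1u, 1, 0.
Q = np.array([
    [-2*lam, 2*lam*c, 2*lam*(1-c), 0.0,        0.0],
    [0.0,    -delta,  0.0,         delta,      0.0],
    [0.0,    0.0,     -beta,       beta,       0.0],
    [mu,     0.0,     0.0,         -(lam+mu),  lam],
    [0.0,    0.0,     0.0,         mu,         -mu],
])

# Steady state: solve pi Q = 0 with sum(pi) = 1 by replacing one
# balance equation with the normalization condition.
A = Q.copy(); A[:, -1] = 1.0
b = np.zeros(5); b[-1] = 1.0
pi2, pi1c, pi1u, pi1, pi0 = np.linalg.solve(A.T, b)

F1 = 8760 * (delta*pi1c + beta*pi1u + mu*pi0)   # any outage
F2 = 8760 * (beta*pi1u + mu*pi0)                # over-tolerance outages
F3 = 8760 * mu*pi0                              # all processors failed
```

With these rates, F1 ≈ 3.5, F2 ≈ 0.35, and F3 ≈ 0.0028 incidents per year, in line with Sections 2.6.1-2.6.3.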
We shall present a collection of system reliability measures, again based on the example system described previously. The first three cases show differing criteria as to what constitutes an incident. For each of these cases the three kinds of values discussed above (reliability function, MTTF, and frequency of occurrence) are provided. The fourth case generalizes the first three in that a frequency-of-incident value is given for a whole range of tolerance values rather than just for a specific given value. The remaining frequency measures depict related aspects of system behavior.

2.6.1 Any Outage (Case 1)
The likelihood that an outage occurs between time 0 and time t is 1 − R1(t), plotted in Fig. 6. The mean time to the first outage is 2,500 hours. The average frequency at which service is interrupted on the system is 3.5 times per year. Note that the tolerance pattern of this measure corresponds to the dashed curve of Fig. 2: a system reliability curve (i.e., a curve with a discontinuity at the tolerance level) with a zero tolerance value.

2.6.2 Over-Tolerance (Nonreconfiguration) Outages (Case 2)
The likelihood that an outage, other than a reconfiguration, occurs between time 0 and time t is 1 − R2(t), plotted in Fig. 6. The mean time to the first
outage of more than a reconfiguration is 24,857 hours. The average frequency at which service is interrupted on the system for more than a reconfiguration time interval is 0.35 times per year (once every 2.8 years). The tolerance pattern of this measure corresponds to the double solid curve of Fig. 2: a system reliability curve (a curve with a discontinuity at the tolerance level) with a nonzero tolerance value.

2.6.3 Outages Due To Lack Of Processors (Case 3)
The likelihood that all processors fail at some point between time 0 and time t is 1 − R3(t), plotted in Fig. 6. The mean time to the first occurrence of the "both processors failed" condition is 3,132,531 hours. The average frequency at which service is interrupted on the system due to all processors having failed is 0.0028 times per year (once every 357 years).

2.6.4 Frequency and Duration Of System Outages
Next we consider the frequency of system outages exceeding a given outage-length tolerance t. It is given by

F4(t) = 8,760 × [δπ1c e^(−δt) + βπ1u e^(−βt) + μπ0 e^(−μt)],

since the probability that a given reconfiguration interval is longer than t is e^(−δt), and likewise for the reboot interval and the repair interval. This frequency for the sample system is shown in Fig. 7 as a function of t. Note that the relationship is nonlinear; the outage frequency changes significantly as the outage tolerance moves through values associated with reconfigurations (less than 0.01 hour) or system reboots (more than 0.1 hour), whereas the outage frequency does not change much for tolerance values intermediate between reconfigurations and system reboots.

2.6.5 Frequency of Degraded-Capacity Incidents
The frequency of degraded-capacity incidents is the average annual number of times that the system loses capacity but continues to operate at a reduced level. In the example, this frequency is 3.15. Note that these incidents represent those incidents of Section 2.6.1 not included in Section 2.6.2. In other words, the formula used for this frequency is

F5 = 8,760 × δπ1c.
2.6.6 Frequency of Processor Repairs
Unlike the above measures, this measure does not reflect system reliability per se, since it does not necessarily show instances where the user is deprived of
FIG. 7. Frequency of system outages as a function of outage-length tolerance.
system service. Rather, it shows the workload on the maintenance facility generated by incidents. The average rate of processor repairs per year is given by
F6 = 8,760 × μ × (π1 + π0).
The value for the example system is 3.5 times per year.
2.7 Task Completion Measures

Task completion measures indicate the likelihood that a task (or job, or customer) will be completed satisfactorily. Since the task is the fundamental unit by which work is carried out on a system, the likelihood of successful completion of a task gives a precise assessment of customer satisfaction. Task completion measures are thus very effective in situations where system usage can indeed be broken down into individual tasks, such as a transaction processing system.

Unlike system availability or system reliability measures, which take into account only the system itself, task completion measures also include the nature of the tasks themselves and their interaction with the system. The analysis therefore has two layers: the occurrence of and recovery from incidents, and the effects of these incidents on the tasks. The effects are functions of such aspects as the incident profiles, the length of time of the task, and the sensitivity of the task to interruptions.

We shall present a collection of task completion measures (specifically, probability-of-end-user-interruption measures), again based on the example system described previously. The numerical values shown are for a task that needs 60 minutes (one hour) of "uninterrupted" execution time. Curves showing the values of these task completion measures for other task execution times are provided in Fig. 8.

2.7.1
Task Interruption Probability Due To Any Interruption
The likelihood that a user requiring x units of uninterrupted system time finds the system initially available and suffers a service interruption during usage, whether due to the failure of the user's own processor, system reconfiguration, uncovered failure, or loss of required processors, is given by

Task Interruption Probability = (1 − e^(−2λx))π2 + (1 − e^(−λx))π1,

since the probability that the interruption occurs due to the first failure in the system is 1 − e^(−2λx), provided that the task was executing with both processors up. Similarly, the probability that the interruption is due to a loss of required
FIG. 8. Odds against task interruption, as a function of task time (hours), for three criteria: any interruption; uncovered failure or loss of own processor; uncovered failure or loss of all processors.
processors is 1 − e^(−λx), provided that the task was executing with the system in State 1 (of Fig. 3). For x = 60 minutes, the task interruption probability is calculated to be 0.03997%, and the odds against interruption are 2,501:1.

2.7.2
Task Interruption Probability Due To An Over-Tolerance (Nonreconfiguration) Interruption of the System or User’s Processor
The likelihood that a user requiring x units of "uninterrupted" system time finds the system initially available and suffers an interruption due to a failure of the user's processor, or due to a system uncovered failure or loss of required processors, is given by

Task Interruption Probability = (1 − e^(−(λ(1−c)+λ)x))π2 + (1 − e^(−λx))π1.
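Both task-interruption formulas can be checked numerically; the sketch below obtains the steady-state probabilities from the Fig. 3 model, with the rate values (λ = 1/5,000, μ = 0.25, c = 0.9, and assumed δ = 120, β = 6 per hour) chosen as assumptions consistent with the reported results.

```python
import numpy as np

lam, mu, c = 1/5000.0, 0.25, 0.9
delta, beta = 120.0, 6.0          # assumed reconfiguration/reboot rates

# Generator matrix, states ordered 2, 1c, 1u, 1, 0.
Q = np.array([
    [-2*lam, 2*lam*c, 2*lam*(1-c), 0.0,        0.0],
    [0.0,    -delta,  0.0,         delta,      0.0],
    [0.0,    0.0,     -beta,       beta,       0.0],
    [mu,     0.0,     0.0,         -(lam+mu),  lam],
    [0.0,    0.0,     0.0,         mu,         -mu],
])
A = Q.copy(); A[:, -1] = 1.0      # pi Q = 0 plus normalization
b = np.zeros(5); b[-1] = 1.0
pi = np.linalg.solve(A.T, b)
pi2, pi1 = pi[0], pi[3]

x = 1.0                           # task length in hours

# Section 2.7.1: any interruption.
p_any = (1 - np.exp(-2*lam*x))*pi2 + (1 - np.exp(-lam*x))*pi1

# Section 2.7.2: over-tolerance interruption of system or user's processor.
p_tol = (1 - np.exp(-(lam*(1-c) + lam)*x))*pi2 + (1 - np.exp(-lam*x))*pi1
```

For a one-hour task this gives roughly 0.040% and 0.022%, i.e., odds of about 2,500:1 and 4,500:1 against interruption, matching the text.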
The probability that the interruption occurs due to an uncovered processor failure or the user's processor failure is 1 − e^(−(λ(1−c)+λ)x), provided that the task was executing with both processors up. Similarly, the probability that the interruption is due to a loss of required processors is 1 − e^(−λx), in case the task was running with only one processor up. For x = 60 minutes, the task interruption probability is computed as 0.021997%, and the odds against interruption are 4,545:1. Note that reconfiguration interruptions do not count as an interruption.

2.7.3 Task Interruption Probability Due To An Over-Tolerance (Nonreconfiguration) Interruption of the System
The likelihood that a user requiring x units of "uninterrupted" system time finds the system initially available and suffers an interruption due to uncovered system failures or loss of required processors is given by

Task Interruption Probability = (1 − e^(−2λ(1−c)x))π2 + (1 − e^(−λx))π1.

The probability that the interruption occurs due to an uncovered processor failure is given by 1 − e^(−2λ(1−c)x), provided the system is in State 2 (of Fig. 3). For x = 60 minutes, the task interruption probability is computed to be 0.004026%, and the odds against interruption are 24,841:1. Note that in this situation the user can switch to another processor in case of a covered failure, so that a covered failure of the user's own processor does not count as an interruption. Note also that a reconfiguration does not count as an interruption.

2.8
Summary of Measures
The various dependability measures are summarized in Table I.
TABLE I
DEPENDABILITY MEASURES

Table Ia. System Availability Measures

Measure                                                          Value in sample system
Basic availability                                               99.998907%
  Unavailability                                                 0.001093%
  Downtime                                                       5.74 minutes/year
Tolerance (nonreconfiguration) availability                      99.999207%
  Unavailability                                                 0.000793%
  Downtime                                                       4.17 minutes/year
Capacity-oriented availability                                   99.919036%
  Unavailability                                                 0.080964%
  Downtime                                                       851 processor-minutes/year
Tolerance (nonreconfiguration) capacity-oriented availability    99.919335%
  Unavailability                                                 0.080665%
  Downtime                                                       848 processor-minutes/year
Degraded-capacity time                                           840 minutes/year

Table Ib. System Reliability Measures

Measure                                          Value in sample system
Any outage                                       (MTTF = 2,500 hrs.) 3.5/year
Over-tolerance (nonreconfiguration)
  system outages                                 (MTTF = 24,857 hrs.) 0.35/year
Lack of required processors                      (MTTF = 3,132,531 hrs.) 0.0028/year
Frequency and duration of system outages         See Fig. 7
Frequency of degraded-capacity incidents         3.15/year
Frequency of processor repairs                   3.5/year

Table Ic. Task Completion Measures (user requiring 60 minutes)

Measure                                                          Value in sample system
Against any interruption:
  Task interruption probability                                  0.04%
  Odds against interruption                                      2,500:1
Against an over-tolerance (nonreconfiguration) interruption
of system or of user's processor:
  Task interruption probability                                  0.022%
  Odds against interruption                                      4,500:1
Against an over-tolerance (nonreconfiguration) interruption
of system:
  Task interruption probability                                  0.004%
  Odds against interruption                                      25,000:1
3. Types of Dependability Analyses

To fully analyze a candidate computer system, there are four types of dependability analyses: evaluation, sensitivity analysis, specification determination, and tradeoff analysis.

Evaluation (i.e., "What is?") is the basic dependability analysis. It investigates a specific computer system, either as designed or as it actually exists. Input data are collected as to nominal performance, component reliability, maintainability, failure recovery, etc. The analyst then evaluates the dependability of the system as described by the design specifications or the existing conditions.

Sensitivity analysis (i.e., "What if?") takes place after a system has been evaluated. One may naturally wish to determine how the analysis results would change if one or more of the input parameters change (for example, what if the component reliability improves?). One can then conduct several analysis runs with differing values for a given input parameter, and examine the changes in the dependability measure of interest. This applies particularly well to situations where some doubt exists as to the proper value of a certain input parameter, or where results are required for a range of values of a parameter. This procedure is called sensitivity analysis because it measures the sensitivity of dependability to changes in the input parameters. It is also possible to compute the partial derivative of the measure of interest with respect to a chosen parameter in the quest for sensitivity analysis (Blake et al., 1988).

Specification determination (i.e., "How to?") determines the values of given input parameters required to achieve a given level of dependability. These values then become specifications for the indicated parameters.
Specification determination is therefore the reverse of sensitivity analysis: while sensitivity analysis takes given values of input parameters and determines the impact of these values on dependability, specification determination takes a given value of dependability and determines its impact on the specification for an input parameter.

Tradeoff analysis (i.e., "How best?") investigates trading off a change in one input parameter for a change in a second parameter, leaving overall dependability unaffected. For example, if in order to save costs the designer reduces the redundancy of a subsystem by one unit, by how much would the component reliability in that subsystem have to improve in order to preserve the overall dependability? The main distinction between tradeoff analysis and sensitivity analysis is that the former investigates the interaction between two input parameters (holding dependability constant) while the latter investigates the interaction between an input parameter and a dependability
measure. Tradeoff analyses allow a designer to have a design depend less on weaker or more expensive areas and more on stronger and/or more cost-effective ones.

The relationship among these four types of analyses is shown in the conceptual graph given in Fig. 9. Component reliability is shown on the horizontal axis and maintainability (or redundancy) on the vertical axis. Within the graph are curves of equal dependability, i.e., all points on a given curve have the same dependability. The dependability represented by each curve increases as one moves upward in the direction of the dashed arrow. Point A represents an evaluation for a given level of component reliability and maintainability (or redundancy). Points B1 and B2 represent sensitivity analyses from point A, with B1 showing the effect of increasing maintainability (or redundancy) and B2 showing the effect of increasing component reliability. Point C represents a specification determination for component reliability, with the dependability requirement shown by the second curve from the top and the component reliability being increased from point A to point C until dependability meets the requirement. Points D1 and D2 represent tradeoff analyses (with both points remaining on the same dependability curve as A), with D1 showing an exchange of lower component reliability for greater maintainability or redundancy and D2 showing the reverse.

Sample Types Of Analyses. To illustrate the four types of analyses, we have carried them out on the sample system defined previously. The analysis consists of an evaluation on the original data, sensitivity analyses and specification determinations on two of the system parameters, namely processor reliability and repair time, and a tradeoff analysis of processor reliability vs. failure coverage. The measure used for dependability is the mean downtime per year, a basic availability measure. The results are summarized in Table II.

4. The Modeling of Dependability
Generally when one carries out a dependability analysis of a computer system, the system is represented by a mathematical model. It is certainly possible to evaluate the dependability of a system by observing and measuring actual system behavior under either normal or controlled conditions, then estimating various measures of dependability using statistical techniques (Bard and Schatzoff, 1978; Trivedi, 1982). However, a measurement-based evaluation is sometimes impossible or prohibitively expensive. For instance, the system under consideration may not yet be available for obtaining measurements, either not at all or not for the intended application. Additionally, the required measurement data, especially frequency-of-failure data in
FIG. 9. Types of system dependability analyses. Axes: component reliability (horizontal) vs. maintainability or redundancy (vertical). Analyses represented in the graph: A, evaluation; B1, B2, sensitivity analyses; C, specification determination; D1, D2, tradeoff analyses.
TABLE II
TYPES OF DEPENDABILITY ANALYSES

Evaluation:
  System as originally specified
  (Processor MTTF = 5,000 hours, MTTR = 4 hours, c = 0.9)        Downtime = 5.7 min/yr

Sensitivity Analysis:
  Processor reliability    MTTF = 10,000 hours                   Downtime = 2.7 min/yr
                           MTTF = 2,500 hours                    Downtime = 12.8 min/yr
  Repair time              MTTR = 2 hours                        Downtime = 5.2 min/yr
                           MTTR = 8 hours                        Downtime = 7.7 min/yr

Specification Determination (specification is 5 min/yr of downtime):
  Processor reliability    MTTF = 5,670 hours
  Repair time              Cannot meet specification (5.08 min/yr when MTTR = 0)

Tradeoff Analysis (processor reliability vs. coverage; downtime remains at 5.7 min/yr):
  Processor reliability increases so that MTTF = 10,000 hours    Coverage may decrease to c = 0.76
  Processor reliability decreases so that MTTF = 3,500 hours     Coverage must increase to c = 0.975
high reliability situations or data on the effects of infrequently-occurring failure modes, may require infeasible levels of time and effort to obtain in sufficient amounts to yield statistically significant estimates (Geist and Trivedi, 1983). Therefore, a model-based evaluation, or in some cases a hybrid approach based on a judicious combination of models and measurements, is used for cost-effective dependability analysis.

Two broad categories of mathematical models exist: simulation and analytic. In Monte Carlo simulation models, an input stream of simulated events, such as failures, recoveries, and repairs, is produced using random variates from the appropriate distributions, and the impact of these events on the system is evaluated. In analytic models, equations describing the underlying structure of the system are derived and solved.

Simulation models are frequently more straightforward than analytic ones, and usually do not have as many of the simplifying assumptions that analytic models require for tractability. However, we must carry out repeated
replications, each with a different randomly-generated input stream, until enough replications have been made to obtain statistically significant results. In the case of large models that result from reasonably complex systems, this can become prohibitively expensive or even computationally infeasible. In addition, in dependability analysis the numerical values for failure rates and repair/recovery rates are usually vastly different, with failure rates being much lower than repair/recovery rates. This makes dependability models stiff and hence even more difficult to simulate. Methods of speeding up the simulation of stiff systems are being studied (Conway and Goyal, 1987). Nevertheless, whenever a reasonable analytic model exists or can be developed, it should be used over a simulative approach.

Analytic models include combinatorial and Markov models. Combinatorial models, in turn, include reliability block diagrams, fault trees, and reliability graphs. These models are parsimonious in describing system behavior; hence, they are relatively easy to specify and solve. Combinatorial models, however, generally require that system components behave in a stochastically independent manner. Dependencies of many different kinds exist in real systems (Goyal et al., 1987; Veeraraghavan and Trivedi, 1987). For this reason combinatorial models turn out not to be entirely satisfactory in and of themselves.

A Markov model is represented by a graph (or, equivalently, by a matrix of transition rates) in which the nodes are the possible states the system can assume and the arcs depict the transitions the system can make from one state to another (Figs. 3, 5a, 5b and 5c are examples of such graphs). The model is then solved to obtain the probabilities that the system will assume various states. Unlike combinatorial models, Markov models can include different kinds of dependencies.
However, for most practical systems, a satisfactory Markov model could easily have tens of thousands of states. The construction and solution of such large Markov models pose a challenge. Two principal approaches exist to deal with this potential largeness of the Markov state space. In the approach we call largeness avoidance, we find a way to avoid generating and solving a large Markov model. Largeness avoidance commonly uses hierarchies of models and often (but not always) implies an approximate rather than an exact solution to the original modeling problem (Ibe, Howe, and Trivedi, 1989; Sahner and Trivedi, 1987; Veeraraghavan and Trivedi, 1987). State truncation (Boyd et al., 1988; Goyal et al., 1986; Ciardo et al., 1989), fixed-point iteration (Ciardo and Trivedi, 1990) and other approximation techniques (Blake and Trivedi, 1989) that avoid the generation and solution of large state spaces also belong here.

The alternative approach to largeness avoidance is to use largeness tolerance. In this approach, we accept the fact that a large Markov model needs to be generated and solved. However, we automate the generation and
AVAILABILITY AND RELIABILITY MODELING FOR COMPUTER SYSTEMS
205
solution of the large Markov model. This can be done in several ways:

1. A special-purpose program can be written to generate the states and transition rates of the Markov model (Heimann, 1989b).
2. A more concise stochastic Petri net (SPN) model of the problem can be specified, and subsequently an SPN package can be used to automatically generate the Markov model (Ciardo et al., 1989; Ibe, Trivedi, et al., 1989).
3. Modeling languages specially tailored to availability modeling (e.g., SAVE (Goyal et al., 1986)) or reliability modeling (e.g., HARP (Bavuso et al., 1987; Dugan et al., 1986)) can be used to automatically generate the underlying Markov chain state space.
Whether the Markov model is directly specified by the modeler or has been automatically generated by a program, the need to use sparse-matrix storage techniques and sparsity-preserving efficient numerical solution methods is evident. In the rest of this section, the discussion is based on a Markov model with a largeness-tolerance approach. For further information on measurement techniques see Bard and Schatzoff (1978), Iyer et al. (1986), and Siewiorek and Swarz (1982). For references on combinatorial methods see Sahner and Trivedi (1987) and Shooman (1968), for simulation see Conway and Goyal (1987), and for hierarchical combinatorial and Markov methods see Blake and Trivedi (1989), Ibe, Howe and Trivedi (1989), Sahner and Trivedi (1987), and Veeraraghavan and Trivedi (1987). This section addresses three areas: model solution, parameter determination, and model validation.

4.1 Model Solution Techniques
As mentioned above, a Markov model is described by a graph, called a state transition rate diagram, such as the one shown in Fig. 3. The graph is represented by a state space S and a matrix of transition rates Q = [qij], where qij is the rate of transition from State i to State j (j ≠ i), with i, j ∈ S, and where the value of the diagonal element qii is equal to −Σ(j≠i) qij (so that the rows of Q sum to zero) (Cinlar, 1975; Trivedi, 1982). For our example problem (Fig. 3), for instance, we have
Q =
    [ −2λ     2λc     2λ(1−c)      0         0   ]    state 2
    [   0     −δ         0         δ         0   ]    state 1c
    [   0      0        −β         β         0   ]    state 1u
    [   μ      0         0      −(λ+μ)       λ   ]    state 1
    [   0      0         0         μ        −μ   ]    state 0
206
DAVID I. HEIMANN eta/.
The rows are identified to show that the first row covers transitions from State 2, the second row from State 1c, etc. Similarly, the first column covers transitions to State 2, the second column to State 1c, etc. The solution of the Markov model to obtain steady-state availability, instantaneous availability, interval availability, system reliability, or task completion measures is discussed below.

Steady-State Availability. Let πi be the steady-state probability that the Markov chain is in State i, and let π be the row vector of these probabilities. Then the linear system of equations

πQ = 0,    Σi πi = 1                                  (1)
will provide the required probabilities. If we assume that every state in the Markov model can be reached from every other state (that is, the Markov chain is irreducible) and the number of states is finite, then the above system has a unique solution π = (πi) independent of the initial state (Trivedi, 1982). To obtain basic availability, we partition the state space S into the set of system UP states and the set of system DOWN states. Then the basic availability is given by A = Σ(i∈UP) πi. Thus, the steady-state analysis of a Markov model involves the solution of a linear system of equations with as many equations as the number of states in the Markov chain. The number of states can thus be rather large. However, the connection graph of the Markov chain, and therefore the transition rate matrix, is sparse, and this can be exploited in solving and storing large Markov models. In carrying out this solution, iterative methods such as Gauss-Seidel or Successive Overrelaxation (SOR) are preferable to direct methods such as Gaussian elimination (Goyal et al., 1987; Stewart and Goyal, 1985). The iteration for SOR is

π^(k+1) = ω[π^(k+1)U + π^(k)L]D^(−1) + (1 − ω)π^(k),          (2)
where π^(k) is the solution vector at the kth iteration, L is a lower triangular matrix, U is an upper triangular matrix, and D is a diagonal matrix such that Q = D − L − U. For ω = 1, the solution given by Equation (2) reduces to the Gauss-Seidel method. The choice of ω is discussed in Stewart and Goyal (1985).

To obtain the more general dependability measures, we make use of Markov reward models (Blake et al., 1988; Howard, 1971; Smith et al., 1988). In such a model, we assign a reward rate ri to State i of the Markov chain. For basic availability (Section 2.5.1), the reward rate 1 is assigned to all operational states (i.e., states in UP) and a reward rate 0 is assigned to all system failure states (i.e., states in DOWN). Note that by reversing the reward
assignments, so that states in UP get reward rate 0 and states in DOWN get reward rate 1, we obtain basic unavailability. For nonreconfiguration availability (Section 2.5.2), we set the reward assignment to 1 not only for all states in UP, but also for all states in DOWN that represent the computer system undergoing a reconfiguration. For capacity-oriented measures, the reward assignment of a state in UP is the system capacity level in that state (possibly normalized so that nominal capacity is 1), while the reward rate of a state in DOWN is 0. The measures of interest above are thus a weighted sum Σi ri πi of state probabilities, with the reward rates ri as weights. Algorithms for the steady-state solution of Markov and Markov reward (as well as semi-Markov reward) models have been built into the SHARPE (Sahner and Trivedi, 1987), SAVE (Goyal et al., 1986) and SPNP (Ciardo et al., 1989) packages.

Instantaneous Availability. The above discussion addresses a steady-state solution, i.e., the probabilities πi are independent of the time elapsed since the start of system operation. However, this is not always sufficient. For example, high-dependability systems with preventive maintenance will not often be in steady state. In this case, we need to carry out a transient analysis. Let P(t) be the row vector consisting of Pi(t), the probability that the Markov chain is in State i at time t given that the initial probability vector is P(0). Then P(t) can be obtained by solving the following coupled system of linear first-order ordinary differential equations (Trivedi, 1982):

dP/dt = P(t)Q.                                        (3)
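Equation (3) can be integrated numerically; the uniformization method discussed next is a standard choice. A sketch for the sample system (with the same assumed rate values as before) is:

```python
import numpy as np

lam, mu, c = 1/5000.0, 0.25, 0.9
delta, beta = 120.0, 6.0          # assumed reconfiguration/reboot rates
Q = np.array([                    # states 2, 1c, 1u, 1, 0
    [-2*lam, 2*lam*c, 2*lam*(1-c), 0.0,        0.0],
    [0.0,    -delta,  0.0,         delta,      0.0],
    [0.0,    0.0,     -beta,       beta,       0.0],
    [mu,     0.0,     0.0,         -(lam+mu),  lam],
    [0.0,    0.0,     0.0,         mu,         -mu],
])

def transient(P0, t, tol=1e-12):
    """P(t) by uniformization: Q* = Q/q + I with q = max |q_ii|;
    P(t) = sum_k Theta(k) exp(-qt) (qt)^k / k!, truncated when the
    accumulated Poisson weight reaches 1 - tol."""
    q = np.max(np.abs(np.diag(Q)))
    Qstar = Q / q + np.eye(len(Q))
    theta = P0.astype(float)
    weight = np.exp(-q * t)       # Poisson weight for k = 0
    P, accumulated, k = theta * weight, weight, 0
    while accumulated < 1.0 - tol:
        k += 1
        theta = theta @ Qstar
        weight *= q * t / k
        P += theta * weight
        accumulated += weight
    return P

P0 = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # start in State 2
Pt = transient(P0, 1.0)                     # one hour of operation
A_inst = Pt[0] + Pt[3]                      # instantaneous availability
```

Since Q* is a stochastic matrix, each Theta(k) remains a probability vector, which keeps the iteration numerically stable.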
The solution method commonly used for such a system of differential equations is uniformization (or randomization) (Reibman and Trivedi, 1988). Uniformization first applies the transformation Q* = Q/q + I, where q = maxi |qii|. The solution is then

P(t) = Σ(k=0 to ∞) Θ(k) e^(−qt) (qt)^k / k!,

where Θ(0) = P(0) and Θ(k) = Θ(k − 1)Q*. For computational purposes, the series needs to be truncated. The number of terms to be used in the series can be determined based on a given truncation error bound (Reibman and Trivedi, 1988). Other solution methods for transient analysis of Markov models are discussed in detail elsewhere (Reibman and Trivedi, 1988).

Many transient measures can be obtained as weighted sums of the transient-state probabilities Pi(t), with the weights being the reward rates. In other words, the desired measure of interest will be the expected reward rate at time t, Σi ri Pi(t). This expression can be clearly specialized to the instantaneous availability A(t) by assigning a reward rate 1 to all up states and a reward rate 0
to all down states. Algorithms for the transient solution of Markov and Markov reward (as well as semi-Markov reward) models have been built into the SHARPE (Sahner and Trivedi, 1987), SAVE (Goyal et al., 1986) and SPNP (Ciardo et al., 1989) packages.

Interval Availability. Many measures of interest are cumulative in nature (e.g., interval availability, downtime in a given interval of observation, or the downtime between two preventive maintenance events). For computing the expected values of cumulative measures, integrals of state probabilities over the interval 0 to t are required. Thus if we let Li(t) = ∫0^t Pi(x) dx be the average time spent by the Markov chain in State i during the interval (0, t), then the expected accumulated reward in the interval is obtained as Σi ri Li(t). Measures like the expected downtime or the expected total work done in a finite interval of operation can be computed using this approach. A special case of this measure that we have already discussed in Section 2.5 is the interval availability A(t) = (1/t) Σ(i∈UP) Li(t), where the accumulated uptime is divided by the elapsed time t. The vector L(t) = (Li(t)) satisfies the equation

dL/dt = L(t)Q + P(0),    L(0) = 0.                    (4)
For a discussion of the methods of solving this equation and hence computing expected cumulative measures, see Reibman and Trivedi (1989). Such transient analysis of cumulative measures can be done using SHARPE, SAVE or SPNP. The next level of measure complexity is related to the distribution of availability and other cumulative measures. Algorithms for such computations are known (Smith et al., 1988) but will not be discussed here.

System Reliability. If all system down states are made absorbing states, then the sum of the state probabilities of all the UP states will yield the system reliability, R(t), at time t. For instance, in our example problem, the matrix corresponding to the reliability in Case 3 of Section 2.6 (Fig. 5c) is given by
          state 2:   [ -2λ      2λc      2λ(1-c) ]
    Q_R = state 1c:  [  μ     -(λ+μ)       0     ]
          state 1u:  [  0        β        -β     ]
Note the difference between the matrix Q and the reliability matrix above: the latter omits the last row and column of the former. This reflects the fact that, for the system reliability measure, State 0 is considered to be a system failure and thus an absorbing state. Solving the differential equation
    dP/dt = P(t)Q,    (5)
AVAILABILITY AND RELIABILITY MODELING FOR COMPUTER SYSTEMS
and summing the state probabilities over the UP states, R(t) = Σ_{i∈UP} p_i(t), will yield the system reliability at time t. A basic form of system reliability measure is the mean time to system failure (MTTF). Such measures can be obtained by solving a linear system of equations much like the case of the steady-state probabilities:

    tQ = -P(0),    (6)

where t = (t_i) is a vector of times before absorption, t_i, i ∈ UP, is the average time spent in State i before absorption (note that t_i can be assumed to be zero for i ∈ DOWN), and P(0) is the partition of the initial probability vector corresponding to the UP states only. After solving for the row vector t, the system MTTF is obtained as (Goyal et al., 1987)

    MTTF = Σ_i t_i.    (7)
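The MTTF computation in Equations (6)-(7) amounts to solving one small linear system. A minimal sketch follows; the Gauss-Jordan solver and the duplex-system generator in the test are illustrative assumptions, not the method of the cited tools:

```python
def solve_linear(A, b):
    """Gauss-Jordan elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [M[r][k] - f * M[c][k] for k in range(n + 1)]
    return [M[i][n] / M[i][i] for i in range(n)]

def mttf(Q_up, p0_up):
    """Solve t Q_up = -p0_up for the row vector t of mean times spent in
    each UP state before absorption; MTTF = sum_i t_i (Equation (7)).
    The row-vector system is transposed to the usual column form."""
    n = len(Q_up)
    Qt = [[Q_up[i][j] for i in range(n)] for j in range(n)]  # transpose
    t = solve_linear(Qt, [-x for x in p0_up])
    return sum(t)
```

For a hypothetical duplex with failure rate λ per unit and repair rate μ, restricting the generator to the UP states {2, 1} and absorbing the double-failure state reproduces the familiar closed form MTTF = (3λ + μ)/(2λ²) when starting from state 2.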
The methods of solving the above linear system of equations are similar to those used for solving for the steady-state probabilities (recall Equation (1)) (Goyal et al., 1987; Stewart and Goyal, 1985). SAVE, SHARPE and SPNP facilitate the computation of the MTTF.
Task Completion Measures. So far, we have discussed system-oriented dependability models. Suppose we consider a task that requires x amount of uninterrupted CPU time. Further suppose that when the task arrives the system is in State i with probability π_i and the rate of interruption as seen by the task is y_i. Then the task interruption probability is given by Σ_i π_i (1 − e^{−y_i x}). More generally, assume that a task requires x amount of time to execute in the absence of failures and let T(x) be the task execution time with failure interruptions accounted for. The execution requirement, x, of the task can be either deterministic or random. It is of interest to compute the expected value, E[T(x)], or the distribution, P(T(x) < t), of the task completion time. Models for task completion time can be built as either Markov or semi-Markov models. Such Markov models can be generated by hand, using Kronecker algebra techniques (Bobbio and Trivedi, 1990) or by using generalized stochastic Petri nets (Ciardo et al., 1989). If more accurate modeling of task performance, including the effect of work loss and checkpointing, is desired, then transform-based techniques need to be used (Chimento, 1988). For references on this topic, the reader may consult Chimento (1988), Kulkarni et al. (1987), Nicola et al. (1987), and Kulkarni et al. (to appear).
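The task-interruption expression above is a one-line computation; in the sketch below the state probabilities and interruption rates are hypothetical numbers chosen only for illustration:

```python
import math

def interruption_probability(pi, y, x):
    """P(interruption) = sum_i pi_i * (1 - exp(-y_i * x)) for a task needing
    x uninterrupted time units; pi_i is the probability the task arrives in
    state i, and y_i is the interruption rate seen there."""
    return sum(p * (1.0 - math.exp(-yi * x)) for p, yi in zip(pi, y))
```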
4.2 Parameter Determination
In order to solve and use dependability models, one must consider the underlying input parameters. These parameters group into four categories: failure rates (λ), failure coverage probabilities (c), repair rates (μ), and system performance levels or reward rates (r).
4.2.1 Failure Rates (What is λ?)

A number of issues arise in describing and determining the occurrence of component failures in a computer system. Foremost among these are the fault/error/failure distinction, the source of failures, the type of failures, the age dependency of failures, the distribution of inter-failure times, and the process by which failure rates are estimated. We address each of these in turn.
Faults vs. errors vs. failures. To evaluate failure rates properly, one must distinguish among faults, errors, and failures. A fault is an improper condition in a hardware or software module which may lead to a failure of the module (Nelson and Carroll, 1987). An error is a manifestation of a fault leading to an incorrect response from a hardware or software module. A failure is a malfunction of a system or module such that it can no longer operate correctly. While failures result from errors, which in turn result from faults, it is not necessarily the case that a fault will lead to a failure. In fact, ensuring that faults do not lead to failures is the objective of fault-tolerant computing design. High system dependability may thereby be obtained not only by reducing the rate at which faults occur, but also by preventing faults that do occur from propagating into failures.
Source of failures. Failures can arise from a number of different sources in the computer system. They can be hardware, software, or operator induced, and can arise from the processors, the storage units or storage controllers, the power supply, or system or application software. There is much interaction among these sources, so that it is often difficult to pinpoint the actual source. For example, a software failure may look like an operator-induced one if it forces the operator to shut down the system in order to reload the code, or a hardware failure may look like a software one if it changes a parameter value to be outside the range the software is designed to handle.
Note that permanent hardware failures generally form only a small minority of the total failures.
Types of failures. Failures can be one of three types: permanent, intermittent, or transient. Permanent (also called hard or solid) failures are those that occur due to a fault in the system and require a repair action to restore system operation. Intermittent (also called soft) failures are those that occur due to a fault in the system but do not require a repair action to restore system operation, only a reboot or other system restart. Intermittent failures can frequently, though not always, be precursors to an eventual permanent failure, in that an underlying fault may initially have only a mild impact on the system but an increasing impact as it worsens, eventually resulting in a permanent failure. Note that intermittent failures generally occur far more often than permanent failures, sometimes by an order of magnitude. Transient failures are those that occur not due to a fault in the
system, but due to outside causes such as cosmic rays or alpha particles. Transient failures, like intermittent failures, generally do not require a repair action to restore system operation. These types were developed with hardware failures in mind. Software failures are nominally permanent, in that some code fault is causing the failure. However, as software becomes very complex, the faults can become extremely subtle, and the resulting failures look more and more like intermittent ones. Intermittent software failures have been termed “Heisenbugs” (Gray, 1986).
Age dependency of failures. The rate at which failures occur in a component generally depends on how far along it is in its life span. Hardware components show a decreasing failure rate in early life due to the realization of “infant mortality” failures. During midlife the failure rate is approximately constant, and in later life (particularly for mechanical components) the failure rate increases due to wearout characteristics. Software components generally show a decreasing failure rate as bugs are found and removed (similar to hardware infant mortality).
Distribution of inter-failure times. For simplicity, and often with justification (especially in the midlife section of the failure process), times to failure are often assumed to be exponentially distributed. This is a powerful assumption which allows many analytical techniques to be applied to the dependability evaluation. Often, however, the modeler is interested in more general distributions of times to failure. In some cases, extensions of the exponential assumption can be used. For example, the use of nonhomogeneous Markov models (Bavuso et al., 1987) (a special case of such a process is the nonhomogeneous Poisson process, or NHPP) allows failure times to have a Weibull distribution. Semi-Markov models (Cinlar, 1975; Ciardo et al., to appear) and phase-type expansions (Cox, 1955; Cinlar, 1975; Hsueh et al., 1988;
Sahner and Trivedi, 1987) can also be utilized to capture non-exponential distributions. For each fault type in the fault model of each component, the nature of these distributions must be specified.
Estimation of component failure rates. A crucial question in computing system dependability is how to obtain accurate estimates of component failure rates. A relatively straightforward way to obtain these may be to use vendor data and the parts-count method, facilitated by reliability databases and analysis tools. This method may have drawbacks, though, because an exhaustive listing of all parts within, for example, a processor or a storage device may be unwieldy and, furthermore, the database may be incomplete and/or untrustworthy. However, in the early stages of the design cycle (design stage), this may nonetheless be the only applicable approach.
A second method of estimating component failure rates is from field measurement data. Such operational field failure data are likely to be a much more reliable source than a database with vendor-supplied part failure rates. Trading off against this is that the expense and time of collecting enough data are quite high. An important way to use these data more efficiently is to think of the individual component failure rates λ_i (that is, the failure rate for component i, where a component is a basic unit of the computer system such as a processor, storage unit, storage controller, power supply, or communications link) as functions of at least three kinds of parameters, a, e, and u, i.e.,
    λ_i = f_i(a, e, u; θ),    (8)

where

    a is a vector of architectural (or system configuration) variables (e.g., the number and types of processor nodes, the number of disks and disk controllers, etc.),
    e is a vector of environment variables (e.g., temperature),
    u is a vector of usage variables (e.g., banking, education, transaction processing, military, etc.), and
    θ is a vector of coefficients for the above parameters.
After hypothesizing a functional form of f_i based on the parameter set a, e, and u, we use statistical techniques such as regression analysis, Bayesian techniques, or maximum likelihood estimation to determine the coefficients θ, and use the resulting equation to determine the component failure rates for the dependability model. In some sense this is analogous to the approach used in MIL-HDBK-217C (U.S. Department of Defense, 1980) but tailored to the problem at hand.
Failure rates can also vary with the load on various system resources. Since load is a function of time, the failure rates will be as well. For the sake of simplicity, we have assumed for the analyses described in this paper that failure rates are not dependent on load. For information on the load dependence of failure rates, see Iyer et al. (1986).
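As an illustration of Equation (8), the sketch below fits a hypothetical one-covariate log-linear form, log λ = θ0 + θ1·(temperature), by ordinary least squares; the functional form and the data in the test are assumptions for illustration only, not the MIL-HDBK-217C procedure:

```python
import math

def fit_loglinear(x, rates):
    """Least-squares fit of log(lambda) = theta0 + theta1 * x for a single
    covariate x (e.g., temperature): a toy instance of eq. (8)."""
    n = len(x)
    y = [math.log(r) for r in rates]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    theta1 = sxy / sxx
    return my - theta1 * mx, theta1

def predicted_rate(theta0, theta1, x):
    """Failure rate implied by the fitted coefficients at covariate x."""
    return math.exp(theta0 + theta1 * x)
```

With more covariates, the same idea becomes a multiple regression on the vectors a, e, and u, solved by the normal equations or by maximum likelihood.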
4.2.2 Coverage Probabilities (What is c?)
In a system-level analysis of dependability, it becomes very important to know how well the system as a whole can operate when one of its subsystems fails. If the system can continue operations, either without ill effect or with an acceptable degradation of operations, the failure is said to be covered. If, however, the failure causes the whole system to become inoperable, the failure is said to be uncovered. Clearly, dependability-enhancing efforts such as redundancy or checkpointing will only function if subsystem failures are covered. The coverage probability c is the conditional probability that a system successfully recovers, given that a fault has occurred. It has been known for some time that a small change in the value of a coverage probability can make a rather large change in model results (Dugan and Trivedi, 1989). It is, therefore, extremely important to estimate the various coverage parameters accurately. Three different ways of estimating coverage can be identified:

1. Structural Modeling. This approach involves decomposing the fault/error-handling behavior into its constituent phases (e.g., detection, retry, isolation, reconfiguration, etc.) and using various Markov, semi-Markov and stochastic Petri net models for computing the overall coverage (Dugan and Trivedi, 1989). This approach is useful during the design phase.
2. Fault/Error-Injection Experiments. If the system is ready for experimentation, faults or errors can be injected and the response recorded. From these measurement data, coverage can be estimated (Arlat et al., 1989). This approach is appropriate at design verification time.
3. Field-Measurement Data. Based on the data collected from a system in operation, coverage can be estimated. Clearly, this is the most expensive approach among the three. Nevertheless, collection and analysis of measurement data is to be highly encouraged to enhance our understanding of dependability.
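The sensitivity of model results to c can be seen on a toy model. The three-state duplex below (both processors up; one up after a covered failure; rebooting after an uncovered failure, double failures ignored) and its rates are hypothetical, chosen only to illustrate the effect:

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

def downtime_min_per_year(lam, mu, beta, c):
    """Steady-state downtime (min/yr) of a toy duplex model:
    2 -> 1 at rate 2*lam*c (covered failure), 2 -> R at 2*lam*(1-c)
    (uncovered failure, system rebooting), 1 -> 2 at mu (repair),
    R -> 2 at beta (system reboot). The balance equations give the
    state probabilities in closed form; only state R is down."""
    r1 = 2 * lam * c / mu            # pi_1 / pi_2
    rR = 2 * lam * (1 - c) / beta    # pi_R / pi_2
    unavailability = rR / (1.0 + r1 + rR)
    return MINUTES_PER_YEAR * unavailability

# Halving the lack of coverage (c: 0.90 -> 0.95) roughly halves the downtime:
d90 = downtime_min_per_year(1e-4, 0.25, 6.0, 0.90)
d95 = downtime_min_per_year(1e-4, 0.25, 6.0, 0.95)
```

Even though c moves only from 0.90 to 0.95, the uncovered-failure downtime is cut roughly in half, which is why coverage estimates must be accurate.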
4.2.3 Repair Rates (What is μ?)
Two broad repair categories need to be specified: corrective (unscheduled) maintenance and preventive (scheduled) maintenance. For each type of detected error, a different type of corrective repair action needs to be specified. Various parameters of interest here are the time to reboot a processor, the time to reboot a system, the reconfiguration time, and so on. These data could come from design documents or error logs. One also needs to determine whether the system is considered (fully or partially) up or down during each of these intervals. Other data, such as the field-service travel time and the actual repair time, may come from field service organizations.
4.2.4 Reward Rates (What is r?)
In a basic availability model, we classify states as either up or down. However, this binary classification of states often needs to be expanded for many applications, such as multiple and distributed processing systems with
many different performance levels. A simple extension of Markov models allows a weight, worth, or reward rate to be assigned to each state. The reward rate may be based on a resource capacity in the given state (such as the number of up processors) or, in more sophisticated analyses, on the performance of the system in that state. After making reward assignments, it is sometimes desirable to scale the reward rates so that the value assigned to the fully operational states is 1 and the values assigned to degraded configurations are less than 1. Other times scaling may not be appropriate, such as when two systems with a different number of processor nodes are compared. Note that in some cases, such as for unavailability or for the probability of end-user interruption, the “reward” is actually a penalty (i.e., a value of 1 represents a failure), though for consistency it is nonetheless called a reward rate.
In Table III, we summarize the reward assignments that yield some of the measures in Sections 2.5-4.1. The states of the system as depicted in the Markov chain are partitioned into UP and DOWN states (i.e., UP is the set of all operational states and DOWN is the set of all system failure states). DOWN states are further partitioned into RECON, UDOWN and PDOWN states, where RECON indicates the system is undergoing a reconfiguration, UDOWN indicates the system is down due to an uncovered failure, and PDOWN indicates the system is down due to loss of processors. The number of processors in the system is denoted by N. Let C_i denote the system capacity in State i, and let C_N denote the capacity when all processors are up (note: one frequently used example of system capacity is the number of up processors). Clearly, C_i for a DOWN state will be zero.
Table IIIa describes the reward structure for the system availability measures. Note that most of the measures are steady-state-based and thus use the steady-state probabilities π_i.
However, the instantaneous and interval availability measures instead use the instantaneous (time-dependent) and interval-based quantities p_i(t) and L_i(t)/t, respectively. Note also that the downtime measures are expressed in minutes/year (or processor-minutes/year for the capacity-oriented downtime), and are based on a total of 60 × 24 × 365 = 525,600 minutes/year.
Table IIIb describes the reward structure for the system reliability measures. The set ABS represents the absorbing states in the underlying Markov chain, i.e., the set of system failure states from which no recovery is permitted. The value p_i(t) for i ∉ ABS represents the probability that the system has not yet failed and is currently in State i, so that the summation Σ_{i∉ABS} p_i(t) represents the probability that the system has not yet failed, i.e., the reliability function R(t). Note also that for i ∉ ABS the value t_i represents the mean time before failure that the system spends in State i, so that the summation Σ_{i∉ABS} t_i represents the system mean time to failure, i.e., MTTF.
TABLE III
REWARD-BASED FORMULAS FOR DEPENDABILITY MEASURES

Table IIIa. System Availability Measures (formula: Σ_i r_i π_i, except as noted)

Basic availability: r_i = 1 if i ∈ UP, else 0
Basic unavailability: r_i = 0 if i ∈ UP, else 1
Basic downtime: r_i = 0 if i ∈ UP, else 525,600
Basic instantaneous availability: r_i = 1 if i ∈ UP, else 0 (formula: Σ_i r_i p_i(t))
Basic interval availability: r_i = 1 if i ∈ UP, else 0 (formula: Σ_i r_i L_i(t)/t)
Tolerance availability: r_i = 1 if i ∈ UP ∪ RECON, else 0
Tolerance unavailability: r_i = 0 if i ∈ UP ∪ RECON, else 1
Tolerance downtime: r_i = 0 if i ∈ UP ∪ RECON, else 525,600
Capacity-oriented availability: r_i = C_i/C_N if i ∈ UP, else 0
Capacity-oriented unavailability: r_i = 1 − [C_i/C_N] if i ∈ UP, else 1
Capacity-oriented downtime: r_i = 525,600 · C_N · (1 − [C_i/C_N]) if i ∈ UP, else 525,600 · C_N
Tolerance capacity-oriented availability: r_i = C_i/C_N if i ∈ UP ∪ RECON, else 0
Tolerance capacity-oriented unavailability: r_i = 1 − C_i/C_N if i ∈ UP ∪ RECON, else 1
Tolerance capacity-oriented downtime: r_i = 525,600 · C_N · (1 − [C_i/C_N]) if i ∈ UP ∪ RECON, else 525,600 · C_N
Degraded-capacity time: r_i = 525,600 if i ∈ UP and C_i ≠ C_N, else 0

Table IIIb. System Reliability Measures

Due to lack of processors (ABS = PDOWN):
Reliability (R(t)): r_i = 1 if i ∉ ABS, else 0; formula Σ_i r_i p_i(t) = Σ_{i∉ABS} p_i(t)
System MTTF: r_i = 1 if i ∉ ABS, else 0; formula Σ_i r_i t_i = Σ_{i∉ABS} t_i
Frequency: r_i = 525,600 · μ if i ∈ PDOWN, else 0; formula Σ_i r_i π_i

Due to over-tolerance outages (ABS = UDOWN ∪ PDOWN):
Reliability (R(t)): r_i = 1 if i ∉ ABS, else 0; formula Σ_{i∉ABS} p_i(t)
System MTTF: r_i = 1 if i ∉ ABS, else 0; formula Σ_{i∉ABS} t_i
Frequency: r_i = 525,600 · β if i ∈ UDOWN, 525,600 · μ if i ∈ PDOWN, else 0; formula Σ_i r_i π_i

Due to any outage (ABS = DOWN = RECON ∪ UDOWN ∪ PDOWN):
Reliability (R(t)): r_i = 1 if i ∉ ABS, else 0; formula Σ_{i∉ABS} p_i(t)
System MTTF: r_i = 1 if i ∉ ABS, else 0; formula Σ_{i∉ABS} t_i
Frequency: r_i = 525,600 · δ if i ∈ RECON, 525,600 · β if i ∈ UDOWN, 525,600 · μ if i ∈ PDOWN, else 0; formula Σ_i r_i π_i

Frequency of degraded-capacity incidents: r_i = Σ_{j ∈ ABS or C_j = C_N} q_ij if i ∈ UP and C_i ≠ C_N, else 0; formula Σ_i r_i π_i

Table IIIc. Task Completion Measures

Probability of end-user interruption: r_i = 1 − e^{−y_i x} if i ∈ UP, else 0; formula Σ_i r_i π_i
In computing the frequency of lack-of-processors events, for example, the rate of occurrence is the repair rate μ (in repairs/minute) provided the system has experienced this condition, and the mean time spent in this condition is 525,600 Σ_{i∈PDOWN} π_i minutes per year. The other frequencies in the table are similarly derived (note that the occurrence rate from an uncovered failure event is β and the occurrence rate from a reconfiguration event is δ). Note also that the occurrence rate from a degraded-capacity state i to either an absorbing state or a full-capacity state is Σ_{j ∈ ABS or C_j = C_N} q_ij.
Table IIIc describes the reward structure for the task-completion measures. Note that x denotes the uninterrupted processing time required by the task under consideration. The reward rate r_i is 1 − e^{−y_i x}, assuming that the system is operating in State i, where y_i is the cumulative rate at which all the interrupting events occur.
More generally, reward rates can be based on actual system performance. States of the Markov model represent the configuration of up resources of the system (for example, see Fig. 3). For that complement of resources and the given workload, we calculate system performance using an analytical model, a simulation model, or actual measurement results (Lavenberg, 1983). Transitions of the Markov model represent failure/repair of components and system reconfiguration/reboot. The Markov reward model is then solved for various combined measures of performance and availability using the techniques described in Meyer (1980, 1982) and Reibman et al. (1989), or using tools such as SHARPE (Sahner and Trivedi, 1986; Veeraraghavan and Trivedi, 1987) or SPNP (Ciardo et al., 1989).
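As a sketch of the reward assignments in Table III, the fragment below evaluates a few of the steady-state measures for a hypothetical four-state model; the probabilities and capacities are made-up numbers used only to show the mechanics:

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600

def expected_reward(r, pi):
    """Weighted sum sum_i r_i * pi_i over the steady-state probabilities."""
    return sum(ri * pi_i for ri, pi_i in zip(r, pi))

# Hypothetical states: [full capacity, degraded, RECON, UDOWN]
pi = [0.980, 0.015, 0.004, 0.001]   # assumed steady-state probabilities
C, C_N = [4, 3, 0, 0], 4            # capacity per state, full capacity
up = [True, True, False, False]

basic_availability = expected_reward([1 if u else 0 for u in up], pi)
basic_downtime = expected_reward([0 if u else MINUTES_PER_YEAR for u in up], pi)
capacity_availability = expected_reward(
    [Ci / C_N if u else 0 for Ci, u in zip(C, up)], pi)
```

Each measure differs only in the reward vector handed to the same weighted sum, which is exactly the structure Table III records.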
4.3 Model Validation and Verification
Model verification and validation are the processes by which one determines how well a model fits with the underlying situation it aims to represent. Model verification is concerned with the correctness of the implementation of the conceptual model, while model validation ascertains that a model is an acceptable representation of the real world system under study. A model can be verified, at least in principle, by using program-proving techniques. More commonly, however, structured programming techniques and extensive testing are used in order to minimize implementation errors. The testing is often aided by simple cases for which there might be closed-form answers or by the existence of an alternative model that applies in some cases and which has been previously verified and validated. Reasonableness checks on the results can also help in testing.
A three-step validation process has been formulated by Naylor and Finger (1967):

1. Face Validation. This involves a dialogue between the modeler and the people who are knowledgeable about the system, in order to produce a model that is mutually agreeable. This is an iterative process of stepwise refinement and requires constant contact with people who are well versed in the innards of the system being modeled.
2. Input-Output Validation. The data obtained from the real system are used as input to the model, and the output data from the model are compared with the observed results of the real system. Clearly, many different data sets should be used in order to gain confidence in the model. The process is quite expensive and time consuming, yet extremely important.
3. Validation of Model Assumptions. The third step in the validation process is validating the model assumptions. Here, all the assumptions going into the model are explicitly identified and then tested for accuracy. Validation of the assumptions can be carried out either by face validity (checking the assumptions with experts), logical inference (proving the assumption correct), or statistical testing. In addition to checking the validity of the assumptions, one should also check their robustness (or sensitivity), i.e., for each assumption, how likely are the model results to change significantly if the assumption is not quite correct? Such an analysis, while often difficult, has the potential of identifying the assumptions that need careful examination.
Model verification and validation check the following types of assumptions on which models are often based:
Logical. Are the states and state transitions of the model close to the behavior of the system being modeled? If there are missing or extra states, or missing or extra transitions, the error in the results of the model can be rather drastic. Although formal proof techniques have been proposed, the most effective way of ascertaining that the model behaves correctly from the logical point of view appears to be a very good understanding of the system on the part of the modeler, together with face validation.
Distributional. We need to verify whether all the distributional assumptions made in the model hold. Sometimes, we can show that the form of a certain distribution does not have an effect on the results of the model. Such insensitivity results, although desirable, do not generally hold. In the common case, we need to statistically test a hypothesis regarding each distributional assumption and be prepared to modify the model in case the hypothesis is rejected based on measurement data.
Independence. Most stochastic models (Markov models included) assume that some events are independent of other events. We need to statistically test the hypotheses underlying such assumptions. In case a hypothesis is rejected, we should be prepared to modify the model.
Approximation. Several types of approximations, e.g., state truncation (Boyd et al., 1988) and decomposition (Bobbio and Trivedi, 1986), are commonly used. We need to provide good estimates of the approximation error (Muntz et al., 1989) or to provide tight bounds on the error (Li and Silvester, 1984).
Numerical. Since a model is eventually solved numerically, truncation and round-off errors are encountered. An attempt should be made to minimize and/or estimate these errors.

5. A Full-System Example
5.1 System Description
To demonstrate the preceding techniques on an example based on actual systems, we increase the complexity of the example system. The new example (which is representative of actual systems in use at Digital and elsewhere) contains four processors, three of which are required for system operation. Processors are subject to two types of failures: “permanent” failures, which require a physical repair (taking a matter of hours) to the processor in order for a recovery to take place, and “intermittent” failures, which require only a reboot (taking a matter of minutes) of the failed processor. Failures can be either “covered” or “uncovered.” In covered failures, the system reconfigures itself to exclude the failed processor (in a matter of seconds), then continues to function as long as at least three processors remain (when a failed processor recovers, another reconfiguration takes place to include it once again in the system). In uncovered failures, the system cannot reconfigure successfully and thus fails as a whole. In this case a complete system reboot (taking a matter of minutes) is necessary for the system to recover. The system failure and recovery data are as follows:

    Processor MTTF for permanent failures      5,000 hours
    Processor MTTF for intermittent failures   1,000 hours
    Processor MTTR for permanent failures      4 hours
    Mean processor reboot time                 6 minutes
    Mean system reboot time                    10 minutes
    Mean reconfiguration time                  30 seconds
    Coverage (permanent failures)              90%
    Coverage (intermittent failures)           90%
    System size                                4 processors
    Minimum required size                      3 processors
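For later reference, the failure data above can be put in rate form. The sketch below is simple unit arithmetic (per-hour rates, aggregate failure frequency), not the availability model itself:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

lam_perm = 1 / 5000.0   # permanent failures per processor-hour (MTTF 5,000 h)
lam_int = 1 / 1000.0    # intermittent failures per processor-hour (MTTF 1,000 h)
n_procs = 4

# Expected processor failure events per year across the whole system:
failures_per_year = n_procs * (lam_perm + lam_int) * HOURS_PER_YEAR
```

At these rates the four-processor system sees on the order of forty processor failure events per year, most of them intermittent, which is why the reconfiguration and reboot parameters matter so much in the analysis that follows.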
5.2 Dependability Analysis

We shall carry out a dependability analysis of the system described above. The analysis consists of an evaluation on the original data, sensitivity analyses and specification determination based on each of five system parameters, and a tradeoff analysis of processor reliability vs. processor reboot time. The measure to be used for dependability is basic availability, displayed in terms of mean downtime per year. The analysis is carried out using the model described in Heimann (1989b) and using the SHARPE package.

5.2.1 Evaluation
The system as specified has a mean downtime of 87 minutes per year. The mean downtime consists of 5 minutes per year due to too few processors being up to carry out the customer’s function (i.e., lack of required processors), 40 minutes per year due to reconfiguration, and 42 minutes per year due to uncovered failures. We shall use the notation “87 min/yr (5 + 40 + 42)” to summarize this information. This implies that efforts to improve dependability should concentrate on reconfigurations and uncovered failures, rather than on meeting required processors (for example, this would suggest not adding redundancy by using extra processors, whose positive effect in meeting required processors would be more than offset by the negative effect of inducing more reconfigurations and uncovered failures).

5.2.2 Sensitivity Analysis
We investigate the sensitivity of dependability to five parameters: the intermittent failure rate (while keeping the permanent-failure processor MTTF constant), the permanent-and-intermittent failure rates (while keeping their ratio constant), the mean repair time, the mean reconfiguration time, and the mean processor-and-system reboot times (while keeping their ratio constant).
Processor Intermittent MTTF. Changing the rate of intermittent failure does cause a significant change in dependability. Increasing the processor intermittent MTTF from 1,000 to 2,500 hours reduces the downtime from 87 min/yr to 45 min/yr (4 + 20 + 21), while decreasing it from 1,000 to 500 hours increases the downtime to 155 min/yr (5 + 73 + 77). The reconfiguration and uncovered failure components are affected by the changes
about equally, while the lack-of-required-processors component is virtually unaffected.
Processor MTTF. Changing the processor MTTFs for permanent failures and intermittent failures by the same factor also causes a significant change in dependability. Increasing the permanent-failure MTTF to 10,000 hours (with a corresponding change of the intermittent-failure MTTF to 2,000 hours) improves the downtime from 87 min/yr to 42 min/yr (1 + 20 + 21), while decreasing the former MTTF to 2,500 hours and the latter MTTF to 500 hours degrades the downtime to 182 min/yr (18 + 80 + 84). Because both the permanent and the intermittent failure rates change, rather than just the intermittent rate alone, the dependability impact is greater. All three components of dependability are affected by the changes, with the lack-of-required-processors component showing a very strong sensitivity. Note that if only the permanent failure rates are changed, leaving the intermittent rates constant, the sensitivity is far less. Improving the permanent MTTF to 10,000 hours improves the downtime to 76 min/yr (1 + 37 + 38), while degrading the permanent MTTF to 2,500 hours degrades the downtime to 113 min/yr (17 + 47 + 49). In both cases, the lack-of-required-processors time changes significantly, but the other two components do not change very much. Permanent failures thus mainly influence the lack-of-required-processors downtime, while intermittent failures mainly influence the reconfiguration and uncovered-failure downtimes.
Mean Repair Time. Changing the mean repair time does not affect dependability very much. Decreasing the repair time from 4 hours to 2 hours improves the downtime from 87 min/yr to 83 min/yr (1 + 40 + 42), while increasing it degrades the downtime to 92 min/yr (10 + 40 + 42). The impact shows up in the lack-of-required-processors component, which actually is very sensitive to repair time.
The overall impact is low because the lack-of-required-processors component comprises only a small portion of the overall measure, and repair time does not affect the other two components at all. Note the value of disaggregating the output measure into its components; the overall lack of sensitivity of dependability to repair time masks a very high sensitivity of the specific component of downtime directly affected.
Mean Reconfiguration Time. Changing the reconfiguration time does affect dependability, but not to the same extent as changing the processor reliability values. Decreasing the reconfiguration time from 30 seconds to 15 seconds improves the dependability from 87 min/yr to 67 min/yr (5 + 20 + 42), while increasing it to 60 seconds degrades the dependability to 127 min/yr (5 + 80 + 42). The change affects only the reconfiguration component of downtime, which may explain the lower overall impact.
Mean Reboot Time. Changing the processor and system reboot times (keeping their ratio constant) also affects dependability, but to a lesser extent
than changing processor reliability values. Decreasing the processor-reboot time from 6 minutes to 3 minutes improves the dependability from 87 min/yr to 65 min/yr (4 + 40 + 21), while increasing the time to 12 minutes degrades the dependability to 129 min/yr (5 + 40 + 84). The change affects only the uncovered-failure component of dependability, which explains the lower overall impact.
Coverage. Changing the coverage also affects dependability, to a moderate extent. Increasing the coverage from 90% to 95% improves the dependability from 87 min/yr to 67 min/yr (5 + 41 + 21), while decreasing the coverage to 80% degrades the dependability to 127 min/yr (5 + 38 + 84). This impact is just about the same as that for a similar change in the reboot time.

5.2.3 Specification Determination
Suppose we need a dependability of 99.99%, or 53 min/yr downtime. Since the system as evaluated has a downtime of 87 min/yr, some parameter specifications need to be improved to meet this requirement. We determine, for each of the five system parameters in turn (and assuming the other four remain constant), the necessary specification on that parameter for the system to satisfy the overall dependability requirement.

Processor Intermittent MTTF. To meet requirements, the processor MTTF for intermittent failures must be 2,000 hours instead of the current 1,000.

Processor MTTF. To meet requirements, the processor MTTF for permanent failures needs to be improved to 8,000 hours instead of 5,000, while the processor MTTF for intermittent failures needs to be improved to 1,600 hours instead of 1,000. Note that because both permanent and intermittent failure rates change, the magnitude of change necessary for each is less than for the intermittent failure rate alone as shown above (1.6:1 instead of 2:1). However, if only the permanent failure rate changes, with the intermittent rate remaining fixed, then the requirements cannot be met by an improved (permanent-failure) MTTF. Even with a very high permanent-failure MTTF (such as 1,000,000 hours), the dependability is 68 min/yr (0 + 33 + 35).

Mean Repair Time. The requirements cannot be met by improving repair time. Even if the MTTR were reduced to zero, downtime would still be 82 min/yr, well above the requirement. This is so because, as seen above, repair time affects only the lack-of-required-processors dependability component, which represents only a small portion of overall downtime.

Mean Reconfiguration Time. To meet requirements, the mean reconfiguration time must be 5 seconds instead of the current 30 seconds. This means a significant change is necessary in order to meet the dependability
DAVID I. HEIMANN et al.
requirements by means of reconfiguration time (largely because changing reconfiguration times only affects one component of downtime: reconfiguration downtime).

Mean Reboot Time. To meet requirements, the mean processor reboot time must be 1.2 minutes instead of the current 6 minutes (and the system reboot time must be 2 minutes instead of the current 10 minutes). As with reconfigurations, a significant change is necessary in order to meet the dependability requirements by means of reboot times (again, largely because changing reboot times only affects one component of downtime: uncovered-failure downtime).

Coverage. To meet requirements, the coverage must be 98.4% instead of the current 90%. In a similar manner as reboot time, a significant change in the coverage (the lack-of-coverage must decrease by a factor of six) is necessary in order to meet the dependability requirements because only one component of downtime is affected by the change.

5.2.4 Tradeoff Analysis
Even if the system as currently specified does not meet the dependability requirements, the parameter values given may not be the best way to satisfy the requirements. For instance, we may be able to easily improve the processor reboot times from 6 minutes to 3 (and similarly for system reboots), whereas the specified processor MTTF values may be difficult to achieve. Conversely, 6-minute processor reboot times may be difficult to achieve (and similarly for system reboots), while 12-minute reboot times may be achieved quite easily and compensatory improved processor reliabilities may be easy to come by. In either of these cases, it would be beneficial to know the extent to which reboot times can be "traded off" against individual processor reliability, while keeping overall system dependability constant. Note that in the following analyses the processor MTTFs are changed in such a way as to keep constant the ratio between the permanent and intermittent failure rates, so that a change in the permanent failure rate also means a proportional change in the intermittent failure rate.

Decreased Reboot Times vs. Decreased Processor Reliability. Suppose the mean processor reboot time improves from 6 minutes to 3 minutes, with the mean system reboot time similarly improving from 10 minutes to 5 minutes. The system dependability can then be maintained with a permanent-failure processor MTTF of 3,800 hours (instead of 5,000 hours) and an intermittent-failure processor MTTF of 760 hours (instead of 1,000 hours). Since the downtime disaggregation changes from (5 + 40 + 42) to (7 + 53 + 27), this
has been achieved by improving uncovered-failure downtime at the expense of lack-of-required-processors and reconfiguration downtime. If low reboot times are easier to achieve than high processor reliabilities, or if the customer is more sensitive to uncovered-failure outages than reconfiguration (or lack-of-required-processors) outages, this tradeoff would be worthwhile.

Increased Reboot Times vs. Increased Processor Reliability. Suppose the mean processor reboot time degraded from 6 minutes to 12 minutes, with the mean system reboot time similarly degrading from 10 minutes to 20 minutes. The system dependability can nonetheless be maintained with a permanent-failure processor MTTF of 7,300 hours (instead of 5,000 hours) and an intermittent-failure processor MTTF of 1,460 hours (instead of 1,000 hours). Since the downtime disaggregation changes from (5 + 40 + 42) to (2 + 28 + 57), this has been achieved by improving lack-of-required-processors and reconfiguration downtime at the expense of uncovered-failure downtime. If high processor reliabilities are easier to achieve than low reboot times, or if the customer is more sensitive to reconfiguration (or lack-of-required-processors) outages than uncovered-failure outages, this tradeoff would be very much worthwhile.

The dependability analysis is summarized in Table IV.
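Numerically, each tradeoff above amounts to solving an implicit equation: find the processor MTTF at which downtime returns to its original value after the reboot time changes. The sketch below does this by bisection against a deliberately simplified downtime function (a single combined MTTF and only the reconfiguration and uncovered-failure components); it illustrates the procedure, not the chapter's full Markov model, so its numbers are only indicative.

```python
HRS_PER_YEAR = 8_760

def downtime_min_per_year(mttf_hr, reboot_min, n=4,
                          coverage=0.90, t_reconfig_min=0.5):
    """Illustrative stand-in for the Markov-model downtime: every
    failure costs a reconfiguration (covered) or a reboot (uncovered)."""
    failures = n * HRS_PER_YEAR / mttf_hr
    return failures * (coverage * t_reconfig_min
                       + (1 - coverage) * reboot_min)

def compensating_mttf(target_min_per_year, reboot_min,
                      lo=100.0, hi=1_000_000.0, tol=1.0):
    """Bisect for the MTTF that restores the target downtime after a
    reboot-time change (downtime falls monotonically as MTTF grows)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if downtime_min_per_year(mid, reboot_min) > target_min_per_year:
            lo = mid          # still too much downtime: need a longer MTTF
        else:
            hi = mid
    return (lo + hi) / 2

base = downtime_min_per_year(1_000.0, 6.0)   # current design point
# If reboots slow to 12 minutes, how reliable must processors become?
print(round(compensating_mttf(base, 12.0)), "hours")
```

The same bisection, pointed at the real model's evaluator, would reproduce the 3,800-hour and 7,300-hour figures quoted in the text.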
5.2.5 Remark

We have seen in the sensitivity analyses that overall dependability is highly sensitive to some parameters, moderately sensitive to others, and not very sensitive to still others. This sensitivity is influenced by the specific measure used for dependability. Table V compares qualitatively the sensitivity of Basic Availability/Mean Downtime (a system availability measure given in Section 2.5.1) with Frequency and Duration of System Outage (F4(z)) (Heimann, 1989a). Compared to the Mean Downtime measure, the Frequency-of-Total-System-Outage measure is more sensitive to a change in the coverage or a degradation in the reconfiguration time, but less sensitive to an improvement in the reconfiguration time. This is because a changed likelihood of an uncovered failure, or an increased likelihood of a lengthy reconfiguration, strongly influences the likelihood that an outage will exceed a tolerance value (on the order of a few minutes), whereas above-tolerance reconfigurations are already unlikely in the base case, so that an improved reconfiguration time will not help matters significantly. This comparison thus highlights the importance of choosing the proper dependability measure for the particular application under consideration.
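The one-at-a-time sweeps behind Table IVb can be automated: re-evaluate the model once per perturbed parameter, holding the others at their base values. A hedged sketch, using a toy additive downtime function in place of the chapter's Markov model (the parameter names and the binomial loss term are illustrative, not from the text):

```python
HRS_PER_YEAR = 8_760
MIN_PER_YEAR = 525_600

base = {"mttf_perm": 5_000.0, "mttf_int": 1_000.0, "mttr": 4.0,
        "t_reconfig": 0.5, "t_reboot": 6.0, "coverage": 0.90}

def downtime(p, n=4):
    """Toy stand-in echoing the (loss + reconfiguration + uncovered)
    disaggregation used throughout Section 5.2 (min/yr)."""
    failures = n * HRS_PER_YEAR * (1/p["mttf_perm"] + 1/p["mttf_int"])
    reconfig = failures * p["coverage"] * p["t_reconfig"]
    uncovered = failures * (1 - p["coverage"]) * p["t_reboot"]
    q = p["mttr"] / p["mttf_perm"]        # per-processor unavailability
    loss = 6 * q * q * MIN_PER_YEAR       # ~P(2 of 4 down at once)
    return loss + reconfig + uncovered

# One-at-a-time sensitivity: halve and double each time parameter.
for name in ("mttf_perm", "mttf_int", "mttr", "t_reconfig", "t_reboot"):
    for factor in (0.5, 2.0):
        p = dict(base, **{name: base[name] * factor})
        print(f"{name:>10s} x{factor:<4}: {downtime(p):7.1f} min/yr")
```

Even this crude model reproduces the qualitative ranking in Table V: MTTF changes dominate, repair time barely matters, and reconfiguration, reboot, and coverage changes fall in between.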
TABLE IV
SUMMARY OF DEPENDABILITY ANALYSIS RESULTS

Table IVa: Evaluation

  System as originally specified: 87 min/yr of downtime (5 + 40 + 42), i.e., 99.9835% uptime

Table IVb: Sensitivity Analysis (dependability in min/yr of downtime)

  Intermittent processor MTTF
    Intermittent MTTF = 2,500 hr                            45  (4 + 20 + 21)
    Intermittent MTTF = 500 hr                             155  (5 + 73 + 77)
  Processor MTTF
    Permanent MTTF = 10,000 hr (intermittent = 2,000 hr)    42  (1 + 20 + 21)
    Permanent MTTF = 10,000 hr (intermittent unchanged)     76  (1 + 37 + 38)
    Permanent MTTF = 2,500 hr (intermittent = 500 hr)      182  (18 + 80 + 84)
    Permanent MTTF = 2,500 hr (intermittent unchanged)     113  (17 + 47 + 49)
  Mean repair time
    2 hours                                                 83  (1 + 40 + 42)
    6 hours                                                 92  (10 + 40 + 42)
  Mean reconfiguration time
    2 seconds                                               49  (4 + 3 + 42)
    15 seconds                                              67  (5 + 20 + 42)
    60 seconds                                             127  (5 + 80 + 42)
  Mean reboot time
    Processor = 3 min, system = 5 min                       65  (4 + 40 + 21)
    Processor = 12 min, system = 20 min                    129  (5 + 40 + 84)
  Coverage
    95%                                                     67  (5 + 41 + 21)
    80%                                                    127  (5 + 38 + 84)

Table IVc: Specification Determination (requirement: 53 min/yr of downtime, i.e., 99.99% uptime)

  Intermittent processor MTTF: ratio of permanent to intermittent MTTF = 2.5
    (permanent MTTF = 5,000 hr, intermittent MTTF = 2,000 hr)      53  (4 + 24 + 25)
  Processor MTTF: permanent MTTF = 8,000 hr
    (intermittent MTTF = 1,600 hr)                                 53  (2 + 25 + 26)
    Cannot meet specification when intermittent MTTF is unchanged:
    68 (0 + 33 + 35) even when permanent MTTF is infinite
  Mean repair time: cannot meet specification:
    82 (0 + 40 + 42) even when MTTR = 0
  Mean reconfiguration time: mean time = 5 sec                     53  (4 + 7 + 42)
  Mean reboot time: 1.2 min (processor), 2 min (system)            53  (4 + 40 + 9)
  Coverage: 98.4%                                                  53  (5 + 41 + 7)

Table IVd: Tradeoff Analysis (dependability held at 87 min/yr of downtime, i.e., 99.9835% uptime)

  Reboot time decreases to 3 min: permanent MTTF may decrease
    to 3,800 hr (intermittent to 760 hr)                           87  (7 + 53 + 27)
  Reboot time increases to 12 min: permanent MTTF must increase
    to 7,300 hr (intermittent to 1,460 hr)                         87  (2 + 28 + 57)
TABLE V
QUALITATIVE COMPARISON OF SENSITIVITIES

                                                        Sensitivity of:
  Change of parameter(s)                                Mean downtime   Frequency of system outages
  Intermittent MTTF = 2,500 hr                          High            High
  Intermittent MTTF = 500 hr                            High            High
  Permanent MTTF = 10,000 hr, intermittent = 2,000 hr   High            High
  Permanent MTTF = 10,000 hr, intermittent unchanged    Low             Low
  Permanent MTTF = 2,500 hr, intermittent = 500 hr      High            High
  Permanent MTTF = 2,500 hr, intermittent unchanged     Low             Low
  Mean repair time = 2 hr                               Low             Low
  Mean repair time = 6 hr                               Low             Low
  Mean reconfiguration time = 15 sec                    Moderate        Low
  Mean reconfiguration time = 60 sec                    Moderate        High
  Mean reboot time = 3 min                              Moderate        Moderate
  Mean reboot time = 12 min                             Moderate        Moderate
  Coverage = 95%                                        Moderate        High
  Coverage = 80%                                        Moderate        High
5.3 Evaluations Using Other Measures
The full-system example can also be evaluated using the entire collection of measures described in Section 2. Results using the various measures are shown in Table VI.
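The availability and downtime rows of Table VIa are the same quantity in two units: downtime per year is simply unavailability times the minutes in a year. A quick consistency check on the first two entries (values taken from the table):

```python
MIN_PER_YEAR = 365 * 24 * 60                 # 525,600 minutes

for name, unavail in [("basic availability", 0.000165),
                      ("tolerance availability", 0.000088)]:
    downtime = unavail * MIN_PER_YEAR
    print(f"{name}: {downtime:.0f} min/yr")  # 87 and 46, as in Table VIa
```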
TABLE VI
DEPENDABILITY MEASURES (FULL-SYSTEM EXAMPLE)

Table VIa: System Availability Measures

  Basic availability                                     0.999835 (99.9835%)
    Unavailability                                       0.000165 (0.0165%)
    Downtime                                             87 minutes/year
  Tolerance (nonreconfiguration) availability            0.999912 (99.9912%)
    Unavailability                                       0.000088 (0.0088%)
    Downtime                                             46 minutes/year
  Capacity-oriented availability                         0.998952 (99.8952%)
    Unavailability                                       0.001048 (0.1048%)
    Downtime                                             2,204 processor-minutes/year
  Tolerance (nonreconfiguration) capacity-oriented
  availability                                           0.999078 (99.9078%)
    Unavailability                                       0.000972 (0.0972%)
    Downtime                                             2,044 processor-minutes/year
  Degraded-capacity time                                 1,858 minutes/year

Table VIb: System Reliability Measures

  Frequency of outages                                              42/year
  Frequency of over-tolerance (nonreconfiguration) system outages   4.3/year
  Frequency of lack of required processors                          0.11/year
  Frequency and duration of system outages                          See Fig. 10
  Frequency of degraded-capacity incidents                          38.0/year
  Frequency of processor repairs                                    7/year

Table VIc: Task Completion Measures (user requiring 60 minutes)

  Against an over-tolerance (nonreconfiguration) interruption of system:
    Task interruption probability                        0.00049 (0.049%)
    Odds against interruption                            2037:1
  Against an over-tolerance (nonreconfiguration) interruption of system
  or of user's processor:
    Task interruption probability                        0.00157 (0.157%)
    Odds against interruption                            638:1
  Against any interruption:
    Task interruption probability                        0.00584 (0.584%)
    Odds against interruption                            170:1
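For tasks short relative to the time between outages, the task-completion entries in Table VIc follow from the outage frequencies in Table VIb: the interruption probability is approximately the outage rate times the task duration. Checking the first entry (4.3 over-tolerance outages per year against a 60-minute task):

```python
outages_per_year = 4.3          # over-tolerance system outages, Table VIb
task_hours = 1.0                # "user requiring 60 minutes"

rate_per_hour = outages_per_year / 8_760
p_interrupt = rate_per_hour * task_hours   # first-order approximation
print(f"P(interrupt) = {p_interrupt:.5f}, "
      f"odds against ~ {1/p_interrupt:.0f}:1")
```

This reproduces the table's 0.00049 probability and 2037:1 odds; the exact Poisson expression 1 - exp(-rate * t) is indistinguishable at this scale.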
(Fig. 10, referenced in Table VIb: frequency and duration of system outages. Plot not reproduced.)
Note the following:
1. The difference between basic availability and capacity-oriented availability is quite large. The total lost processing capacity of 2,204 processor-minutes is more than six times the lost processing capacity due only to system outages (87 × 4, or 348 processor-minutes). The difference is accounted for (with minor discrepancies due to rounding) by the degraded-capacity time of 1,858 minutes per year. From a system-capacity point of view, the loss due to total system outage accounts for only a small part of the total failure-related loss of service; the loss due to partial system outage represents a far greater contribution to the total.

2. The effect of not counting the brief reconfiguration outages varies greatly with the measure used, and is thus very application-dependent. For capacity-oriented availability (a measure fitting many office applications), the relative difference is small (2,204 vs. 2,044 processor-minutes/year), since most of the lost capacity is accounted for by the degraded operation state, which is not influenced at all by whether or not reconfigurations are counted. For basic availability, the relative difference is large (87 vs. 46 min/yr), since, as shown in Table IV, reconfiguration time accounts for a significant percentage of the total downtime. For system reliability (a measure fitting applications such as flight control), the difference is overwhelming (42 vs. 4.3 incidents per year), since most outage events are indeed due to reconfigurations.

3. Since three processors are required out of the four available, there is a redundancy of one processor. One might expect that greater redundancies would yield greater dependability. However, this is not the case, as the following results show (using basic downtime as the dependability measure):

TABLE VII
REDUNDANCY ANALYSIS (3 REQUIRED PROCESSORS)

                                       Downtime (min/yr)
  System size    Required    Loss of      Reconfig-   Uncovered   System
                 processors  processors   uration     failure     total
  3 processors   3 of 3      1,397        30          32          1,459
  4 processors   3 of 4          5        40          42             87
  5 processors   3 of 5          0        50          53            103
  6 processors   3 of 6          0        60          63            123
Even though adding extra processors improves the loss-of-processors outage, it does exact a countervailing penalty. The more processors in the system, the more failures can occur, and the more failures can occur, the more downtime results from reconfigurations and uncovered failures. After the first redundant processor, this penalty outweighs the improvement in loss-of-processor outage. A detailed discussion of this effect is given in Trivedi et al. (1990).
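The non-monotone behavior in Table VII can be reproduced with a back-of-the-envelope model: reconfiguration and uncovered-failure downtime grow linearly with the processor count, while the loss-of-required-processors term (a binomial tail) collapses once one spare exists. The sketch below uses the chapter's base parameters but assumes independent per-processor repair, so its numbers are only indicative and differ somewhat from the exact Markov-model values in Table VII.

```python
from math import comb

MIN_PER_YEAR = 525_600
HRS_PER_YEAR = 8_760

def downtime_components(n, k=3, mttf_perm=5_000, mttf_int=1_000,
                        mttr=4.0, coverage=0.90,
                        t_reconfig_min=0.5, t_reboot_min=6.0):
    """Rough per-year downtime components (min/yr) for a k-of-n system."""
    lam_perm = 1.0 / mttf_perm           # permanent failures per hour
    lam_all = lam_perm + 1.0 / mttf_int  # all failures per hour
    failures_per_year = n * lam_all * HRS_PER_YEAR

    # Covered failures cost a reconfiguration; uncovered ones a reboot.
    reconfig = failures_per_year * coverage * t_reconfig_min
    uncovered = failures_per_year * (1 - coverage) * t_reboot_min

    # Loss-of-required-processors: probability that more than n - k
    # processors are simultaneously down for (permanent-failure) repair.
    q = lam_perm * mttr                  # per-processor unavailability
    p_short = sum(comb(n, d) * q**d * (1 - q)**(n - d)
                  for d in range(n - k + 1, n + 1))
    loss = p_short * MIN_PER_YEAR
    return loss, reconfig, uncovered

for n in range(3, 7):
    loss, rec, unc = downtime_components(n)
    print(f"{n} processors: {loss:7.1f} + {rec:5.1f} + {unc:5.1f} "
          f"= {loss + rec + unc:7.1f} min/yr")
```

Even in this crude form, total downtime is minimized at one spare: the first redundant processor eliminates almost all loss-of-processors downtime, and every further processor only adds failures to reconfigure around or reboot from.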
6. Conclusions
The concept of system dependability is being considered with increasing interest as a component of computer system effectiveness, and as a criterion that customers use for product selection decisions. Dependability measures the ability of a product to deliver its intended level of service to the user, especially in light of failures or other incidents that impinge on its performance. It combines various underlying ideas, such as reliability, maintainability, availability, and user demand patterns, into a basic overall measure of quality which customers use along with cost and performance to evaluate products. We have defined three classes of dependability measures: system availability measures show the proportion of potential service actually delivered; system reliability measures show the length of time before a service-interrupting failure occurs; task completion measures show the likelihood that a particular user will receive adequate service. Which of these measure classes is appropriate depends on the specific application under investigation, the availability of relevant data, and the usage or customer profile. For example, an office word-processing system is best evaluated by a system availability measure, while a flight-control system is best evaluated by a system reliability measure, and an on-line transaction processing system is best evaluated by a task completion measure. We have identified four types of dependability analyses: evaluation, sensitivity analysis, specification determination, and tradeoff analysis. Markov models, commonly used to analyze dependability, are described and their solution methods are briefly discussed. Problems in model parameterization and model validation are described. We have carried out a detailed dependability analysis on an example system, in order to illustrate the techniques. System dependability modeling has evolved significantly in recent years.
Progress has been made along the lines of clear definitions (IEV191, 1987), model construction techniques (Dugan et al., 1986; Geist and Trivedi, 1983; Goyal et al., 1986) and solution techniques (Reibman et al., 1989).
The topics where further progress is needed include:

- Further develop the various alternative measures of dependability, as well as how to choose the proper measure for a given application. The aim is a "front-end" technique to routinely analyze a configuration and available data, and from this to select the proper measure and analysis technique.

- Integrate system dependability and system performance to yield an overall measure to assess the service delivered by the system. Quite often these two issues are closely intertwined, as when subsystem failures cause degradation in throughput or when increased subsystem response time causes a "timeout" failure condition at the system level.

- Include software reliability and availability within system dependability (Laprie, 1986). Software is becoming increasingly important, both in terms of its percentage of total system cost and development time and in terms of its percentage of potential system incidents. Software reliability models are not discussed here because of space limitations; interested readers may see Littlewood (1985) and Musa et al. (1987).

- Develop techniques to address model largeness. As the systems being analyzed become larger and more complex, and as performance and software considerations are included, the underlying state space can quickly become very large. Further work in largeness avoidance and largeness tolerance is needed.

- Develop techniques to address model stiffness. The equations to be solved in dependability modeling are stiff, particularly when dependability is combined with performance. This is because of the considerable difference in magnitude between failure rates, recovery/repair rates, and arrival/service rates. Techniques for solving stiff equations need to be further developed and applied to dependability analysis.

- Incorporate model calibration and validation into the modeling process. Techniques to identify and collect the necessary data, to develop the appropriate experimental designs and statistical analyses, and to evaluate the results need to be developed. In addition, the interaction between measurement techniques (including model calibration and validation) and the model formulation and solution process needs to be encouraged.
ACKNOWLEDGMENTS
This paper is based on projects sponsored by Digital's VAXcluster Technical Office under the direction of Ed Balkovich, who has also provided significant direct input into those projects. In addition, this paper has benefited from the comments and suggestions of Michael Elbert, Rick Howe, Oliver Ibe, John Kitchin, Archana Sathaye, and Anne Wein.
REFERENCES

Arlat, J., Crouzet, Y., and Laprie, J. C. (1989). Fault Injection for Dependability Validation of Fault-Tolerant Computing Systems. Nineteenth Int. Symp. Fault-Tolerant Computing, Chicago, pp. 348-355.
Bard, Y., and Schatzoff, M. (1978). Statistical Methods in Computer Performance Analysis. In "Current Trends in Programming Methodology, Vol. III: Software Modeling" (K. M. Chandy and R. T. Yeh, eds.), pp. 1-51. Prentice-Hall, Englewood Cliffs, New Jersey.
Bavuso, S. J., Dugan, J. B., Trivedi, K. S., Rothmann, E. M., and Smith, W. E. (1987). Analysis of Typical Fault-Tolerant Architectures Using HARP. IEEE Trans. Reliability R-36 (2), 176-185.
Blake, J., and Trivedi, K. (1989). Reliability of Interconnection Networks Using Hierarchical Composition. IEEE Trans. Reliability 38 (1), 111-120.
Blake, J., Reibman, A., and Trivedi, K. (1988). Sensitivity Analysis of Reliability and Performability for Multiprocessor Systems. Proc. 1988 ACM SIGMETRICS Conf., Santa Fe, New Mexico, pp. 177-186.
Bobbio, A., and Trivedi, K. (1986). An Aggregation Technique for the Transient Analysis of Stiff Markov Chains. IEEE Trans. Computers C-35 (9), 803-814.
Bobbio, A., and Trivedi, K. (1990). Computation of the Distribution of the Completion Time When the Work Requirement is a PH Random Variable. Stochastic Models 6 (1).
Boyd, M. A., Veeraraghavan, M., Dugan, J. B., and Trivedi, K. S. (1988). An Approach to Solving Large Reliability Models. 1988 IEEE/AIAA DASC Symp., San Diego.
Chimento, P. F. (1988). System Performance in a Failure Prone Environment. Ph.D. thesis, Department of Computer Science, Duke University, Durham, North Carolina.
Ciardo, G., and Trivedi, K. S. (1990). Solution of Large GSPN Models. Proc. First Int. Workshop on Numerical Solution of Markov Chains, Raleigh, North Carolina.
Ciardo, G., Muppala, J., and Trivedi, K. (1989). SPNP: Stochastic Petri Net Package. Proc. Third Int. Workshop Petri Nets and Performance Models PNPM89, pp. 142-151, Kyoto, Japan.
Ciardo, G., Marie, R., Sericola, B., and Trivedi, K. S. (1990). Performability Analysis Using Semi-Markov Reward Processes. IEEE Trans. Computers.
Cinlar, E. (1975). "Introduction to Stochastic Processes." Prentice-Hall, Englewood Cliffs, New Jersey.
Conway, A. W., and Goyal, A. (1987). Monte Carlo Simulation of Computer System Availability/Reliability Models. Proc. Seventeenth Int. Symp. Fault-Tolerant Computing, pp. 230-235.
Cox, D. R. (1955). A Use of Complex Probabilities in the Theory of Stochastic Processes. Proc. Camb. Phil. Soc. 51, 313-319.
Dugan, J. B., and Trivedi, K. (1989). Coverage Modeling for Dependability Analysis of Fault-Tolerant Systems. IEEE Trans. Computers C-38 (6), 775-787.
Dugan, J. B., Trivedi, K., Smotherman, M., and Geist, R. (1986). The Hybrid Automated Reliability Predictor. AIAA J. Guid., Control, and Dynamics 9 (3), 319-331.
Geist, R., and Trivedi, K. S. (1983). Ultra-High Reliability Prediction for Fault-Tolerant Computer Systems. IEEE Trans. Computers 32 (12), 1118-1127.
Goyal, A., Lavenberg, S. S., and Trivedi, K. S. (1987). Probabilistic Modeling of Computer System Availability. Annals of Operations Research 8, 285-306.
Goyal, A., Carter, W. C., de Souza e Silva, E., Lavenberg, S. S., and Trivedi, K. S. (1986). The System Availability Estimator. Proc. Sixteenth Int. Symp. Fault-Tolerant Computing, pp. 84-89.
Gray, J. (1986). Why Do Computers Stop and What Can Be Done About It? Proc. Fifth Symp. Reliability in Distributed Software and Database Systems, pp. 3-12.
Heimann, D. (1989a). VAXcluster System Availability: Measurements and Analysis. Technical Report, DEC, March, 1989.
Heimann, D. (1989b). A Markov Model for VAXcluster System Availability. IEEE Trans. Reliability, submitted.
Howard, R. A. (1971). "Dynamic Probabilistic Systems, Vol. II: Semi-Markov and Decision Processes." John Wiley and Sons, New York.
Hsueh, M. C., Iyer, R., and Trivedi, K. (1988). Performability Modeling Based on Real Data: A Case Study. IEEE Trans. Computers C-37 (4), 478-484.
Ibe, O., Howe, R., and Trivedi, K. (1989). Approximate Availability Analysis of VAXcluster Systems. IEEE Trans. Reliability 38 (1), 146-152.
Ibe, O., Trivedi, K., Sathaye, A., and Howe, R. (1989). Stochastic Petri Net Modeling of VAXcluster System Availability. Proc. Third Int. Workshop Petri Nets and Performance Models PNPM89, pp. 112-121, Kyoto, Japan.
IEV191 (1987). International Electrotechnical Vocabulary, Chapter 191: Reliability, Maintainability, and Quality of Service. CCIR/CCITT Joint Study Group on Vocabulary. International Electrotechnical Commission, Geneva, Switzerland.
Iyer, R. K., Rosetti, D. J., and Hsueh, M. C. (1986). Measurement and Modeling of Computer Reliability as Affected by Systems Activity. ACM Trans. Computer Systems 4, 214-237.
Kulkarni, V., Nicola, V., and Trivedi, K. (1991). Effects of Checkpointing and Queuing on Program Performance. Stochastic Models.
Kulkarni, V. G., Nicola, V. F., and Trivedi, K. S. (1987). The Completion Time of a Job on Multimode Systems. Adv. in Applied Prob. 19 (4), 932-954.
Laprie, J. C. (1985). Dependable Computing and Fault Tolerance: Concepts and Terminology. Fifteenth Int. Symp. Fault-Tolerant Computing, pp. 1-11.
Laprie, J. C. (1986). Towards an X-ware Reliability Theory. LAAS Technical Report, Toulouse, France, December, 1986.
Lavenberg, S. S. (1983). "Computer Performance Modeling Handbook." Academic Press, New York.
Li, V. O., and Silvester, J. A. (1984). Performance Analysis of Networks with Unreliable Components. IEEE Trans. Communications COM-32 (10), 1105-1110.
Littlewood, B. (1985). Software Reliability Prediction. In "Resilient Computing Systems" (T. Anderson, ed.). Collins, London.
Meyer, J. F. (1980). On Evaluating the Performability of Degradable Computing Systems. IEEE Trans. Computers C-29 (8), 720-731.
Meyer, J. F. (1982). Closed-form Solutions of Performability. IEEE Trans. Computers C-31 (7), 648-657.
Muntz, R. R., de Souza e Silva, E., and Goyal, A. (1989). Bounding Availability of Repairable Computer Systems. Proc. 1989 ACM SIGMETRICS and PERFORMANCE '89 Int. Conf. Measurement and Modeling of Computer Systems, pp. 29-38, Berkeley, California.
Musa, J., Iannino, A., and Okumoto, K. (1987). "Software Reliability: Measurement, Prediction, Application." McGraw-Hill, New York.
Naylor, T. H., and Finger, J. M. (1967). Verification of Computer Simulation Models. Management Science 14, 92-101.
Nelson, V. P., and Carroll, B. D. (1987). "Tutorial: Fault Tolerant Computing." IEEE Computer Society Press, Silver Spring, Maryland.
Nicola, V. F., Kulkarni, V. G., and Trivedi, K. S. (1987). Queuing Analysis of Fault-Tolerant Computer Systems. IEEE Trans. Software Eng. SE-13 (3), 363-375.
Reibman, A., and Trivedi, K. S. (1988). Numerical Transient Analysis of Markov Models. Computers and Operations Research 15 (1), 19-36.
Reibman, A. L., and Trivedi, K. S. (1989). Transient Analysis of Cumulative Measures of Markov Model Behavior. Stochastic Models 5 (4), 683-710.
Reibman, A., Smith, R., and Trivedi, K. (1989). Markov and Markov Reward Models: A Survey of Numerical Approaches. European J. Operations Research 40 (2), 257-267.
Sahner, R., and Trivedi, K. (1986). SHARPE: An Introduction and Guide to Users. Duke University, Computer Science, Technical Report.
Sahner, R., and Trivedi, K. S. (1987). Reliability Modeling Using SHARPE. IEEE Trans. Reliability R-36 (2), 186-193.
Shooman, M. L. (1968). "Probabilistic Reliability: An Engineering Approach." McGraw-Hill, New York.
Siewiorek, D. P., and Swarz, R. S. (1982). "The Theory and Practice of Reliable System Design." Digital Press, Bedford, Massachusetts.
Smith, R. M., Trivedi, K. S., and Ramesh, A. V. (1988). Performability Analysis: Measures, an Algorithm and a Case Study. IEEE Trans. Computers C-37 (4), 406-417.
Stewart, W. J., and Goyal, A. (1985). Matrix Methods in Large Dependability Models. Research Report RC-11485, IBM, November, 1985.
Trivedi, K. S. (1982). "Probability and Statistics with Reliability, Queuing and Computer Science Applications." Prentice-Hall, Englewood Cliffs, New Jersey.
Trivedi, K., Sathaye, A., Ibe, O., and Howe, R. (1990). Should I Add a Processor? Proc. Hawaii Conf. System Sciences, pp. 214-221.
U.S. Department of Defense (1980). "Military Standardization Handbook: Reliability Prediction of Electronic Equipment." MIL-HDBK-217C, Washington, D.C.
Veeraraghavan, M., and Trivedi, K. (1987). Hierarchical Modeling for Reliability and Performance Measures. Proc. 1987 Princeton Workshop on Algorithms, Architecture and Technology Issues in Models of Parallel Computation. Published as "Concurrent Computation" (S. Tewksbury, B. Dickinson, and S. Schwarz, eds.), Plenum Press, New York, 1988, pp. 449-474.
Molecular Computing

MICHAEL CONRAD
Department of Computer Science
Wayne State University
Detroit, Michigan

1. Introduction
2. Background
   2.1 Proteins versus Transistors
   2.2 Rationale
   2.3 Note on Terminology
3. Theory of Molecular Computing
   3.1 The Tradeoff Principle
   3.2 Programmability versus Efficiency
   3.3 Evolvability versus Programmability
   3.4 Extradimensional Bypass
   3.5 Relevance to Protein Engineering
   3.6 Quantum Molecular Computing
4. The Macro-Micro (M-m) Scheme of Molecular Computing
   4.1 The M-m Architecture
   4.2 Biological Cells as M-m Architectures
   4.3 The Brain as a Neuromolecular Computer
   4.4 Models and Simulations
5. Modes of Molecular Computing
   5.1 Classification Scheme
   5.2 The Hierarchy of Mechanisms
   5.3 Biosensor Design
6. The Molecular Computer Factory
7. Molecular Computer Architectures
   7.1 Conventional (von Neumann) Architectures
   7.2 Parallel (including Neural) Designs
   7.3 Optical Architectures (including Memory-Based Designs)
   7.4 Conformation- and Dynamics-Driven Designs
   7.5 Hybrid Systems
   7.6 Evolutionary Architectures
   7.7 Towards Cognitive Computation
8. Conclusions and Prospects
Acknowledgments
References

ADVANCES IN COMPUTERS, VOL. 31

Copyright © 1990 by Academic Press, Inc. All rights of reproduction in any form reserved.
ISBN 0-12-012131-X
1. Introduction
Molecular computers are information processing systems in which individual molecules play a critical functional role. Natural biological systems fit this definition. Artificial information processing systems fabricated from molecular materials might emulate biology or follow de novo architectural principles. In either case they would qualify as molecular computers. The term may also apply to simulations of biomolecular systems or to virtual molecular computers implemented in conventional silicon machines. The objectives of such simulation systems are to understand the information processing capabilities of biological systems, to provide design guidance for molecular computers fabricated from bona fide molecular materials, or to serve as biologically motivated artificial intelligence systems. Finally, molecular computers may serve as theoretical models of alternative modes of computing, just as Turing machines serve as theoretical models of conventional computing. Here the contribution of molecular computing is to delineate the ultimate limits and capabilities of information processing systems, and in particular to clarify the comparative capabilities of organisms and machines.

Our treatment of molecular computing will commence with a motivational discussion, move to the theoretical analysis, and on to a general architectural scheme capable of supporting the various domains of computing implied by this analysis. We will consider how this architecture is concretized by known processes in biological cells and the brain, and then expand the discussion by classifying the possible modes of molecular computing and outlining the various mechanisms that have been proposed to support these. We will consider the possibilities for implementation, including virtual implementations, implementations using special-purpose silicon hardware, and, most significantly, implementations using macromolecular materials.
This will position us for the analysis of special-case architectures of potential technological interest. The general technological requirements for effective development and production are important here as well. Finally, we will consider the implications for the computational model of cognition.

The preponderant research effort in the molecular computer field involves materials (both organic and bioorganic), techniques for manipulating materials, and devices. For the purposes of this paper, however, it is better to set up the computational paradigm first and to use this to understand the significance of the work on materials and devices.

Two points should be clear at the beginning. First, molecular computing takes a rather broad view of computing. Computing is not predefined in terms of any particular model of computing (such as the digital computer model), any particular physical mechanism (such as switching operations), any particular formal process (such as symbol manipulation), or any particular
type of control (such as programmability). Any mechanism or process that can contribute to problem solving is a legitimate form of computing. Problem solving here includes any activity that allows a system to stay in the “game of existence.” The system may be a human or other biological organism, an artifact, or a naturally occurring nonbiological entity. From our human point of view we are interested in activities that contribute to our own well-being and that of our environment. Pattern recognition, ultimately pattern recognition at the molecular level, will turn out to be our most useful primitive concept. Dynamic mechanisms of signal integration, in particular self-organizing dynamics, play a central role. Sensing, measurement, and effector action are forms of computing in this broad sense. If one adheres to a conventional model of computing, it might seem more natural to view these as robotic processes that can be hybridized with the formal string processing operations of a conventional serial or parallel machine. But we shall see that in many cases it is more interesting to regard them as core forms of computing per se.

The second point concerns the state of the art. In the early 1970s, when the author first started teaching and publishing in the field, molecular automata and molecular computers were most reasonably viewed as theoretical metaphors (Conrad, 1972, 1974a). The advent of recombinant DNA technology opened the possibility for ultimate fabrication. But the possibilities for implementing actual systems seemed rather distant. The possibilities still appeared remote to most investigators as recently as the mid-1980s (Yates, 1984). Serious (as opposed to popular and sensational) discussions of molecular computing emphasized that no actual prototype had been fabricated. Such a flat statement cannot be made at the present time.
Some prototype or near-prototype elements of computing systems have been fabricated, and molecular functional devices, mostly biosensors, are increasingly common commercial commodities (to verify this it is only necessary to examine the label on a typical supermarket pregnancy test). The judicious statement at this time is that the field has moved to a very primitive stage of technological development. No computer products are commercially available at present, and whether any of the prototype or near-prototype elements could be commercialized before the turn of the century is dubious. Optomolecular memory storage and retrieval systems are a potential exception. Some hand-tailored, highly exotic applications (in the military sphere) are conceivable. Nevertheless, the rapidity with which it is becoming clear that it is feasible to fabricate computationally pertinent devices is stunning. It is due to the large number of researchers worldwide who have entered the field with enthusiasm and determination. Understanding what has motivated this development provides an excellent preview of our theoretical and technological analysis.
2. Background
2.1 Proteins versus Transistors
The basic reason for the interest in molecular computing is that computer science and molecular biology are the two biggest scientific developments in the second half of the twentieth century. The idea of synthesizing these fields in an integral manner must have occurred to many investigators, starting from different points of view (molecular biophysics, computer science, neurophysiology) and probably tracing back to about the same time. Many features of molecular biology might have struck the fancy of early investigators interested in biocomputing. One is the analogy between DNA and computer programs that is suggested by the similarity between the replication and transcription of DNA and various tape writing and tape reading operations in a digital computer. Unfortunately this is an extremely misleading analogy. DNA, as we shall see, is entirely dissimilar to a computer program. Infelicitous analogies between biological cells and conventional computers were a significant negative factor in the field of molecular computing so far as its conceptual development was concerned. The most salient feature is the connection between the structure and function of proteins. The picture that emerges from the x-ray crystallographer’s static analysis is of 20 types of differently shaped beads strung together by wires and then twisted and folded into an intricate and highly asymmetrical shape that brings beads far removed from each other in linear order into close proximity. The beads are amino acids, comprising from about 20 to 50 atoms; a typical protein consists of from 200 to 400 amino acids. The reality is far more dynamic, involving complex thermal and fluid motions. Nevertheless, a quite adequate first-order metaphor for the structure-function relation is that the protein is like a key and the molecule on which it acts is like a lock (Fig. 1). The protein may either serve an enzymatic (that is, catalytic) function or a structural function. 
If the protein is an enzyme it recognizes a target molecule in its environment (called the substrate) and switches its state by making or breaking a covalent bond. If the protein performs a structural function it sticks to the target molecule that it recognizes, often another protein. This is called the self-assembly process since it is a bona fide form of self-organization. The metaphor of two pieces of a jigsaw puzzle randomly bumping into each other and then engaging is appropriate. In both enzyme catalysis and self-assembly the key process is recognition based largely on shape fitting. The scanning required for this shape fitting is driven by Brownian motion.

FIG. 1. Jigsaw puzzle model of enzymatic pattern recognition. The most important factor for recognition at the molecular level is shape complementarity. The three-dimensional shape of a protein molecule self-organizes from the linear sequence of amino acids through an energy- and entropy-dependent folding process.

Why is the analogy between DNA and a computer program misleading? The sequence of bases in the DNA codes for the covalently bonded sequence of amino acids in the protein. The sequence of amino acids in turn folds into the three-dimensional shape of the protein on the basis of numerous weak interactions among the amino acids (van der Waals interactions, hydrogen bonds, coordination bonds, disulphide bonds, hydrophobic interactions, etc.). It is as if the sequence of symbols in a Pascal program were coded into a string of differently shaped magnetized beads, which then self-organizes into a three-dimensional shape on the basis of the forces among all the beads. The emergent three-dimensional shape might be useful for the performance of some particular function, but it would not be possible to say in advance what function could be performed without performing a physics calculation or doing an experiment. Obviously this is not the situation in conventional computer programming.

It is evident that the protein enzyme is far more intricate than a conventional computer switch. The conventional switch is a three-terminal device, that is, a device with a source of electrons, a sink for electrons, and a control over the flow of electrons. A wall switch that controls the light in a room is an example, but of course in computing we are thinking of a solid-state device. The switching activity of a protein enzyme may involve the “flow” of an electron into or out of a covalency state, or it may involve changes in its own shape state in response to control molecules. The “control terminal” of an enzyme is vastly more sophisticated than the control terminal on a conventional switch. It comprises the shape features of the enzyme responsible for recognition of the substrate or control molecule, as well as the dynamic characteristics responsible for the enzyme’s ability to respond sensitively to other features of its milieu.
The intricacy of structure-function relations in proteins entails a number of pertinent features that are markedly different from those possessed by conventional switches. The positions of the atomic nuclei are basically fixed in a solid-state device, except for some vibratory motion about the equilibrium positions and for slight distortions in superconducting materials. From the standpoint of physics all that needs to be considered are the departures of the electrons from their equilibrium positions and their subsequent return. Proteins operate in a radically different physical regime. The detailed positions and motions of the atomic nuclei are as important as the motions of the electrons. This is the basis for their ability to recognize specific substrates and to act on them in a precise way.

The vast combinatorial variety of proteins that can be created through genetic engineering methods is a second difference. Roughly 20^400 different types of proteins are possible, assuming 20 types of amino acids and amino acid sequences not longer than 400. This means that the designer can work with an indefinitely large variety of different types of switches, with tailored personalities so to speak, rather than implementing functions by cutting complex connections among a large number of basically identical switches. A third difference is that the vast combinatorial variety of protein structures affords the possibility of exerting subtle control over electron motions that is not possible in the much more restricted structural domain of inorganics. In short, what is different is the vastness and variety of delicate specificities that biopolymers allow for, and what is new are the DNA and protein engineering technologies that make it possible to effectively explore and exploit this variety.
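The scale of this combinatorial estimate is easy to check with exact integer arithmetic; the short calculation below (which assumes chains of exactly 400 residues, the upper bound used in the text) shows that 20^400 runs to about 520 decimal digits, far beyond, for comparison, the roughly 80-digit estimates of the number of atoms in the observable universe.

```python
# Combinatorial variety of proteins, using the chapter's rough figures:
# 20 amino acid types, sequences of length 400.
n_types = 20
max_length = 400

variety = n_types ** max_length   # distinct sequences of exactly 400 residues
print(len(str(variety)))          # -> 521 (a number of about 10^520)
```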
2.2 Rationale

Proteins are perhaps the most significant example of biomolecular computing elements. But they are not the full story. To fill out the preview let us first look at the predominant technological goals of molecular computing research today. There are two:

The electronics-driven goal. The objective here is to duplicate functions performed by traditional electronic or optoelectronic materials, but at a smaller size scale, higher speed, or lower power dissipation. The main function is that of a simple three-terminal switch.

The biology-driven goal. The objective here is to implement biological and brain-like functions that depend on molecular properties that have no analog in conventional materials. The main property is connected with the shape-dependent specificity of proteins and other macromolecules.

These goals are conceptually antithetical. The electronics-driven goal is motivated by artificial systems that have no analog in biological nature.
Whether this goal is achievable is thus an open question; achievement depends on the de novo inventiveness of biotechnologists and organic chemists. The feasibility of the biology-driven goal has (as it is sometimes phrased) an existence proof in nature. Proto-biologists have of course not yet succeeded in synthesizing systems that would generally be accepted as living. And living systems require a forbiddingly elaborate infrastructure of metabolism, repair, and control. This is beside the point, however. The biologically motivated molecular computer engineer is not trying to solve the origin-of-life problem, or to create “living computers.” The more-than-sufficient objective is to exploit the characteristic properties of biological macromolecules to produce devices that perform useful information processing functions.

Let us now step back and consider in more detail the factors that provide impetus to the above approaches. There are six distinctive lines of development that have converged to produce the present active situation.

The first comes from electronics. The majority of micro- and nanoelectronics researchers anticipate that the size, speed, and power dissipation of switches based on silicon or other conventional materials will move incrementally to limits set by the basic laws of physics (Gulyaev et al., 1984; cf. Rambidi et al., 1987). The term “smokestack industry” aptly describes the discomfort with which this situation is perceived. Molecular electronics, an old term that predated the transistor, is seen as a possible escape route.

The second line of development is that of computer science per se. The field has enjoyed a sequence of ringing achievements, including languages, compilers, operating systems, analysis of algorithms, data bases, information systems, artificial intelligence, and scientific and modeling applications. A number of critical problems, however, have persisted in being recalcitrant.
Pattern recognition, learning, and parallelism are three that immediately come to mind. Most frustrating is the paradox of power created by integrated circuit technology. In serial machines all but a small portion of the silicon is dormant at any given time. If twice as much silicon is packed into a given volume, the fraction of active material decreases. Thus the complexity of problems (such as pattern recognition) that can be addressed increases not a bit, apart from advantages associated with increased memory capacity. To harness the dormant computing power it is in general necessary to give up conventional programmability. It is plausible that the excellent pattern recognition capabilities of organisms are due to their effective use of parallelism, and to their ability to learn how to exploit their computational resources effectively. As a consequence many researchers, in particular those of a connectionist and neural persuasion, are looking to biology for ideas about parallel computing (cf. Rumelhart and McClelland, 1986). Actually such models were of seminal significance in computer science (McCulloch and Pitts, 1945; Yovits and
Cameron, 1960; Yovits et al., 1962) or were actively pursued (and then dropped) early in the development of computer science (Rosenblatt, 1958, 1962; Minsky and Papert, 1969; von Foerster and Zopf, 1962). Molecular computer designs emanating from biology are a further development in this direction, but with no connection to early computer science ideas and much more radically different in style from conventional computing than the connectionist models.

The third motivating influence comes from physics. At the present time a semiconductor digital switching operation dissipates not less than about 10^5 kT per switching operation (where k is Boltzmann’s constant, T is absolute temperature, and kT represents thermal energy). A neuron dissipates about 10^10 kT per pulse, the exact order of magnitude depending on the size of the neuron. An enzyme, or protein switch, can dissipate as little as 10 to 100 kT per step (Liberman, 1979). In reality, the switching activity involves zero dissipation of energy. This is because the enzyme, as a catalyst, is reversible. The 10 to 100 kT is required if the catalytic action is to leave a macroscopic trace. Superconducting (Josephson junction) switches could in principle also have very low power dissipation.

The early physical analysis of computing assumed that computing could always be decomposed into distinct switching operations, and that each switching operation was analogous to a measurement operation (e.g., Brillouin, 1962; von Neumann, 1966). Bennett (1973) and a number of other investigators (Benioff, 1982; Landauer, 1982; Feynman, 1986) argued that reversible general-purpose computing is, in principle, possible, implying that there is no hardware-independent limit on the amount of dissipation associated with computing. Bennett’s analysis suggested that an arbitrary amount of “mathematical work” could be performed before a measurement operation would be required to record the result.
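For readers who prefer absolute units, the orders of magnitude quoted above convert directly to joules per switching event; the sketch below assumes T = 300 K (roughly room temperature, a value not fixed in the text).

```python
k_B = 1.380649e-23                # Boltzmann's constant, J/K
T = 300.0                         # assumed temperature, K
kT = k_B * T                      # thermal energy, about 4.1e-21 J

# Dissipation per switching event, in units of kT (figures from the text).
per_event_kT = {
    "semiconductor digital switch": 1e5,
    "neuron (per pulse)": 1e10,
    "enzyme (upper end of 10-100 kT)": 1e2,
}
for system, n in per_event_kT.items():
    print(f"{system}: {n * kT:.1e} J")
```

Even the neuron's 10^10 kT per pulse is only about 4e-11 J; the point of the comparison is the eight-order-of-magnitude gap between neural and enzymatic switching.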
While Bennett’s analysis remains controversial in detail (Hastings and Waner, 1984), the implication that a vast reservoir of potential computing power is untapped by conventional technology is undoubtedly sound, and corresponds to the fact that each enzymatic switching operation subsumes a molecular pattern recognition task that would require an enormous number of digital switching operations to duplicate (Conrad, 1984).

The analysis of physical limits of computing has thus been a two-edged sword so far as the goals of molecular computing are concerned. The electronics-motivated researcher can draw the implication that there is no general physical reason that would militate against the attempt to fabricate a “molecular transistor” that transcends conventional materials in performance capabilities. The agreement of the physical analysis with the facts of biomolecular computing suggests to the biologically motivated designer that enzyme-driven computer designs are a justified course of research. Later we
will show that quantum considerations allow even greater computational power than suggested by the reversible computing analysis.

The fourth influence comes from polymer chemistry (cf. Friend, 1988; Potember et al., 1988). The discovery of organic polymers with metallic properties attracted a great deal of attention from the electronics point of view. Polymers with conductive and semiconductive properties have been synthesized (cf. Street and Clarke, 1981), and probably more significantly a variety of optically active polymers (since silicon, as an indirect semiconductor, is not an efficient light emitter). A molecular-sized switch could, in principle, act in about one-hundredth of a picosecond, two orders of magnitude faster than can be achieved with today’s fastest transistor. Unless the molecule could be isolated from the much larger capacitive environment, this speed advantage would in practice be degraded. Nevertheless it is clear that the creation of organic polymers with radically new properties and the increasing feasibility of arranging molecules into complexes (such as donor-acceptor complexes) that could exhibit useful functional properties provide a major impetus to the development of new devices, including information processing devices. To date, the type of systems envisioned are on the conventional electronic side, but in actual fact the organic chemistry work may be better suited to nonconventional architectures.

The fifth seminal influence derives from biophysics and cell biology. A variety of particular themes can be mentioned, including studies of structure-function relations in biomolecules, membrane processes (Tien, 1988), photobiology, electron transfer proteins and related energy transduction mechanisms (Gilmanshin and Lazarev, 1987), reaction-diffusion and other nonlinear dynamic models (Nicolis and Prigogine, 1977), and (more speculatively) soliton and related quasi-particle mechanisms of signal transduction (Davydov, 1985).
The recognition by researchers in these areas that basic biophysical knowledge, as incomplete as it is, could be harnessed for technological applications is obviously a major motivating factor. This reviewer’s assessment is that one key discovery stands out apropos of computing. Since the 1950s it has been known that in many cases hormones (first messengers) impinging on the external membranes of animal cells trigger the production of intracellular chemical signals (called second messengers). The second messengers in turn trigger various molecular processes within the cell. The pertinent fact is that impulses impinging on the external membranes of many central nervous system neurons likewise trigger the production of second messengers, which in turn trigger molecular mechanisms that control nerve impulse activity (Liberman et al., 1975). The discovery that second-messenger mechanisms play a linking role in cellular information processing means that all the macromolecular and microphysical mechanisms mentioned become directly
pertinent to information processing in the large, and biological systems therefore possess a channel for exploiting the vast potential reservoir of low-dissipation computing power whose existence is implied by the physics of computation arguments.

The sixth line of development is molecular biotechnology. This is the main enabling technology, and the main reason for viewing molecular computing as a concrete engineering possibility (rather than as a purely theoretical model or as a purely natural science). Actually we are dealing here with a collection of technologies, including DNA and protein engineering, membrane reconstitution techniques, immobilized enzymes, antigen-antibody networks, bioreactors, and applied molecular evolution. The step-by-step advance of these technologies is sure to have a significant impact on agriculture, medicine, and the chemical industry. And it is the step-by-step development of the biotechnological infrastructure that provides the context for all directions of molecular computer technology.

Though the biologically and electronically oriented goals are motivated by quite different considerations, the technologies required to achieve them are in many cases the same. Developments along both lines are, as a consequence, potentially highly synergetic. The reason is that both the conventional and biologically motivated designs may be viewed as cross-sections of a broader class of molecular computer designs. This broader class also includes architectures that deviate both from conventional electronics and from biology.
2.3 Note on Terminology

Molecular computing is not the only term used to describe work in the field to be reviewed. Other terms include molecular electronics, molecular device technology, nanotechnology, bioelectronics, biosensors, biocomputing, biochip, molecular functional systems, and biomolecular information processing.

“Molecular electronics,” an old term, has become popular because it hitchhikes on the flow of the electronics industry. As indicated above, it must be kept in mind that nuclear motions are important in many polymeric materials, and almost always significant in biopolymers. Also, the flow of protons and the dynamics of photons, phonons, and various quasi-particles are key factors. It is important not to be trapped by the analogy with conventional electronic devices. The term “molecular device technology” escapes this, but is restricted to engineering applications rather than natural science studies.

“Nanotechnology” and “nanoelectronics” refer to engineering applications at a molecular and atomic size scale. As a downward extension of microelectronic technology, this carries the flavor of manipulative fabrication
by outside hands rather than the self-organizing flavor of biomolecular systems. Nanoelectronics can probably safely be viewed as the solid-state limit of molecular electronics.

“Bioelectronics” refers to the classic study of the electrochemical activity of biological membranes; in recent times it has come to include the coupling of molecular mechanisms to ionic gradients associated with membrane potentials, in particular in conjunction with the energy processes of photosynthesis and respiration. Biosensors are either natural or artificial systems that detect signals on the basis of protein specificity or on the basis of bioelectronic processes involving protein-membrane coupling.

“Biocomputing” is a broad term that could encompass natural biological information processing, biologically and neurally motivated computer designs, as well as applications of computers to biology. The term “biochip” suggests a marriage between bioorganic materials and electronic chip technology. The term has caught on in many parts of the world. It has become unpopular in the United States, however, due to the overly sensational attention that it attracted. Another term, “chemical computing,” should now probably refer to information processing schemes that operate on a primarily chemical kinetic basis.

This author is an advocate of the term “molecular functional systems.” This accommodates all the nuclear and electronic dynamics that could be functionally significant, and is open with respect to the domain of application. It accommodates both the scientific and technological directions of the field (molecular functional science and molecular functional technology), and, further, it does not carry with it the sense of separation between materials and architectures that is so useful in the case of programmable machines and so inappropriate in the case of biology. Molecular computers are molecular functional systems in which information processing and problem solving are primary.
“Biomolecular information processing” (or “biomolecular computing”) refers to naturally occurring molecular computers (cells and organisms) or to biologically motivated molecular computer designs. “Neuromolecular computing” refers to molecular computer models with a neural organization, to brain models motivated by molecular computer design principles, or to technological molecular computer designs with a brain-like architecture. Many researchers in the field will undoubtedly object to these perhaps overly categorical distinctions. The terms that are used are not matters of ultimate import. Hopefully, however, our digression into terminology can serve to convey the diversity and richness of approaches to this new realm of science and technology, and to orient the reader to the place of the present review in this realm.
3. Theory of Molecular Computing

In our background discussion we have referred to concepts such as programmability, self-organization capacity, and efficiency in a general way. These features may be more or less present or absent in any information processing system, and indeed in any functional system. The purpose of this section is to make a statement about the tradeoffs among these three properties. We will not attempt to justify this statement in detail (for a recent thorough review, see Conrad, 1988a). Rather we will use it as a springboard for unpacking the extreme differences between the conventional and biomolecular modes of computing.

We will pay a great deal of attention to the place of evolutionary processes in these tradeoffs. There are three reasons. The first is that evolution is nature’s foundational method of problem solving. As such it may be viewed as a form of computing. Evolutionary computing (or genetic algorithms) has in fact been used as a form of optimization in a conventional computer science framework (Bremermann, 1962; Holland, 1975). The second reason is that the evolutionary process is crucial for creating biomolecular computing structures that solve problems by nonevolutionary means. This is manifestly the case for natural biomolecular systems, such as the brain, and it is therefore critical to understand the effectiveness of evolutionary methods in nature. The third reason is that it is necessary to understand why evolution works in order to employ protein engineering, the main enabling technology for artificial molecular computing, effectively. Our working premise is that the process of evolution by variation and natural selection recruits microscopic molecular processes for the efficient performance of information processing functions in biological cells and organisms, and that it can also be used to do so in artificial biomolecular systems.

3.1 The Tradeoff Principle

The tradeoff principle may be stated as follows.
A computing system cannot have all of the following three properties: structural programmability, high computational efficiency, and high evolutionary adaptability. Structural programmability and high computational efficiency are always mutually exclusive. Structural programmability and evolutionary adaptability are mutually exclusive in the region of maximum effective computational efficiency (which is always less than or equal to the computational efficiency).

The term programmability here is used in the rather restrictive sense that corresponds to our experience in communicating programs to digital computers. It means that an agent who wishes to prescribe the rule (or program) that will guide a system’s behavior can do so by consulting a finite (and reasonably sized) user’s manual that specifies the function performed by
each component of the system in a precise and definitive manner. It is not necessary to solve the equations of physics (which, because of their continuous aspect, would not allow a finite form of programming). Nor is it necessary to perform laboratory experiments once the manual is established. The essence of a conventional digital computer program is that the human programmer conceives of an algorithm, or definite method of solving a problem, and expresses it directly in the strings of a formal language without performing any calculations or doing any experiments.

The term structural programmability means that the program that guides a system’s behavior is inscribed in the states of its components and (more importantly) in their connectivity according to the specifications in a definitive user’s manual. Later it will become clear that it is possible for a system to be structurally nonprogrammable and nevertheless have general powers of computation (Conrad, 1974b). Hence it could embed an interpreter. This would mean that it would be programmable by virtue of embodying a program that enables it to read and follow rules, but it would not be programmable at the level of physical structure. According to the tradeoff principle, the human brain must be an example, since it is a product of evolutionary self-organization. This is the reason why the tradeoff principle must be formulated in terms of structural programmability.

All of today’s digital computers are structurally programmable by virtue of being built out of building blocks that behave according to the user’s manual specifications to as great an extent as the engineer can achieve. They are programmable at the interpretive level if a human programmer writes a universal program and uses their structural programmability to implant this universal program.
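The contrast between a users-manual-limited component and an unrestricted one can be made concrete by counting pairwise interactions among n elements: without restriction the count grows quadratically, while a bounded fan-in keeps it linear. The fan-in bound of 4 below is an arbitrary illustrative choice, not a figure from the text.

```python
def potential_interactions(n):
    # every pair of n elements may in principle interact
    return n * (n - 1) // 2

def usable_interactions(n, fan_in=4):
    # bounded fan-in: each component interacts with at most fan_in others,
    # as a definitive component-by-component specification requires
    # (fan_in=4 is an arbitrary illustration)
    return n * fan_in // 2

for n in (10, 100, 1000):
    print(n, potential_interactions(n), usable_interactions(n))
    # -> 10 45 20
    # -> 100 4950 200
    # -> 1000 499500 2000
```

The fraction of potential interactions actually used thus shrinks as the system grows, which is one way of reading the efficiency cost of structural programmability.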
More commonly, the human programmer implants a compiler that allows the state of the machine to be set in such a way that it is guided by the desired application program.

Efficiency may be quantified in terms of the number of interactions that a system uses for computing (or for performing any other function) relative to the number potentially available. In computer science computational complexity is usually defined in terms of the number of processors or time steps required to solve a problem, and efficiency is thus ordinarily defined in terms of number of processors or time steps as well. This is reasonable in the special case of structurally programmable systems, since the number of interactions that a component can engage in must be limited in a definite way if it is not to break out of its user’s manual definition. No such restriction applies in structurally nonprogrammable systems. The number of interactions can (according to current force laws) grow quadratically with the number of elements.

Evolutionary adaptability refers to the ability of a system to learn to perform a task through a variation-selection-reproduction procedure. More
248
MICHAEL CONRAD
generally, it refers to the ability of a system to learn on the basis of any procedure that requires malleable structure-function relations. This means that individual changes in the structure of the system are compatible with gradual distortion of the function. Evolutionary adaptability is closely related to self-organization, and is a form of self-organization. For example, protein folding and self-assembly of aggregates of proteins are both self-organizing processes. As a consequence, they are not amenable to conventional programmability. This is because the role that any single amino acid plays in the functional operation of the entire polymer cannot be specified in a manner that is independent of the other amino acids, therefore in a manner that can be freed from the entire context. However, the continuous dynamical aspect of folding in many cases allows the shape and function of the proteins to change in a gradual way with single changes in amino acid sequence. As a consequence the self-organizing (or emergent) relation between the linear sequence of amino acids in a protein and its three-dimensional shape and function makes it particularly well suited for “programming” through the self-organizing process of evolution by variation and selection. Structural programmability cannot be completely incompatible with evolutionary adaptability, at least if it is assumed that it is in principle possible to simulate any process in nature. This is sometimes called the strong form of the Turing-Church thesis (Hofstadter, 1980). If we accept this thesis as a heuristic strategy (as long as no convincing counterexample is found) we are obliged to admit that it should in principle be possible to simulate the process of evolution on a digital computer, at least in its main features. 
In order to do so, it would be necessary to construct a structurally nonprogrammable virtual machine on top of the structurally programmable base machine that exhibits the type of self-organizing characteristics that allow for evolutionary adaptability. Clearly, enormous computational resources are required to build such malleable virtual systems. The computational resources required to build the virtual machine are not directly targeted to the problem that the virtual machine must solve. A direct physical implementation of the virtual machine, in which the structure is naturally matched to the function performed, would couple its interactions far more efficiently to the task at hand. The effective computational efficiency of a virtual machine is the efficiency corrected for the number of interactions required to support the virtual machine. Clearly the effective computational efficiency is always less than or equal to the computational efficiency. This is why structural programmability and evolutionary adaptability are mutually exclusive in the region of maximum effective computational efficiency. The tradeoff principle implies different domains of computing. Programmable machines operate in one domain. Systems that learn to perform tasks
MOLECULAR COMPUTING
249
through evolution operate in a radically different domain. Evolutionary systems are potentially much more efficient at coupling their material resources to problem solving. This does not help much for problems that are inherently sequential, such as arithmetic problems. To the extent that such problems defy decomposition into segments that can be performed in parallel, all systems that address them will be inefficient. The only resource that can help is speed. Efficient coupling can help enormously for parallelizable problems, such as pattern recognition. To try to duplicate the efficiency of biological organisms for such problems with structurally programmable machines is like trying to build a perpetual motion machine.

3.2 Programmability versus Efficiency
In order to proceed it is necessary to cast the above informal comments about the conflict between programmability and efficiency into the form of a simple model. Consider a system comprising n particles. If, as is usual in physics, we assume pairwise interactions (including self-interactions), we can express the efficiency as

eff = a/n²,
where a is the actual number of interactions used. Particles are unlikely to be the primitive elements of the system. In most complex systems the particles are lumped together into groupings, or building blocks, that serve to freeze out many of the possible interactions. This is strongly the case for structurally programmable systems. It will not significantly affect the conclusions if we make the simplifying assumption that each building block of a structurally programmable system comprises k particles, so that the number of building blocks (or processors) is given by n/k. The assumption most unfavorable to our argument is that all k² interactions within any processor can contribute to problem solving. The number of interactions that can contribute to problem solving in such a structurally programmable system then increases with the number of particles by at most C(n/k)k², where C is a constant representing the number of potential contacts that a processor can have and nevertheless operate according to its definition in the users manual. The efficiency of a structurally programmable system thus scales at most as eff = Cnk/n² = Ck/n. Thus, the efficiency decreases as the number of particles in the system increases. If the processors are simple averaging devices that fire if the sum of the inputs to them exceeds a threshold, it might at first be thought that C could increase with the number of processors. This, however, is physically unrealistic. The number of particles in each processor would eventually have to be increased, and, as a consequence, fewer of the intraprocessor interactions
would contribute to computing. In practice, only a small fraction of the k² intraprocessor interactions could contribute in any case. More important, only a tiny fraction of the Cnk potential interactions could actually be turned on at any given time if the machine is run in a serial mode. If a large fraction of the interactions are turned on at any given time, a large fraction of the processors must be active as well. This is the regime of structurally programmable parallelism. In practice conventional programmability is lost in this regime, just as it would be if the requirement of structural programmability were dropped. Now let us consider what happens if this requirement is dropped. In principle the system could achieve the maximum possible efficiency (eff = n²/n² = 1). Such a system would be rather inflexible, apart from variations in the initial and boundary conditions. The maximum number of variations on the interaction structure occurs when a = n²/2. This will be called the region of maximum evolutionary flexibility. The efficiency in this region scales as eff = (n²/2)/n² = 1/2, independent of the size of the system. This is n/(2Ck) times larger than the maximum efficiency of a structurally programmable system run in a completely parallel mode. Shortly we will augment this argument to show that most of the interactions should be weak to maximize evolutionary adaptability. This allows for a building-block structure, but without the building blocks having definitive physics-independent specifications. Here the pertinent point is the sharp contrast between current-day programmable machines and evolved biological information processing systems. The engineer must work hard to suppress most of the potential interactions in a programmable digital computer in order to make the machine behave like a purely formal system.
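The scaling contrast above can be made concrete with a short numerical sketch. This is my own illustration, not code from the chapter; the values of C and k are arbitrary:

```python
# Illustrative sketch of the two efficiency regimes in the text's model.
# C (contacts per processor) and k (particles per processor) are
# hypothetical values, not taken from the chapter.

def eff_programmable(n, C=10, k=100):
    """Upper bound C*k/n on the efficiency of a structurally
    programmable system of n particles grouped into n/k processors."""
    return C * k / n

def eff_max_flexibility(n):
    """Efficiency in the region of maximum evolutionary flexibility,
    where a = n**2 / 2 interactions are in use."""
    return (n**2 / 2) / n**2  # = 1/2, independent of n

for n in (10**4, 10**6, 10**8):
    ratio = eff_max_flexibility(n) / eff_programmable(n)
    print(n, eff_programmable(n), ratio)  # ratio equals n/(2*C*k)
```

For n = 10⁶ with these illustrative constants, the programmable bound is 10⁻³ while the maximum-flexibility figure stays at 1/2, a factor of n/(2Ck) = 500 — and the gap widens linearly with n.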
The physics responsible for the action of the switches is completely masked so far as the user of the machine is concerned; if the masking fails, a fault is said to occur. Let us compare the growth rate of the number of interactions with the growth rate of polynomial-type problems. The size of an n²-type problem that can be solved by a structurally programmable system consisting of n particles increases by a factor that scales at most as n^(1/2), even if all the allowed interactions in the system can be brought to bear. This is because the number of interactions scales as the number of processors, and as a consequence it corresponds to the usual statement made about the maximum potential advantage of parallelism. By contrast, the size of a problem solvable by a structurally nonprogrammable system with maximum evolutionary flexibility increases by at most n/√2. This means that in the structurally nonprogrammable domain of computing it is in principle possible to keep pace with problems that grow quadratically in terms of their resource requirements. This, of course, assumes that the interactions are used in an effective fashion. This is unlikely,
but more likely than that the processors in a structurally programmable machine would be used effectively, due to the possibility of high evolutionary adaptability. It also assumes that extra interactions could in principle be coupled to problem solving in as effective a manner as extra processors. This is not possible for all problems, but it is certainly possible in important special cases. For example, suppose that the problem is to compute the behavior of an n-particle system in nature. The number of interactions to be computed would in general grow in an n² fashion. A structurally nonprogrammable analog could clearly be tailored to use its interactions to keep up with this growth rate, whereas it would be much more difficult to tailor a structurally programmable machine to use its processors in parallel for the same purpose in a maximally effective manner. Some systems may operate in an intermediate zone, allowing for some instructive control on the part of a designer, but not precise prescriptive control. The potential efficiency would be greater than that for programmable digital machines, but would not grow as fast as n². Later (in Section 7) we will consider architectures in this category. The simple model outlined above is independent of physical issues that are commonly discussed in relation to limits of computation, such as heat production, speed of signal propagation, Brownian noise, and quantum side effects. The possible physical interactions in structurally nonprogrammable systems are, of course, limited by the constants of nature. Any specific limitations, over and above the general laws of physics, are relative to a particular architecture. Quantum side effects, for example, are pertinent in a conventional digital architecture in which the switching elements are supposed to realize logical operations. They are not pertinent in a biomolecular architecture in which side effects might be recruited for problem solving through the evolutionary process.
Brownian noise could be a desirable feature in a system designed to generate randomness. Heat production would be a desirable feature in a system designed to function as a star, or to be a scale model (analog) of a star. It is a major limiting factor in both conventional computers and in biological organisms. Similarly, arguments based on reversible models of computing (Bennett, 1973) should be understood with reference to a particular model. As far as conventional digital models are concerned, reversible computing is an unrealistic curiosity. It means very slow computations and requires an enormous memory space to store all the intermediate states. When formulated in terms of special-purpose, enzyme-driven architectures, a manifest empirical correlate accrues to the concept, however, for, as we have already noted, the molecular pattern recognition performed by enzymes is part of their reversible catalytic action. The particular constraints that must be imposed on a collection of particles to
construct a particular computer architecture always entail particular physical limits on function. The constraints associated with structural programmability are most limiting in this respect, since the underlying physics must be completely quenched. Of course the great advantage is the complete control available to the user.

3.3 Evolvability versus Programmability

In lieu of this control it is necessary to have some learning or self-organization process. Variation and selection is the mechanism that underpins biological organic evolution. The basic idea is that entities capable of self-reproduction exist, that self-reproduction occurs with statistical variation (or error), and that a selecting agent (or environment) classifies the self-reproducing entities into two groups: those that actually reproduce and those that do not. The repetition of this process over time leads to a historical development. Many of the characteristics that develop are strongly influenced by the selecting agent. In organic evolution the selection arises from the fact that the finiteness of the environment inevitably prunes what would otherwise be an exponential increase in the number of self-reproducing entities. Thus the selection process is called natural selection. The criteria for selection emerge from the interactions among the organisms and from their interactions with the environment. The criteria change dynamically with time. In animal or plant breeding the criteria can be controlled in a more constant fashion by the breeder. This is the situation of artificial selection, and we shall eventually see that it is highly pertinent to the protein engineering aspect of molecular computer design. The variation-selection process is subject to an important additional constraint that defines the classic Darwin-Wallace version of the theory of evolution. This is sometimes called the strong principle of inheritance, and is connected with the distinction between genotype and phenotype.
The genotype refers to the information, primarily encoded in DNA, that is transmitted from parent to offspring. The phenotype refers to the characteristics of the organism that self-organize in the course of development, partly under the influence of the genetic information. The protein enzyme provides a simple and clear example. The amino acid sequence is the genotype of the protein (since it is a direct coding of a DNA base sequence). The folded shape and functional properties of the protein comprise its phenotype. In the case of the protein the self-organization of the folded shape is completely determined by the amino acid sequence, but through a free-energy minimization process that has no analog in conventional computer programming. In the case of the organism as a whole, the relation between genotype and phenotype is still more complex. The strong principle of inheritance says that the traits acquired
by the phenotype in the course of its interactions with the environment are not transmitted to the genotype in any way that specifically directs the evolutionary adaptation process. By and large the genotype in biological organisms is protected from such phenotypic influences as much as possible, since otherwise the information accumulated in the course of evolution would inevitably be degraded. To prevent such degradation it would be necessary to introduce a filter capable of distinguishing the environmentally induced modifications of the phenotype that are beneficial to the organism from the much larger number that are harmful. This would mean putting an intelligence into the system which begs the question of self-organization. The human genetic engineer might act as such a filter by combining physics-based calculations and computer simulations to guess which mutations might more likely yield desired protein phenotypes. But even this would not be backward reading of phenotype into genotype. Discussions of evolution are often phrased in terms of hill climbing on an adaptive landscape (Wright, 1932). Evolution is thus viewed as an optimization process of sorts. The classical adaptive surface, based on the concepts of classical genetics, is a plot of fitness against gene frequency (Fig. 2). Fitness is not a well defined concept. For the present purposes it can be viewed simply as a performance measure, insofar as such a measure can be constructed. In discussions of natural systems it should be viewed as a construct representing the belief that some scalar performance measure exists at any point in time, or at least that it is useful to make this assumption from the point of view of analyzing evolutionary processes. The situation is more complicated in molecular genetics. Genes are sequences of nucleotide bases, and hence the notion of gene frequency is much too crude an approximation. 
It is possible to construct a molecular adaptive surface analogous to the classical adaptive landscape (Conrad, 1979a, 1983). The gene frequency axes are replaced by axes that represent the sequences of bases in each gene. For simplicity let us consider DNA sequences as a whole, without worrying how different functional units on the sequence are distinguished and processed into different amino acid sequences. Each base position can be represented by four axes, corresponding to the four possible nucleotide bases (A, T, G, and C) that could occupy that position. The A axis corresponding to a particular position is assigned the value 1 if A occupies that position; otherwise it is assigned the value 0. Obviously only one of every group of four axes can have the value 1. If a single gene comprises 1000 nucleotide bases we require 4000 axes. We could describe an analogous space in terms of the 20 amino acids. The number of elements in each axis grouping would be larger (20), but the number of axis groupings would be cut down by a factor of three (since three bases code for one amino acid). The dimensionality of the space is even vaster for the genome as a whole. The space becomes an adaptive surface if a fitness axis is added.
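The four-axes-per-position coding just described can be sketched directly. A minimal illustration (the function and variable names are mine, not the chapter's):

```python
# Sketch of the gene-space coding described above: four axes per base
# position, exactly one of which is set to 1.  Names are illustrative.
BASES = "ATGC"

def gene_space_point(sequence):
    """Map a DNA sequence to its coordinates in the
    4 * len(sequence)-dimensional gene space."""
    coords = []
    for base in sequence:
        group = [0, 0, 0, 0]
        group[BASES.index(base)] = 1  # the axis for the occupying base
        coords.extend(group)
    return coords

point = gene_space_point("GATTACA")
print(len(point))  # 28 axes for a 7-base sequence
# Exactly one axis in each group of four carries the value 1:
print(all(sum(point[i:i + 4]) == 1 for i in range(0, len(point), 4)))
```

A 1000-base gene maps into a 4000-axis space in the same way; the space has 4¹⁰⁰⁰ admissible points, and assigning each a fitness coordinate turns it into an adaptive surface.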
Each possible point in the gene space is then assigned a height on the fitness axis. If a hyperdimensional sheet were dropped over the points in this space we would have an ultrahigh-dimensional manifold of hills, peaks, valleys, and crevices. The same construction can be used to represent the performance value of computer programs. In this case the base positions correspond to the possible locations in which the elements of the formal language can be located. The number of base types corresponds to the number of types of symbols used in the language. The fitness axis is a scalar measure of the performance of the program, it being clearly realized that scalar measures are unrealistic even for artificial systems created for defined human purposes. A structurally programmable system can be represented in the same way, since the states of its components and the connections among them map a formal computer program. It is clear from experience why structurally programmable systems would be unsuitable for evolution if the variations (e.g., mutations, cross-overs, recombinations) occur at the base level, corresponding to variations in the code of a computer program. The chance that any single mutation (we will use this term generically for all the variation operations) will lead to a new program with acceptable performance value is in general small. The peaks on the landscape are isolated in all dimensions by deep, wide valleys. This means that many simultaneous mutations must occur at once in order to make the jump from one peak to another. This has negligible likelihood if the mutations are random. It is possible to estimate this probability with a simple model (Conrad, 1972, 1983; see also Maynard Smith, 1970, for a different formulation leading to similar conclusions). To make matters concrete let us consider a single protein comprising a sequence of n amino acids.
Suppose that m simultaneous mutations (or, more generally, unitary genetic events) are required to jump to the nearest acceptable protein. The probability, P_m, is given by

P_m = p^m (1 - p)^(n-m-j),
where p is the mutation probability and j (≤ m) is the number of "don't care" mutations. The first factor represents the probability that the desired collection of mutations will occur at the same time and the second factor represents the probability that the remaining amino acids do not mutate, except for some "don't care" mutations. We ignore, as not having any real impact on the conclusion, the detail that the mutation probability should be corrected for the fact that there is only about one chance in 20 that the mutation will be to the desired amino acid, and that whether a mutation at a particular site is a "don't care" depends on what the new amino acid is and possibly on mutations at other sites. The number of generations required for
the target protein to appear is thus given by

τ(m, simultaneous) = 1/(N₀p^m),
where N₀ is the initial population size. If we assume a generous initial population size of 10¹⁰ and a biologically typical mutation probability of 10⁻¹⁰, it will on the average take more than 10¹⁰ generations to jump to the peak corresponding to the target protein if m is as small as 2 (only two simultaneous mutations required). If m = 3 it will require more than 10²⁰ generations. These numbers exceed the historical time available on earth. Choosing substantially larger mutation rates does not alter the situation. This would only serve to increase the number of undesirable mutations, especially in a long sequence. Now let us consider the case in which the evolution to the target protein can proceed in a sequence of single mutations, each of which is acceptable. The evolution time then scales as

τ(m, stepwise) = m/(N₀p) + (m - 1)D,
where D is a delay time that represents the number of generations for each mutated protein in the sequence to reach a reasonable population size. This is a lower bound since many alternative sequences of mutations might lead to the target protein and since the next step will, in general, be discovered before the previously mutated protein reaches its full potential in terms of population size. The evolution time is thus controlled by D rather than by the amount of time required for each mutation to occur. The evolution times remain reasonable for large m even if D is on the order of thousands of generations. If D is taken as 1000 and m as 100, the evolution time would require about 100,000 generations, assuming the same initial population size and mutation rate as before. This is quite modest on the evolutionary time scale. We can condense the above discussion into a simple statement of the main condition that must be satisfied in order for a system to have high evolutionary adaptability: The relation between the structure and function of a system must be such that acceptable or improved function can always be achieved by single mutations of the structure. This will be called the gradualism condition. If the gradualism condition is not satisfied, the system will be caught on isolated adaptive peaks. If the environment is changing, it will not be able to keep up with the movement of the peaks. This type of stagnation has been called the mesa phenomenon by Minsky (1961), but it is probably more descriptive to call it the crevice phenomenon. We have already noted the crevice-like character of computer programs. This corresponds to the programmer's common experience that it is necessary to conceptualize and introduce a
variety of compensating changes to go from one usable piece of code to another. This could be given a rigorous formulation in terms of unsolvability and intractability. Without going into details here, we can simply observe that if we started out with a program that yielded a defined computation, any mutation of this program would certainly also yield a defined computation if the gradualism property applied. But this would mean that, if programs in general had the gradualism property, we would be able to solve the halting problem for Turing machines, which is, of course, unsolvable. It is this delicacy of generative processes which is the source of the incompatibility of structural programmability and high evolutionary adaptability. Classic discussions of evolution (e.g., Mayr, 1963; Stebbins, 1950) focus attention on search strategies, such as sexual versus asexual strategies, the use of various genetic operators, the role of isolation in speciation, and the use of mechanisms for conserving genetic adaptations. These are important, but secondary to the gradualism requirement, which is sine qua non.¹
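The two evolution-time estimates can be checked numerically. The sketch below is my own; N₀ = 10¹⁰, p = 10⁻¹⁰, and D = 1000 are my reading of the illustrative figures used in the text, and the function names are not from the chapter:

```python
# Numerical check of the simultaneous vs. stepwise evolution-time
# formulas.  N0, p, and D are illustrative values, not new results.

def tau_simultaneous(m, N0=1e10, p=1e-10):
    """Expected generations when m mutations must occur in one step."""
    return 1.0 / (N0 * p**m)

def tau_stepwise(m, D=1000, N0=1e10, p=1e-10):
    """Lower bound when the m mutations can be fixed one at a time,
    with a delay of D generations per intermediate step."""
    return m / (N0 * p) + (m - 1) * D

print(tau_simultaneous(2))   # on the order of 1e10 generations
print(tau_simultaneous(3))   # on the order of 1e20 generations
print(tau_stepwise(100))     # about 1e5 generations
```

The contrast is the point: two simultaneous mutations already cost on the order of 10¹⁰ generations, while a stepwise path through 100 acceptable intermediates costs only about 10⁵.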
¹ The word "gradualism" is sometimes used to refer to the rate of evolution rather than to the malleability of structure-function relations. The classic synthetic theory of evolution put emphasis on gradual changes in the fossil record, and held that visible evolutionary development resulted from a slow accumulation of such changes. The punctuated equilibrium model (Gould and Eldridge, 1977) proposes that the fossil record exhibits periods of stasis and rapid change. But rapidity here is relative to the slow rates that would be anticipated on the basis of classical evolutionary genetics models. A gradualistic relation between structure and function provides the most robust basis for rapid rates of evolution, as shown by the simple model outlined in the text.

3.4 Extradimensional Bypass

Now I want to turn to three key questions: What type of material organization does allow for high evolutionary adaptability, is it feasible to simulate this organization on a structurally programmable machine, and what steps would have to be taken to ensure that the organic materials with which the molecular computer architect works in fact have the high evolvability property?

Consider a molecular adaptive surface with peaks and valleys, and suppose that the system of interest occupies a region of the surface in which the peaks have good fitness values but are in general isolated from each other. Now suppose that we embed this region in a higher-dimensional space. We can do so by adding elements to the gene axes that are superfluous so far as function is concerned. In effect, we can think of the peaks and valleys in the original space as shadows (projections) of corresponding peaks and valleys in the higher-dimensional space. The height of the higher-dimensional peaks must in general be somewhat reduced, due to the fact that the addition of mechanistically
superfluous components is a cost. The advantage is that the peaks are less likely to be isolated in every dimension as the number of dimensions becomes greater. As a consequence the move to the higher-dimensional region corresponds to an increase in evolvability. It is not possible to give this intuition a firm theoretical basis in the adaptive landscape picture taken by itself. This is because fitness, as emphasized earlier, is an arbitrary construct. We can imagine adaptive surfaces with arbitrary topographies, including both topographies that exemplify and counterexemplify the above intuition. In order to proceed it is necessary to associate the peak structure with a dynamical picture. In this way we can draw on relevant results of dynamical systems theory. The simple idea is that stability (in some sense) is a necessary condition for fitness. Thus the peak structure of the adaptive surface can be mapped into the basins of attraction in a state space (or the valleys of a potential surface). In a biological system the phenotype constitutes the state space. In a digital computer the states of the machine constitute the phase space, and the sequence of states the trajectory in this space. Since the digital machine is structurally programmable, the state space description has an explicit (users manual) relation to the specification of the program on the adaptive landscape axes, but not to the positions on the fitness axes. In the case of biological systems the relationship between the state-space description and the specification on the genotype axis is far more indirect, since the system is self-organizing. Because of the vast variety of dynamical systems, an analysis that covers all possibilities is clearly out of the question. What we can do is to show that an increase in dimensionality in fact does yield a more evolution-friendly peak structure for an important and robust class of dynamical systems. 
This is sufficient to verify our intuition, but it does not mean that special classes of systems do not exist that achieve high evolvability by other means or that the addition of mechanistically superfluous elements always increases evolvability (if it did, we could use such codings as a means to solve the halting problem). The situation is schematically illustrated in Fig. 2. Peaks A and B of the adaptive surface correspond to basins of attraction A and B in the dynamical (state space) description. The lower part of the diagram represents corresponding spaces of increased dimensionality. The increased dimensionality increases the chance of an instability in the dynamical system, hence of a “pathway” from B to A in the state space that would be downward in a potential function picture (this is a statement that can be proved for a wide class of dynamical systems). But if the system slides from B to A in the state space this means it climbs from A to B in the fitness space, that is, that a stepwise climbable extradimensional bypass has been opened up.
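The bypass idea can be illustrated with a toy landscape of my own invention (the numbers and names are not from the chapter): two peaks that cannot be connected by single acceptable steps in one dimension become connected when a mechanistically superfluous second dimension is added.

```python
# Toy illustration of the extradimensional bypass.  Fitness values are
# invented; "acceptable" means fitness at or above a threshold.

def reachable(fitness, start, goal, threshold):
    """Can we walk from start to goal one coordinate step at a time
    without ever visiting a point of fitness below threshold?"""
    frontier, seen = [start], {start}
    while frontier:
        x, y = node = frontier.pop()
        if node == goal:
            return True
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in fitness and fitness[nxt] >= threshold and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# One dimension: peaks at x = 0 and x = 4, a deep valley at x = 2.
f1 = {(x, 0): h for x, h in enumerate([5, 3, 0, 3, 5])}
print(reachable(f1, (0, 0), (4, 0), threshold=2))  # False: the valley blocks

# Add a superfluous second dimension carrying a modest ridge at y = 1.
f2 = dict(f1)
f2.update({(x, 1): h for x, h in enumerate([4, 3, 3, 3, 4])})
print(reachable(f2, (0, 0), (4, 0), threshold=2))  # True: bypass via y = 1
```

The ridge points are lower than the original peaks (the cost of superfluous dimensions noted above), yet they open a stepwise traversable path between basins that were isolated in the lower-dimensional space.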
FIG. 2. Extradimensional bypass. An adaptive surface is a plot of fitness against genotypic structure. (a) Schematic representation of a low-dimensional adaptive surface (left) and basins of attraction in the coordinated state space (right). Peak A corresponds to basin A, and peak B corresponds to basin B. (b) Schematic representation of a higher-dimensional adaptive surface (left) and basins of attraction in the coordinated space (right). Peaks A and B in the lower-dimensional space correspond to (are projections of) peaks A and B in the higher-dimensional space. The addition of extra dimensions, however, allows for leaks from basin B to basin A, and hence corresponding stepwise traversable pathways from peak A to peak B.
Before going on, it is necessary to elaborate the notion of stability. To be fit, an organism or system must be able to stay in the game of existence. This is already a broad sense of stability. It might mean that the qualitative behavior of the system should not be qualitatively altered by slight alterations in the map that describes its behavior. In this case the system is said to be structurally stable to the particular class of alterations made (Thom, 1970). It might mean that the perturbations to the state-to-state behavior of the system (essentially alterations in the initial conditions) are dissipated away, so that the system approaches (as t → ∞) the state or mode of behavior that it would have reached had it not been perturbed. This is asymptotic orbital stability (Rosen, 1970). It might mean that the system has multiple steady states, all orbitally stable to only a small range of perturbations, but closely packed, so that all of them are equivalent from the standpoint of function. A system could conceivably have chaotic (initial condition sensitive) dynamics that helps to maintain the existence of a larger system. For the purposes of the present discussion we will focus on the orbital sense of stability. As indicated above, a dynamical system described by differential equations will be said to be asymptotically stable around one of its critical points (where all the derivatives are set equal to zero) relative to some class of perturbations if it returns with arbitrary closeness to that critical point as time runs to infinity. The well known criterion is that all the eigenvalues of the equation obtained by linearizing about the critical point in question have negative real parts (or, for the weaker condition of neutral stability, nonpositive real parts). The class of systems to be considered includes the classical biochemical and population dynamics models. The famous predator-prey models of Lotka and Volterra are an example (cf. Rescigno and Richardson, 1973).
These particular models are first-order Taylor series approximations to the broader class. The relationship between stability and complexity (including dimensionality) in these systems has been analyzed in an incisive way by May (1973; cf. also Gardner and Ashby, 1970, for an earlier numerical study, and Hastings, 1982, for an alternative general proof). Let P_c represent the probability that any pair of components in the system actually interact, s the (common) average interaction strength, and n the number of components. May's theorem states that in models in which the interaction structure is selected at random the probability of stability goes to 0 when

s(nP_c)^{1/2} > 1,

and to 1 when

s(nP_c)^{1/2} < 1,

with the transition to instability with increase in s being extremely sharp for large n. If P_c is a constant, independent of n, the system must be structurally
nonprogrammable (since the number of interactions that a component can engage in will certainly exceed the potential number of contacts, C, as more components are added). This result implies that the chance of a basin occurring in the state space decreases as the number of components in the system increases, as the number of interactions among these components increases, and as the interactions become stronger. All things being equal, the probability of stability decreases as the dimensionality (number of components) increases. The addition of mechanistically superfluous components and interactions would in general be quite likely to destabilize the organization and therefore to render it unfit. The system would either fail completely or self-simplify by sliding to a new basin with lower dimensionality and fewer interactions. Such self-simplification would decrease the likelihood that small perturbations would push the system from one basin to another. But at the same time, transforming to the associated fitness space, it would decrease the likelihood that the fitness peaks corresponding to the basins are connected by stepwise traversible pathways. Conversely, the likelihood that peaks will be connected by such pathways increases if the system "self-complicates" by increasing the number of components and interactions; but the likelihood that the peaks will occur at all decreases. In short, what is good for evolvability is bad for fitness. This apparent contradiction is resolvable. It is necessary to organize the system so that the state-space (i.e., phenotypic) dynamics are slightly unstable relative to genetic variation (or perturbation) but stable relative to other perturbations, such as perturbations emanating from the environment or internal noise processes not connected to the gene axes. The problem is that one and the same organization must satisfy both pressures, one driving it to complexity and the other driving it to simplicity.
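May's criterion can be checked numerically. The sketch below (assuming May's original construction: off-diagonal interactions present with probability P_c and drawn with standard deviation s, diagonal self-damping entries of -1) estimates the probability of stability on either side of the s(nP_c)^{1/2} = 1 boundary; the particular n, P_c, and trial count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def stability_probability(n, connectance, s, trials=100):
    """Fraction of random community matrices whose equilibrium is stable.

    Off-diagonal entries are nonzero with probability `connectance` and
    drawn from N(0, s^2); diagonal entries are -1 (self-damping). The
    equilibrium is stable when all eigenvalues have negative real parts.
    """
    stable = 0
    for _ in range(trials):
        mask = rng.random((n, n)) < connectance
        a = rng.normal(0.0, s, (n, n)) * mask
        np.fill_diagonal(a, -1.0)
        if np.linalg.eigvals(a).real.max() < 0:
            stable += 1
    return stable / trials

n, pc = 100, 0.2
s_below = 0.5 / np.sqrt(n * pc)  # s (n P_c)^(1/2) = 0.5 < 1
s_above = 2.0 / np.sqrt(n * pc)  # s (n P_c)^(1/2) = 2.0 > 1
print(stability_probability(n, pc, s_below))  # near 1
print(stability_probability(n, pc, s_above))  # near 0
```

The sharpness of the transition for large n comes from the eigenvalues of the random part clustering in a disk of radius s(nP_c)^{1/2} around the -1 self-damping term.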
Three organizational features contribute to solving this problem. The first is compartmentalization, that is, division of the system into blocks of components that interact mostly among themselves in terms of number and strength of interactions. From the dynamical point of view block structure is known to increase the chance of stability (May, 1973). From the information theory point of view it is known that block structure can serve to reduce the ramification of perturbation (Conrad, 1983). This allows for channeling of the effect of mutation on specific aspects of the state-space (phenotypic) dynamics, and for selective blockage of the effect of nongenetic perturbations. The second feature is component redundancy, that is, the use of components that are essentially functionally equivalent. Redundancy is a form of compartmentalization, since similar components are grouped by virtue of their interaction with other groups of components. The difference is that the interactions among the components of an equivalence class of components are less important than interactions with components belonging to some other
equivalence class. As with compartmentalization proper, the effect is to prevent the ramification of a perturbation from one class to another. As a consequence, the chance of stability increases as the number of components and interconnections increases. This does not contradict May's theorem, however, since the connections are clearly nonrandom. Redundancy is, of course, a common reliability-conferring feature in communication and computing systems (Shannon, 1948; Winograd and Cowan, 1963; Dal Cin, 1979). In systems with self-organizing dynamics it can serve to buffer the effect of mutation on the phenotypic dynamics, by distributing the effect over a larger number of elements (Conrad, 1979b). This is just the kind of increase in gradual transformability of function that corresponds to extradimensional pathways on the adaptive surface. Transformability is an extension of the information theory concept of redundancy-based reliability to the structurally nonprogrammable domain. The third feature is multiplicity of weak interactions. This follows directly from May's theorem, together with our remarks about redundancy and our earlier model of efficiency and evolutionary flexibility. Recall that evolutionary flexibility is a maximum when half of the interactions are turned on (a = n²/2). This means that on the average P_c = 1/2, yielding s(n/2)^{1/2} < 1 as a stability criterion. The number of components, n, should be high for high computational power (number of particles times the efficiency). The number of components is further multiplied (over and above what is necessary on purely mechanistic grounds) by the requirement for redundancy. The only way of maintaining a good likelihood of stability is to make the interaction strength, s, small. Multiple weak interactions among large numbers of components is thus the characteristic feature of the physical-dynamical regime that maximizes evolvability.²

3.5 Relevance to Protein Engineering
The protein enzyme serves to illustrate these organizational principles, but they are general to biological architectures. Figure 3 schematically illustrates a balls-and-springs analog to a folded protein. Strong (covalent) bonds are indicated by solid springs and weak bonds responsible for folding by dashed springs. The triangular shapes represent the active site, a shape feature crucial
² Holland (1975) has argued that genetic algorithms work because systems with good building blocks can generally be recombined with advantage. This confers what he calls the feature of intrinsic parallelism (meaning that exemplars with a good building block can represent the whole class of systems with that building block). Our argument would suggest that this property is not robust for information processing systems, but that it can be approximated in high-dimensionality regimes.
[Figure 3: schematic of the balls-and-springs enzyme model, showing the active site before and after mutation and folding.]

FIG. 3. Balls-and-springs model of enzyme organization. Different-sized balls represent different types of amino acids, solid springs represent covalent bonds, and dashed springs represent weak interactions responsible for folding. Triangles represent amino acids at the active site. The addition of more balls and springs buffers the effect of a mutation on features critical for function (the active site in this instance). The increase in the number of amino acids corresponds to an increase in dimensionality in the adaptive landscape picture (Fig. 2) and hence to a greater probability of extradimensional bypass.
for function. The addition of more balls and springs serves to buffer the effect of mutation (alteration in number or type of ball) on such critical shape features, just like the addition of more springs to a box spring mattress buffers the effect of disenabling any one spring on the crucial features of its shape. Allowing for redundancy of amino acid types (balls of intermediate size in this case) provides another means of buffering the effect of mutation on critical shape features. As indicated above, it is possible to use information theory to formalize the mutation-buffering picture. Let H(G, P', P'') represent the entropy of the whole protein system, including the genotype (G), the functionally crucial features of the phenotype (denoted by P') and the mechanistically superfluous features (denoted by P''). This is a conceptual compartmentalization of the phenotype and not necessarily a physical one. The entropy here refers to the classic Shannon measure, H(f_1, ..., f_n) = -Σ f_i log f_i, where the f_i represent frequencies. In the present case the entropies are defined over the frequency of different proteins, the frequency of different genotypes (amino acid sequences) and the frequencies of different phenotypes (defined by structure and rate constants). Given H(G), the diversity of amino acid sequences created by mutation, it is necessary to make sure that H(P') can be made as small as necessary to ensure stepwise traversibility of the adaptive landscape. First note that the joint entropy H(G, P', P'') may be expanded to yield

H(P') + H(P'' | P') + H(G | P', P'') = H(G) + H(P', P'' | G).

The primary structure of a protein folds to a unique tertiary structure, as determined by x-ray crystallography. A number of different allosteric forms (conformations) may occur, and in reality each such form represents a statistical ensemble of thermal configurations. Nevertheless, it is possible to set up a definite lexicon associating each such collection of conformations and configurations with a definite primary structure. As a consequence the term H(G | P', P'') may be set to zero. The configurations can be ignored since they contribute equally to H(P'), H(P'' | P'), and H(P', P'' | G). The latter term then represents the allosteric conformations, that is, the change of protein shape in response to the environment. If we had included the environment in our description, this term would be determined in a definite way; in any event we can ignore it for the present purposes, since it represents a higher level of phenotypic adaptability distinct from evolutionary genetic adaptability. Thus what remains is

H(P') + H(P'' | P') = H(G).

The statistical spread of crucial features, H(P'), must be small if the protein is to have the gradualism property. This is not a sufficient condition, but it is a necessary one. If H(P') were large, there would certainly be a great range of
phenotypic characteristics. Suppose that the protein becomes isolated atop some adaptive peak. This means that H(P') is not small enough to support a stepwise evolution to another peak. In the absence of the buffering term, H(P'' | P'), the changes of protein shape and function in response to genetic variation, represented by H(G), would be a given fact. If the protein had reached a stage in its evolution that admitted no single acceptable mutation, this would be an unalterable fact. It would be impossible to rescale the degree of function change of the phenotype in response to genotypic mutation. The presence of the buffering term allows such rescaling. If H(P') is not sufficiently small to allow stepwise evolution to occur, it can always be made smaller by making H(P'' | P') larger. The analysis may be extended to more complicated biological architectures. A number of the simplifying features of the above discussion, such as the definite dependence of protein shape on amino acid sequence, are no longer applicable, and a number of forms of buffering that do not occur at the protein level become possible (Conrad, 1983). For the purposes of the present review, however, the protein example is sufficient to be illustrative of the buffering idea, and of particular importance because of the relevance of protein engineering to molecular computer design. The question may arise as to how evolvability would evolve in nature. Natural selection acts mainly on individuals. High evolvability is not an advantage to the individual, and in some instances may be a slight disadvantage (because of the concomitant redundancy). However, it may be shown that evolvability-conferring redundancies can always hitchhike along with the advantageous traits whose appearance they facilitate (Conrad, 1979a).
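The entropy bookkeeping above can be verified on a toy example. The sketch below builds a hypothetical four-entry genotype-to-phenotype lexicon (all names invented for illustration) in which the phenotype determines the genotype, so H(G | P', P'') = 0, H(P', P'' | G) = 0, and the chain-rule expansion reduces to H(P') + H(P'' | P') = H(G).

```python
from collections import Counter
from math import log2

def H(samples):
    """Shannon entropy (bits) of a list of outcomes."""
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in Counter(samples).values())

# Hypothetical toy lexicon: each genotype folds deterministically to a
# phenotype, split into crucial (P') and superfluous (P'') features.
lexicon = {
    "g1": ("fold-A", "loop-x"),
    "g2": ("fold-A", "loop-y"),
    "g3": ("fold-B", "loop-x"),
    "g4": ("fold-B", "loop-z"),
}

genotypes = list(lexicon)                     # uniform over 4 genotypes
p_crucial = [lexicon[g][0] for g in genotypes]
phenotypes = [lexicon[g] for g in genotypes]

h_g = H(genotypes)                            # H(G) = 2 bits
h_p1 = H(p_crucial)                           # H(P')
h_p2_given_p1 = H(phenotypes) - H(p_crucial)  # H(P''|P') = H(P',P'') - H(P')
print(h_p1 + h_p2_given_p1, h_g)
```

Here H(P') is 1 bit; the remaining bit of genotypic diversity is absorbed by the buffering term H(P'' | P'), which is exactly the rescaling role the text assigns to the superfluous features.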
We can summarize by recalling the key questions raised at the beginning of Section 3.4: What is the high-evolvability form of organization, is it feasible to simulate this on digital machines, and how can it be designed into or out of organic materials? We have seen that high dimensionality (redundancy of components, multiplicity of weak interactions), along with a compartmental structure that buffers the effect of mutation on functionally critical features, is the key to high evolvability. The accumulation of these features by hitchhiking selection in the course of natural evolution corresponds to the appearance of extradimensional bypasses in the adaptive landscape picture. Redundancy cannot confer such transformability on structurally programmable systems, because the genetic description is the program that generates the state-space dynamics in this case, and as a consequence there is no way of making the state-space dynamics only slightly unstable to genetic perturbation. In biology-type organizations, by contrast, the genetic description is more accurately viewed as a parameter that can be used to tune the self-organizing dynamics. No simple coding, or choice of language constructs, can convert a structurally programmable system into a system with high
evolutionary adaptability. It is necessary to simulate the high-dimensionality form of organization. The degree of evolvability introduced will depend on the multiplicity of redundant components and weak interactions among them in the virtual machine. Looking in the opposite direction, the gene/protein engineer who attempts to produce the most functionally efficient biotechnological structures will find that the efficiency with which the system can be tuned through a trial-and-error process is greatly diminished. The counterintuitive caveat (from the perspective of conventional engineering) is that the evolution of both natural and artificial information processing systems requires the introduction of structural features that are inefficient from the purely functional point of view.
3.6 Quantum Molecular Computing
So far the analysis has been completely classical as far as physics is concerned. We acknowledged the existence of up to n² pairwise interactions (including self-interactions) that could contribute to computing. We have not, however, considered the quantum nature of matter. This inevitably becomes important if microphysical processes, such as the behavior of electrons, photons, phonons, and various quasi-particles, become pertinent to the behavior of the whole system. Solid-state switches, as we emphasized earlier in Section 2, are properly described by quantum mechanical equations. But to make a structurally programmable system, the engineer ensures that the quantum dynamics is invisible at the top level of overall system behavior. This need not be the case in structurally nonprogrammable systems, and is unlikely to be the case if individual macromolecules play an important role. It is hard to see how the opportunistic process of evolution could fail to exploit this fact in the case of naturally occurring biomolecular systems. We can formally examine this issue by measuring the computational power of the system in terms of the number of computational resources required to simulate it (Conrad and Rosenthal, 1980). If the simulating machine (called the reference machine) is a conventional digital computer model (such as a Turing machine or a random access machine) we can identify resources with the number of time steps or processors. Thus we can, in principle, assign a nonprogrammable system (for example, a continuous dynamical system) a digital computational equivalent in terms of the minimum number of time steps that a digital machine would require to duplicate its capabilities to an adequate degree of approximation. The "in principle" qualification is important, since it might be that we do not know how to simulate the dynamical system, or that we cannot verify that the number of time steps is a minimum.
Specifying what constitutes an adequate approximation is not a serious problem for well behaved dynamical systems. Any reasonable
definition will do. But if the system to be simulated has continuous dynamics that is highly initial-condition sensitive (chaotic dynamics), this becomes a significant difficulty. A digital computer is a finite system, and therefore in principle must eventually cycle. A chaotic dynamical system, however, in principle has aperiodic behavior. If such systems really exist in nature, approximation will certainly break down after a definite point in the time development of the chaotic system. For the present purposes we need not dwell on these pathological situations. As noted earlier, if we adhere to the strong form of the Turing-Church thesis (cf. Hofstadter, 1979), we admit as a working assumption the principle of universal simulatability. Turing machines are usually viewed as having potentially infinite tapes, while all real computers are painfully finite. If we use such infinite machines as models, there is no reason at this time to reject the possibility of assigning a digital equivalent to any dynamical process that is realizable in nature. It is at least an open question. It is clear that this will be infeasible if we use truly finite structurally programmable models, since these can always be outpaced by a big enough piece of nature. Our concept of assigning a digital equivalent to physical-dynamical systems through simulation would still provide a means of assigning a lower bound to the computing power of these systems, and is entirely adequate for a qualitative comparison between the effective computational capabilities of classical and quantum systems. A task that is very easy for some physical-dynamical systems (as measured in terms of the number of particles required) might be extremely difficult for a digital computer (as measured in terms of time steps) and conversely. 
The physical dynamical system, say an enzyme, might use its up to n² physical interactions in an extremely clever way.³ If we took the enzyme as the reference model, an appropriately specified task would appear very simple; the enzyme would be using its physical structure and interactions in a natural way. This is a major rationale for molecular computing. The question is, do we get extra equivalent digital computing power by virtue of the fact that the electrons in the enzyme require a quantum mechanical description, over and above what can be accounted for by the n² interactions that enter into the classical description? Let us step back for a moment, to review the distinction between the classical and nonclassical description (cf. Bohm, 1951). The usual procedure for setting up a quantum mechanical problem is to write down the classical

³ The terminology here is deliberately reminiscent of the "clever physics" idea of the direct perceptionist school of psychology (Gibson, 1966), though molecular computing models are clearly formulated at a very different physical scale than direct perceptionist models such as optic flow (which is more of a mathematical than a physical embodiment).
equation of motion. In a multiparticle system the forces among the particles must be specified. This determines the classical potential. The up to n² interactions that could be operative in a system of n particles are thus expressed in the potential. At this point the classical (picturable) description is transformed to a description that is not picturable in a classical sense. The key feature here is the superposition principle. The quantum system exists in a superposition of its possible states. We can use either the Heisenberg (matrix) or Schrödinger (differential equation) representation to capture this feature (cf. Dirac, 1958). In the former case the question is how much extra digital computation would be required to simulate the behavior of the system if we replace variables by time-dependent operators (or arrays) that represent measurements on a previously specified state. In the latter case the variables are replaced by time-independent operators that act on a state vector (or wave function) that develops in time. The Schrödinger representation is more convenient for the present purposes, since it allows us to cast the comparison between quantum and classical systems in terms of the difficulty of computing the "ghost interactions" that must be added to the number of interactions in the classical potential. The ghost interactions are interactions among the possible locations of the particles (if we insist on a particle point of view). The Hamilton-Jacobi version of classical mechanics provides the most effective machinery for making the comparison. This is a pre-quantum-theory wave formulation of classical mechanics that is now understood to be the short-wavelength limit of the Schrödinger equation (Goldstein, 1950; Fong, 1962; Bohm, 1981). It is possible to view a quantum system as differing from a classical analog by virtue of the addition of a (fictitious) quantum potential
U = -(ħ²/2m) Σ_{i=1}^{n} (∇_i² R)/R

where R² = ΨΨ*, Ψ is the wave function, m is particle mass, and n is the number of particles. Since the quantum potential depends on the wave function, the possible locations of the particles "interact" with one another in the sense of interference. This is true even for a single particle (n = 1), and in a many-particle system is over and above the up to n² pairwise interactions that could contribute to the classical potential. The extra time steps required by a sequential digital computer to simulate the behavior of a system governed by a quantum potential, as compared to the number of time steps required to simulate the behavior of a classical analog governed solely by a classical potential, provide in principle a precise specification of the "computational parallelism" inherent in the quantum mechanical wave function (since Ψ represents the wave aspect of the system). The implication is that it should be possible for a system controlled by microphysical dynamics to keep pace with problems that grow even faster than
n² in computational complexity (Conrad, 1988b). There is, however, no general way of saying how much better a quantum mechanical system can do. How many extra time steps would be required to simulate it depends sensitively on the details of the wave function and could in some cases overwhelm the number of resources required to keep up with the classical potential (our earlier remarks on the adequacy of approximation are relevant here). We will see, in the discussion of architecture in the next section, that the practical problem of harnessing the parallelism inherent in the superposition principle is vitally dependent on a macroscopic-microscopic communication link. Here, however, we can use the classic example of barrier penetration to explicate the idea of quantum parallelism in a more concrete manner. Picture a potential surface with many valleys, one of which corresponds to the desired solution of a problem. It might be a pattern recognition problem that requires for its solution the matching of a physiochemical milieu to an appropriate template in an ensemble of molecular templates. In the absence of external perturbation (e.g., thermal agitation), a classical system could never make the transition from an incorrect potential well to the correct one. A microscopic system, such as an electron, would inevitably find the proper well by virtue of barrier penetration. But as indicated above, to actually exploit this for problem solving, it is necessary to embed the microsystem in a macroscopic architecture. Macromolecules are intermediate-size (mesoscopic) structures that are too large to undergo barrier penetration per se, but their pattern matching motions may in large part be controlled by the electronic wave function. A yet more macroscopic level is required to transduce macroscopic signal patterns into the physiochemical milieu and to read out the actions of the enzyme into a macroscopic record.
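The claim that electrons tunnel readily while whole macromolecules cannot can be made quantitative with a WKB-style estimate, T ≈ exp(-2κa) with κ = (2mV)^{1/2}/ħ, for a particle whose energy lies well below a rectangular barrier of height V and width a. The masses and barrier parameters below are illustrative assumptions, not values from the text.

```python
from math import exp, sqrt

HBAR = 1.054e-34  # J*s
EV = 1.602e-19    # J per electron volt

def tunneling_probability(mass_kg, barrier_ev, width_m):
    """WKB-style transmission factor T ~ exp(-2*kappa*a) through a
    rectangular barrier (particle energy well below the barrier top)."""
    kappa = sqrt(2.0 * mass_kg * barrier_ev * EV) / HBAR
    exponent = -2.0 * kappa * width_m
    return exp(exponent) if exponent > -700 else 0.0  # guard against underflow

electron = 9.11e-31      # kg
macromolecule = 1.0e-22  # kg, rough order of a small protein (assumed)

print(tunneling_probability(electron, 0.1, 1e-9))       # appreciable (~few percent)
print(tunneling_probability(macromolecule, 0.1, 1e-9))  # effectively zero
```

Because κ scales as the square root of the mass, the exponent for the macromolecule is roughly four orders of magnitude larger than for the electron over the same barrier, which is why the text places macromolecular pattern matching under the control of the electronic wave function rather than of direct tunneling.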
The vast computational requirements of quantum chemistry provide some indication of the computing power potentially available at the quantum molecular level. As is well known, the ab initio calculation of the properties and behavior of even small molecules rapidly overwhelms even supercomputer capabilities. To treat large molecules, such as proteins, it is necessary to introduce highly simplified approximations (ignoring, for example, the motions of the atomic nuclei) or to use semi-empirical methods. The usual point of view is that this is a computational science issue, connected with an applications agenda, and not a computer science issue, connected with mechanisms and modes of computing. But this distinction loses its validity in the field of molecular computing, where the macromolecules that are so difficult to compute from a digital point of view can become primitives of new forms of computation. The variation-selection (or other learning) process that molds these primitives might only harness limited aspects of their dynamics for functional performance. But even limited coupling would allow a
(parallelizable) task that is difficult for a digital computer in terms of the number of particles and time required to be much easier for a classical dynamic system with an appropriate potential function, and even easier for a system with an appropriate quantum potential.⁴

4. The Macro-Micro (M-m) Scheme of Molecular Computing
The purpose of this section is to outline a scheme (or architecture) for molecular computing that operates in the high-efficiency, high-adaptability domain of computing, but that is sufficiently general to encompass all of today's architectural directions as special cases. The scheme will be called the M-m architecture, since it involves a macroscopic-microscopic interface. It is based on information processing mechanisms known to occur in biological cells and organisms, including nerve cells and the brain. Neuromolecular architectures are a particularly natural instantiation. However, by eliminating features of the architecture (for example, the microscopic-macroscopic communication link) and introducing specializing constraints it will be possible to describe architectures that operate in the structurally programmable domain, including conventional serial machines as the most special of special cases. We defer classifying the general modes of computing possible in the M-m scheme, and the physical and dynamical mechanisms that could support them, to the following section. The discussion of special-case architectures will be further deferred, to follow the review of enabling technologies.

4.1 The M-m Architecture
The overall scheme is illustrated in Fig. 4 (cf. Conrad, 1984, 1985). The feature to note is the flow of information from macroscopic to increasingly microscopic representations, and then back to macroscopic form. Downward transduction and upward amplifying processes serve to link the macroscopic

⁴ Several authors (cf. Section 2.2) have given elegant discussions of the quantum mechanics of computing that are directed to using the formalism of quantum theory to describe a conventional digital computing process (constructing a quantum version of a Turing machine, for example). Such constructions have been used, in particular, to address the issue of reversibility of computing. No in-principle increase in computing power emerges from such analyses, nor could it, since conventional computing processes are inherently classical. The process must be embedded in constraints that mask the quantum properties of matter so far as the overall symbol manipulation behavior of the machine is concerned. The vast computational power inherent in the wave function is missed in such analyses, and would be missed in any discussion restricted to structurally programmable models of computing.
[Figure 4: schematic of information flow from the External World (sensory input / motor output) through cellular network processing (macroscopic signal, first messenger), intracellular processing (mesoscopic signal, second messenger), and molecular pattern processing (microscopic signal).]

FIG. 4. M-m architecture. Note the flow of information from macroscopic to mesoscopic to microscopic form. Information processing can occur at any level or in the transduction from level to level. Only some of the possible pathways of communication are indicated.
to the microscopic. The term mesoscopic refers to objects and processes in the zone between the macroscopic and microscopic, where the object is too small to behave like familiar macroscopic objects and too large to behave like quantum particles. The symbolism (M → m) and (m → M) may be used to refer to the macroscopic-to-microscopic and microscopic-to-macroscopic links respectively. If we want to distinguish between microscopic and mesoscopic we can write (M → m̄ → m) or (m → m̄ → M), where m̄ refers to the mesoscopic layer of processing proper. Information may be processed at any level, though the arguments of the previous section suggest that mesoscopic and microscopic layers make the greatest contribution in nature, and potentially the greatest contribution in technological systems. If the mesoscopic and microscopic layers are turned off (or made irrelevant) we have a completely macroscopic network (an M-M system). If such macroscopic components are constrained to perform according to user's manual specifications, we have a structurally programmable machine; if they are constrained to operate serially, in accordance with a program counter, we have a conventional digital machine (von Neumann architecture). Connections among processors at different levels may be denoted by expressions such as (M → m̄ → M) → (M → m̄ → m → m̄ → M) → (M → m̄ → M). This particular expression means that we are dealing with a multiprocessor system comprising three cellular processors. The first is a preprocessor (possibly a sensor), the second a central processor, and the third a postprocessor (possibly an effector). Only the central processor in this example makes significant use of the microscopic layer of processing. Processors may be mutually connected by communication channels (↔), or they may be connected into highly complex networks, which would involve arrows connecting large numbers of elements at each level. Here a linear chain symbolism would be inadequate.
But nevertheless the symbolic expression (M → ((m̄)^i ↔ (m)^j) → M)^k may help to convey the intricacy of information flow structures that are possible. It represents a network of k arbitrarily interconnected cellular processors, with each of the processors having potential links to i mesoscopic and j microscopic processors. Once the macroscopic representation of information is transduced to a mesoscopic representation, it can be communicated among the purely mesoscopic processors, or transmitted to microscopic processors, which in turn may interact directly with each other through microphysical signals, or may interact with each other through mesoscopic intermediaries. At some point the cellular processor generates a macroscopic signal, and at some point the effector processors of the system perform macroscopic acts. It must not be thought, at least in natural systems, that the cellular processors are anything like conventional computers. Each processor is best
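The level-chain notation can be formalized in a few lines of code. The sketch below is an invented representation (not part of the original scheme): it encodes the three-processor pipeline example, with `m~` standing in for the mesoscopic symbol.

```python
from dataclasses import dataclass
from enum import Enum

class Level(Enum):
    MACRO = "M"   # macroscopic: signal energy above thermal noise (kT)
    MESO = "m~"   # mesoscopic: between the macroscopic and quantum domains
    MICRO = "m"   # microscopic: not classically picturable

@dataclass
class Processor:
    """A cellular processor described by its chain of processing levels."""
    chain: list

    def uses_micro(self) -> bool:
        return Level.MICRO in self.chain

    def notation(self) -> str:
        return "(" + " -> ".join(level.value for level in self.chain) + ")"

# The three-processor example from the text: a preprocessor, a central
# processor (the only one using the microscopic layer), and a postprocessor.
pre = Processor([Level.MACRO, Level.MESO, Level.MACRO])
central = Processor([Level.MACRO, Level.MESO, Level.MICRO, Level.MESO, Level.MACRO])
post = Processor([Level.MACRO, Level.MESO, Level.MACRO])

pipeline = [pre, central, post]
print(" -> ".join(p.notation() for p in pipeline))
print([p.uses_micro() for p in pipeline])
```

Such a representation makes the special cases mechanical to express: dropping `Level.MESO` and `Level.MICRO` from every chain yields the purely macroscopic M-M network described above.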
MICHAEL CONRAD
viewed as a pattern recognizer-effector element, or as such an element with a memory capability. In the simplest (technological) case the processor is an elementary switch that recognizes simple signal patterns (“and” gates, “or” gates, and so forth), or that switches its state in response to a simple signal pattern. If these simple switches are macroscopic, the network could be structured to form a conventional macroscopic computer. Conventional computers can thus be viewed as built up by establishing suitable connections among simple, well-defined pattern processors and memory elements. Parallel and connectionist neural computers allow more of the elements to be active at any given time. Since networks with general (universal) powers of computation can be built up out of simple pattern processors, it is certainly possible to build them up out of processors whose pattern processing capabilities are enhanced by internal molecular mechanisms (for an explicit constructive demonstration, see Conrad, 1974b). Instead of forming complex interconnections among the simple processors to achieve a desired input-output function, a single processor is molded to perform this function through evolution. The enhanced pattern recognition capability arises from the unique physical-dynamical properties that occur at meso- and microscopic levels of molecular organization. The protein enzyme again serves as a useful (but not exhaustive) example. The macroscopic pattern impinging on the external boundary (or membrane) of the processor is transduced to a physiochemical representation (or pattern of activity) that is suitable for being recognized by enzymes, which then serve to trigger an appropriate macroscopic output signal (via a sequence of mesoscopic steps).
This may be called the tactilization model, since the transduction process in the tactilizing processor serves to convert the input signal pattern (which is symbolic in character) into a form that is “touchable” by enzymes (Conrad, 1986, 1987a). If the conventional switches are mesoscopic or microscopic, we would have a conventional computer built out of nonmacroscopic components, but with no extra computational capabilities aside from those afforded by smaller size or enhanced speed. In this special (nonbiological) case the cellular processors could, on paper, be replicas of digital computers constituted from macroscopic components. Whether such mesoscopic or microscopic versions of macroscopic machines could be realized without any intervening macroscopic communication links is an open question. We will return to it, and the difficulties involved, after the discussion of enabling technologies. We have used the terms macroscopic, mesoscopic, and microscopic in an intuitive sense, without defining them. Signals will be called macroscopic if the energy associated with their generation, transmission, or processing exceeds thermal noise (kT). Signals or processes will be called microscopic (or microphysical) if they are not classically picturable. As indicated above, mesoscopic refers to the domain which for the most part is picturable in the
MOLECULAR COMPUTING
classical sense, but which nevertheless does not conform to the behavior of classical objects. The distinctions between these three scales of phenomena are somewhat of an abstraction, since they can interpenetrate in a rather intricate fashion (as opposed to simply shading into one another). Let us walk through Fig. 4 in somewhat greater detail to appreciate how these terms are actually being used. The signals from the external world represent macroscopic features of the environment. They may be in the form of photons, phonons, chemicals, or mechanical disturbances. But in each case the energies involved must exceed kT. When these environmental signals impinge on the preprocessors (or sensory elements) of a multicellular system they must be represented (or coded into) intercellular signals. The term first messenger (or first message) refers to these intercellular signals that cellular processors use to communicate with one another. These are also macroscopic, the energy required to generate them exceeding kT, in general by a significant margin. When the first messenger (or the original environmental signal) impinges on the boundary (or membrane) of a cellular processor it may be transduced into an intracellular signal, called a second messenger. This is again a macroscopic signal, though in general involving substantially less energy dissipation than the first messenger. The term “mesoscopic” is nevertheless used in Fig. 4 to refer to the second messenger due to its role as an intermediary between the macrophysical and microphysical strata of processing. The second message may trigger bona fide molecular processors, such as enzymatic pattern recognizers. The molecular processors are placed at the microscopic layer of processing in the diagram. Strictly speaking, however, they are associated with both mesophysical and microphysical phenomena. 
Objects of macromolecular size are too small to behave in the manner of familiar macroscopic objects, but they are also too large to behave like quantum mechanical objects. The shape of a macromolecule (its nuclear configuration or, so to speak, skeleton) is best viewed as belonging to the mesoscopic scale of phenomena, whereas the electrons, photons, protons, and quasi-particle processes that control its dynamics and interactions are genuinely microphysical. This distinction is not clear-cut, but it serves to emphasize how unpicturable quantum dynamics can mesh with shape-based mechanisms of macromolecular specificity to control the picturable computational behavior of a macroscopic architecture. As the system becomes more microscopic it becomes potentially more “intelligent,” so far as its capability of exploring for a particular absorbing minimum on a multiwell potential surface is concerned. But this intelligence cannot be tapped unless it is embedded in the constraints of a macroscopic architecture. The macroscopic and mesoscopic constraints serve to set the initial state and boundary conditions of the quantum dynamics, and possibly also the potential surface, in a manner that reflects the input pattern
emanating from the external environment. The prepared wave function proceeds to explore the potential surface, until it discovers a well which, through evolution or learning, has been established as the one that triggers a sequence of irreversible amplifying events that lead to output actions appropriate to the input. As a concrete example, suppose that different combinations of input lines lead to different self-assembled molecular aggregates in the cell and that the output depends on which member of a repertoire of preexisting templates can complex with the aggregate. All these self-assembly processes can be represented as potential-well problems. The important point is that the proper correlation (or adaptor) mechanisms must be built into the system in advance, by choosing amino acid sequences that fold into suitable macromolecular structures. The hypothesis is thus that in nature the evolutionary process molds the vast search power of the unpicturable microworld into a form suitable for controlling the definite macroworld actions of organisms by tuning the mesoworld control knobs associated with genes and macromolecules.
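A deterministic caricature may help fix ideas: parallel downhill exploration of an invented one-dimensional multiwell potential, with classical gradient descent standing in for the quantum search and the deepest well playing the role of the absorbing minimum. The potential, the learning rate, and the starting points below are all my own illustration, not a model from the chapter.

```python
def potential(x):
    # Invented 1-D multiwell surface: wells near x = +1.8 and x = -2.1,
    # with the deeper ("absorbing") well on the left.
    return 0.1 * (x * x - 4.0) ** 2 + 0.5 * x

def gradient(x):
    # Analytic derivative of the potential above.
    return 0.4 * x * (x * x - 4.0) + 0.5

def descend(x, lr=0.05, steps=300):
    """Deterministic downhill exploration from one prepared initial condition."""
    for _ in range(steps):
        x -= lr * gradient(x)
    return x

# Parallel exploration from several initial conditions; each trajectory
# settles into some well, and the deepest well found triggers the "output."
minima = [descend(x0) for x0 in (-3.0, -1.0, 0.5, 3.0)]
best = min(minima, key=potential)   # the absorbing minimum, near x = -2.1
```

The sketch captures only the logic of prepared-state exploration and selection of the distinguished well; it says nothing about the actual quantum dynamics, which is the whole point of the chapter's argument.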
4.2 Biological Cells as M-m Architectures
The architectural scheme outlined above is clearly modeled after biological cells and organisms, though in a rather general way, since it is formulated so as to admit nonbiological directions of specialization. In this section I would like to compare the architecture to some facts about neurons. The term second messenger derives from the fundamental biochemical discovery by Sutherland in the 1950s that peptide hormones generally do not act on body cells directly, but rather act on receptors on the cell membrane to trigger the intracellular production of cyclic nucleotide molecules (Robison et al., 1971). Despite the large number of possible hormones, the number of second-messenger species is extremely small. The best known second messenger is cAMP (cyclic adenosine monophosphate), produced from the ubiquitous energy-carrying molecule, ATP. Another is cGMP (cyclic guanosine monophosphate), produced from the common intracellular molecule, GTP. These cyclic nucleotide messengers act on target proteins inside the cell, which in turn activate effector proteins. The effector proteins might, for example, cause particular genes to be activated or deactivated. The second messenger thus provides a link between chemical signals impinging on the external membrane of the cell and internal molecular mechanisms. Second messengers are not necessary for such a link. Many chemical substances can enter the cell directly to act on target molecules. The steroid hormones are an example. But the existence of a second-messenger mechanism is definite proof for the existence of such links; it would make no sense to carry the mechanism and not use it.
Nerve cells, like other cells in the body, possess a second-messenger system. But it was generally believed that this system was pertinent only to the control of genetic expression (differentiation) and metabolism. The general view was that the nerve impulse was independent of molecular activities inside the neuron, and its dependence on the metabolic activities of the neuron was nonspecific. In the mid 1970s, however, a number of investigators demonstrated that the cyclic nucleotides cAMP and cGMP control nerve impulse activity in at least some central nervous system neurons (Liberman et al., 1975; Treistman and Levitan, 1976; Greengard, 1978). The most direct demonstration was by the Liberman group. Cyclic AMP and cyclic GMP were microinjected directly into large central nervous system (CNS) neurons of a snail. Cyclic AMP caused a rapid depolarization, whereas cGMP usually caused slow hyperpolarization or transient depolarization followed by slow hyperpolarization (Liberman et al., 1982). The effect is highly specific, since it disappears when chemically modified versions of cAMP and cGMP are used. Different investigators, using different biological preparations or different techniques, have obtained somewhat different results, in some cases suggesting longer-term effects. However, there could be little doubt that the second-messenger molecules could control nerve impulse activity, and that nerve impulses (or more precisely, neurotransmitters) impinging on the postsynaptic membrane could stimulate the endogenous production of cAMP. Let us briefly look at the biochemistry of the cyclic nucleotide system in the neuron context. The main reactions are illustrated in Fig. 5. A nerve impulse traveling along the axon of the presynaptic neuron causes it to release a chemical transmitter which binds to a receptor on the postsynaptic membrane. The receptor could also bind a hormonal mediator.
This recognition step causes the receptor to activate an enzyme with which it is in contact, in this case adenylate cyclase. The cyclase enzyme catalyzes the production of the second messenger cAMP from ATP. The cAMP molecules travel, either by diffusion or as a result of hydrodynamic flow, to a kinase protein. The kinases are a large family (perhaps numbering in the thousands) of target proteins that recognize specific messengers (e.g., cAMP or cGMP) and that specifically recognize and activate (by phosphorylation) effector proteins. The effector protein in the reaction scheme illustrated is an ion channel (or gating) protein that controls the cross-membrane flow of one of the ionic species responsible for the nerve impulse (such as sodium or potassium). The reverse kinase (the enzyme phosphoprotein phosphatase) serves to deactivate the gating protein, and thereby to terminate the firing of the neuron. Calcium, released from various stores subsequent to the nerve impulse, activates an enzyme (phosphodiesterase) that inactivates cAMP by converting it to the inactive mononucleotide. This also serves to prevent continued firing in response to the initial input, and more generally to reset the initial state of the neuron.
[Figure 5 appears here: diagram of the second-messenger reaction scheme, with labeled elements transmitter/mediator, membrane receptor, adenylate cyclase, second messenger (cAMP), diffusion, kinase, reverse kinase, phosphodiesterase, inactivated second messenger, and nerve impulse.]

FIG. 5. Second-messenger reaction scheme. Input signals are transduced to the second messenger, cAMP, which then triggers molecular and microphysical processes. This culminates in nerve impulse activity in some central nervous system neurons. The cAMP reaction scheme exemplifies the type of biochemical mechanisms that occur in an M-m architecture (see Fig. 4).
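The logic of the Fig. 5 loop can be caricatured as a one-variable kinetic model: cyclase-driven production of cAMP under stimulus, phosphodiesterase-driven degradation, and a threshold "kinase/channel" readout. All rate constants and the threshold below are invented for illustration; this is a sketch of the scheme's logic, not a quantitative model of the biochemistry.

```python
def simulate(stimulus, steps=500, dt=0.01,
             k_cyclase=5.0, k_pde=1.0, threshold=2.0):
    """Euler integration of d[cAMP]/dt = k_cyclase*stimulus - k_pde*[cAMP]."""
    camp = 0.0
    firing = []
    for _ in range(steps):
        camp += dt * (k_cyclase * stimulus - k_pde * camp)
        # Kinase -> channel-protein readout: fire while cAMP exceeds threshold.
        firing.append(camp > threshold)
    return camp, firing

camp, firing = simulate(stimulus=1.0)
# cAMP relaxes toward k_cyclase*stimulus/k_pde = 5.0; firing begins once the
# threshold is crossed, and without stimulus phosphodiesterase keeps cAMP at zero.
```

Even this minimal sketch shows why the reverse reactions matter: without the degradation term the messenger concentration would grow without bound and the neuron could never be reset.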
Many details have been omitted. The interaction between the transmitter-receptor complex and adenylate cyclase is now known to be mediated by regulatory proteins (in particular G proteins) that allow for some integration of input signals near the locus of input. Sequencing studies of channel proteins have revealed them to be very diverse, suggesting that these proteins can distribute themselves on the membrane in a controlled manner and that the cAMP messenger can target them in a way that is selectively dependent on which kinases are present. We have ignored other messengers, in particular cGMP, which has a similar biochemistry except for the fact that the enzyme that produces it is distributed mainly within the cell rather than on the membrane. How it is activated is not entirely clear, but it is plausible that it can be activated by internal metabolic activities of the neuron. The kinases can also activate proteins associated with the cytoskeleton, a network of microtubules and microfilaments that underlies the membrane and extends throughout the cell. Recent microinjection experiments (Liberman et al., 1985) indicate a response to the injection of cAMP that is too fast to be mediated by diffusion, which is similar in effect to mechanical disturbance of the membrane and cytoskeleton, but which is nevertheless specific to naturally occurring cAMP (cyclic 3',5'-AMP) and not to other substances (including analogs). This suggests that the cytoskeleton could act as a medium for fast signals, possibly of a mechanical nature, and that the cAMP messenger can either trigger these signals or alter the structure and hence the signal processing capabilities of the cytoskeleton. In receptor cells in the eye the photons that provide input act through the same set of reactions, but with the key difference that they trigger an increase in the production of the deactivating enzyme phosphodiesterase rather than an increase in the production of cAMP.
Finally, we can note that the endogenous production of cAMP can cause the neuron to have rhythmic firing behavior even in the absence of any external input (cf. Liberman et al., 1982), or to respond differently to a given input pattern at different points in time (a hysteresis effect). We will not proceed further with these qualifications and refinements here. A full and critical treatment of second-messenger systems would constitute a review in its own right (see Drummond, 1983).

Hameroff (1987) has presented an extensive review of the evidence for cytoskeletal information processing. Matsumoto et al. (1989) have shown through combined electrophysiological and fast-freeze electron microscopic studies that a component of the cytoskeleton undergoes conformation changes in conjunction with nerve impulse activity in squid giant axons, pointing to a potential functional role for such changes in CNS neurons (which cannot be studied in such a direct manner at this time). Koruga (1974) has suggested that the structure of the microtubular component of the cytoskeleton could serve as a coding mechanism in cellular information processing.

This is a developing area (as are all aspects of molecular computing and biomolecular information processing),
and it may be expected that experimental work will continue to fill out a more comprehensive picture, and at the same time raise new questions. At this point, however, it is possible to discern a general principle of information processing which can be embedded in the known biochemistry of the cyclic nucleotides, and which could as well be embedded in the cytoskeleton or in other physiochemical systems in the cell. We will call this the principle of double dynamics (Conrad, 1985). The basic idea is that dynamic processes in the cell can be classified in terms of levels on a macroscopic-to-microscopic scale. The dynamics at each level can perform a pattern processing function, with the lower-level dynamics serving to interpret the higher-level dynamics. Figure 6 provides a schematic illustration, geared to cyclic-nucleotide-type dynamics. The input signals trigger a physiochemical activity pattern, in the present case a cAMP diffusion pattern. By virtue of being transduced into the diffusing cAMP messenger molecules, the input signals are combined (or integrated) in space and time. The diffusion (or hydrodynamic flow) process can be thought of as a continuous dynamic analog of a syntactic recognition process that might be embedded in a parallel manner in a systolic computer. The concentration at different locations on the membrane reflects different input signal patterns. But it is necessary to have a readout mechanism that leads to an action in order actually to utilize the information represented in the concentrations. In the case of the cyclic nucleotide system the readout is performed by the kinase enzymes, which (through the channel proteins) control the action of the cell. The dynamics of the kinase enzyme is the lower-level dynamics in this instance. The enzyme recognizes individual cAMP molecules, probably in a concentration-dependent way (through the probability of an allosteric interaction with other cAMP molecules).
The actual action triggered by the enzyme (the firing of the neuron) represents the decision that the system makes about the input in this case. But strictly speaking it is still not enough to provide a true interpretation (or semantics).

[Figure 6 appears here: diagram of the tactilization scheme, showing transduction dynamics leading to an output.]

FIG. 6. Tactilization scheme. Reaction-diffusion processes can serve to integrate input signal patterns in space and time. The diffusing substance could be cAMP in a biological cell (see Fig. 5).
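A minimal computational sketch of double dynamics (geometry, pulse sizes, and threshold all invented): the upper-level dynamics is messenger diffusion along a discretized membrane, and the lower-level dynamics is a threshold readout by enzymes at fixed sites.

```python
def diffuse(profile, steps, d=0.2):
    """Upper-level dynamics: explicit finite-difference diffusion, closed ends."""
    c = list(profile)
    for _ in range(steps):
        nxt = c[:]
        for i in range(1, len(c) - 1):
            nxt[i] = c[i] + d * (c[i-1] - 2*c[i] + c[i+1])
        nxt[0] = c[0] + d * (c[1] - c[0])        # reflecting boundaries,
        nxt[-1] = c[-1] + d * (c[-2] - c[-1])    # so total messenger is conserved
        c = nxt
    return c

def readout(profile, sites, threshold=0.5):
    """Lower-level dynamics: kinase-like enzymes at fixed sites respond
    only where the local messenger concentration is high enough."""
    return [profile[i] > threshold for i in sites]

# Two input pulses deposited at membrane positions 2 and 7:
inputs = [0, 0, 4.0, 0, 0, 0, 0, 4.0, 0, 0]
early = readout(diffuse(inputs, steps=3), sites=[2, 4, 7])    # -> [True, False, True]
late = readout(diffuse(inputs, steps=200), sites=[2, 4, 7])   # profile nearly uniform
```

Early in the diffusion the readout still discriminates the two-pulse pattern; after long integration the concentration profile flattens and the spatial information is lost, which is why the timing of the enzymatic readout relative to the transduction dynamics matters.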
The action has a functional significance, ultimately in terms of the evolution of the system, and it is in the context of this broader circle that the action takes on “meaning” (however impossible it would be to specify this precisely). The double dynamics principle addresses the perplexing question, why are there so few internal messengers? The usual, and probably correct, answer in many instances, is that the receptors on the cell membrane determine which hormonal messages a cell responds to, and how it responds depends on what kinase-effector protein complexes occur inside the cell. In short, the cell is an interpreting system, and it need only respond to a small number of the very large number of first messages in its environment. This answer is not adequate for nerve cells though, at least apropos of input signals that are processed in a rapid fashion. It would make no sense to process these through the second-messenger mechanism if the cell were not responding to the spatiotemporal pattern of incoming pulses. The double dynamics mechanism allows this, and provides cells with a means of responding to distributed aspects of their chemical milieu generally. The vast amount of equivalent digital computation performed by intracellular dynamic mechanisms would for the most part be lost if the neuron yielded only a yes-no output. But the standard style of communication among neurons is through frequency-coded messages. The cyclic nucleotide system provides many handles for controlling this, since the pattern of firing is influenced by the reverse kinases, the number and types of ion channels, the enzyme that breaks down cAMP, as well as by the endogenous production of cAMP. Cells embodying the double dynamics principle will have good generalization capabilities as long as the upper-level dynamics is well behaved (not initial condition sensitive).
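The contrast between a well-behaved (generalizing) and an initial-condition-sensitive (discriminating) transduction dynamics can be sketched with a stand-in upper-level dynamics. The logistic map and all numbers here are my illustration, not the chapter's model:

```python
def iterate(x, r, n=30):
    """Upper-level dynamics: n steps of the logistic map x -> r*x*(1-x)."""
    for _ in range(n):
        x = r * x * (1 - x)
    return x

def separation(x1, x2, r, n=60):
    """Maximum divergence of two nearby trajectories under the same dynamics."""
    sep = 0.0
    for _ in range(n):
        x1, x2 = r * x1 * (1 - x1), r * x2 * (1 - x2)
        sep = max(sep, abs(x1 - x2))
    return sep

# Well-behaved regime (r = 2.5): nearby inputs converge to the same state,
# so a threshold readout generalizes over them.
generalizes = abs(iterate(0.30, 2.5) - iterate(0.31, 2.5)) < 1e-6

# Chaotic regime (r = 3.9): the same small input difference is amplified
# to macroscopic size, so a readout can discriminate highly specific features.
discriminates = separation(0.30, 0.3001, 3.9) > 0.1
```

The lower-level readout (a threshold, as in the earlier sketches) is unchanged in the two regimes; whether the cell generalizes or discriminates is decided by the character of the upper-level dynamics.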
If the system is trained through evolution to respond in a particular way to a given pattern, it will then respond in the same way to a closely related pattern. If the lower-level dynamics is sensitive to slight variations in the upper-level dynamics (e.g., slight variations in the local cAMP concentrations) the output firing pattern can be used to reflect slight differences in the input pattern. But such generalization of input patterns or gradual transformation of output patterns is not an intrinsic feature of the double dynamics principle. Chaotic (initial condition sensitive) transduction dynamics would allow the cell to be extremely sensitive to highly specific features of the input pattern. The principle may be extended to multiple dynamics, interleaved both in series and vertically in a (macroscopic-microscopic) hierarchy. Thus regulatory proteins (G proteins) provide a first-stage double dynamics process, followed in serial fashion by the diffusion and enzymatic readout mechanism sketched above. The cytoskeleton, if it is used to integrate signals in space and time, would provide an example of vertical interleaving (since the signals would be more microscopic than chemical messengers). The nerve
impulse is yet another level of dynamics, but in our scheme it is at a higher level (because it is more macroscopic in terms of energy dissipation). Nevertheless, the geometry of dendritic arborizations and nerve cell bodies allows subthreshold potentials to be summed both spatially and temporally. The amount of pattern processing occurring in any one of these dynamic stages may not be dramatic. It is the accumulation of serial and hierarchical interleavings within cells and between cells, with disambiguating classifications at each stage, that leads to capabilities that depart in such a dramatic way from what has been achieved with programmable machines.

4.3 The Brain as a Neuromolecular Computer

Multicellular organisms are usefully viewed as (M-m-µ)ⁿ architectures consisting of n cells, each with significant mesoscopic and microscopic layers of processing, and with large numbers of macroscopic communication links (many hormonal). The processes of cellular differentiation and multicellular morphogenesis require each cell to act as if it had an adequate representation of its position relative to the whole architecture at any point in time, and to take appropriate actions. The term positional information (Wolpert, 1969) has been used to denote this representation. Our view of cells as powerful pattern processors, each structured to interpret the pattern of influences impinging on its external membrane in an appropriate way, precisely suits the positional information requirement. The immune system is another good example. This has elements of cellular pattern recognition, memory, cooperation among different cell types that constitute a network level of processing, and (ontogenetic) evolutionary adaptation. Antibody-producing cells with antibody receptors on the membrane have to decide what action to take given the number and distribution of bound antigen and given various hormonal influences acting on them.
This includes proliferation, export of antibody into the blood, and conversion to memory cell status. Antibodies are tuned to recognize and bind antigen through a somatic mutation and selection process that in essential respects is analogous to the mechanism of phylogenetic evolution. The immune system exhibits all the elements of an evolutionary (M-m)ⁿ architecture. The brain is the biological information processing system that generally elicits the greatest attention from a computational point of view. Two divergent attitudes are prevalent. One is that the brain is a biocomputer that works on radically different principles than other biological information processing and control systems, such as the systems of development and immunity. The functionalist incarnation of this view (e.g., Boden, 1988) was deeply influenced by an artificial intelligence paradigm geared to conventional
serial computers. Mental processes, in this conception, are analogous to programs that can, to adopt an apt phrasing of Hofstadter (1979), be cleanly “lifted out” of their material embodiment. Recent connectionist models depart from this view in that they recognize the importance of internal structure, and in particular the importance of the structure being related in a natural way to its computational function (Hopfield, 1982). The predilection of connectionism, however, is to assume that the brain is constituted of simple cellular processors and that all the intelligence is in the connections. The brain is still viewed as a biologically atypical system that could be lifted out of its carbon incarnation and reincarnated in silicon. The second attitude, obviously the one espoused here, is that the brain operates on the same principles as less flashy organs, such as the liver or spleen, but that it is specialized for pattern processing, control, memory, and problem-solving processes that allow the organism to function effectively in the external environment. The neurons in the brain, in this view, constitute a repertoire of powerful special-purpose pattern processors. These are molded to specific function through phylogenetic evolution, and then tuned to more specific function through ontogenetic learning processes. The memory system of the brain, by controlling the mechanisms for facilitating connections among neurons, serves to orchestrate these specific pattern processors into spatiotemporal ensembles that yield coherent perceptual and motor function. The analogy to immunity is that the organism generates an initial repertoire of neuron types, corresponding to the initial repertoire of antibody-carrying cells that are further honed by an ontogenetic adaptation process. The analogy to development is connected to the positional information principle, but only in part because of the manifest need for sensitive control over the differentiation of neuron types.
Each cell in the brain, whether neuron or glial cell, must produce a suitable output message in response to a vast barrage of chemical and electrochemical influences impinging on its external membrane. It is insufficient to picture the situation simply in terms of an extremely dense, but dry, wiring diagram. It is also a bath of hormones, mediators, transmitters, neuromodulators, direct electrochemical contacts between closely packed neurons, and intercellular exchanges of materials. Some groups of neurons mainly serve as chemical factories. Most neurons yield graded electrochemical responses, not spike potentials, and the vast majority are small cells (microneurons) whose anatomical processes are not even clearly differentiated into axons and dendrites.

This was, of course, a radical departure from (and advance on) the behaviorist approach, which considered only observable input-output relations of the whole organism, and considered any internal structures or processes to be moot.

In reality the role played by the brain in regulating the internal processes of the body is just as important.
’
The information processing function of the brain, just as much as its initial morphogenesis, requires each cell, or at least a sufficient number of them, to interpret this barrage of inputs in an appropriate manner. The receptors on the external membrane provide the first line of selectivity, and it is probable that allosteric modifications of these receptors play a critical role in memory. But internal processing is necessary to interpret the patterned information transmitted by the receptors. The picture is a generalization of our earlier remarks about structurally nonprogrammable systems working with an indefinite variety of primitives, in this case neurons with varied and selected pattern processing “personalities.” Any given network structure of sufficient connectivity could implement virtually any functional capability if the cellular processors in the network can be selected to have suitable responses to the inputs impinging on them. The brain in fact has rather specific regional specializations, but nevertheless the implication holds that a virtually indefinite variety of functional capabilities could come to reside in any given region.

4.4 Models and Simulations

The M-m scheme is a general model. Our picture of the brain as a neuromolecular computer is also rather general. The general principles cannot implicate within themselves a specific model; if they could, this would have precluded the transformations of the brain that have occurred in the course of organic evolution. In this respect the methodological problem is similar to that of modeling a machine given the principles of its operation and a specification of the function that it performs. This information may be sufficient to specify a class of machines, but it is never sufficient to specify a particular design. In this domain, theory and experiment must continually correct each other, in precise accordance with the classical scientific method.
The great value of models is that they provide a definite, formal basis for proceeding. If the performative behavior of the model is suggestive of the natural system, the model is potentially on the right track. Any general features that it entails should have analogs in nature; if they do not, the investigator must try another model class. This top-down procedure, sometimes called the functional approach to biology (Rossler, 1974a), is quite different from the standard ways of setting up physical and chemical problems. But it is absolutely essential in the field of molecular computing, for here the chief issue is the connection between architectures and materials.
’
Most connectionist models of memory are based on the Hebbian concept of modifiable synapses (Hebb, 1949), and at least some evidence points to receptor modification as a biophysical basis for such modifications.
Figure 7 illustrates the particular brain model (or neurocomputer architecture) that the author's research group has used as a vehicle for studying (in a virtual manner) the high-efficiency, high-adaptability domain of computing and for exploring the role of various intraneuronal dynamic mechanisms (Conrad, 1976a). This is not the place to review these studies in detail, but a brief description of the model serves to lend concrete instantiation to the general picture sketched in the previous section. The architecture consists of two major subsystems. The first is the evolutionary selection circuits system (Conrad, 1974c). This is responsible for the evolutionary modification of the internal dynamics of neurons that control the input-output behavior of the organism. These are called primary neurons in the figure, but for reasons that will become clear in a moment they are probably more significantly identified with satellite neurons that influence the firing of large cells (called pyramidals) of the cerebral cortex. The selection circuits model assumes that each region of the cerebral cortex (and other regions of the brain) contains a number of networks of primary neurons and satellite primaries that are replicas of each other so far as essential aspects of cell structure and connectivity are concerned. Any particular input-output behavior of the organism is controlled by only one or a few of these replica
[Figure 7 appears here: block diagram of the neurocomputer architecture, with labeled elements first messenger, second messenger, input, primary neurons, output, growth control, selection circuits, molecular (microscopic) dynamics, and firing frequency.]

FIG. 7. Neurocomputer architecture. This is one of many possible models that instantiate the major features of the M-m architecture (Fig. 4). Evolutionary learning processes mold the molecular dynamics of neurons. The repertoire of neurons is orchestrated by a memory manipulation scheme. The architecture has been used as the basis for simulation studies of brain processes from a molecular computing point of view.
284
MICHAEL CONRAD
networks (of any given type). The organism evaluates the success or failure of this behavior, for example on the basis of the pleasurable or painful consequences to which it leads. The level of success is stored in memory, and after a set of repeated input-output trials the networks are ordered according to fitness. The intracellular properties responsible for the dynamics of neurons (such as the satellite primaries) in the best performing networks are transferred to the less fit networks. This could be implemented (in rough analogy to the immune system case) through somatic mutation of genes coding for key controlling proteins, such as kinases, and transfer through glial channels of RNA to corresponding neurons in less fit networks. It could also be implemented (in rough analogy to developmental mechanisms) by transferring a gene-inducing substance to the corresponding neurons, or by using neural signals to mediate the transfer of information. After transfer (which plays the role of reproduction) the controlling proteins are varied (through somatic mutation or through alternative gene readout) and the cycle is repeated until the input-output behavior is tuned for the task at hand. The term, selection circuits, refers to the obvious fact that the adaptation process is mediated by an internalized artificial selection system.⁴ The second subsystem, called the reference neuron system, is responsible for memory storage, manipulation, and retrieval. This includes content-ordered memories, time-ordered memories, associative memory structures, control of the selection circuits mechanism, and orchestration of coherent ensembles of neurons either through evolutionary (trial and error) mechanisms or through imitation mechanisms that serve to link preexisting pieces of behavior into functional spatiotemporal combinations (Conrad, 1976b). Two types of neurons mediate the operations of the reference neuron system.
The first are large primary neurons responsible for significant pattern processing functions. The second, called reference neurons, are pointers that serve to control access to (or index) the primaries. When primary neurons fire they become sensitized for loading by a reference neuron firing at about the same time. Loading means that a synaptic connection is facilitated. When this reference neuron fires in the future it will reconstruct the pattern of primary firing loaded under its control. Primary neurons can also load reference neurons that they contact. This means that firing of the primaries associated with part of a scene can activate the reference neuron that causes all the primaries associated with the scene to fire. This is the content-ordered memory. Sequential operation of reference neurons allows memories to be acquired in a time-ordered fashion. The mechanism of memory manipulation is controlled by rememorization, that is, by calling up primary firing patterns, or components of firing patterns, and rememorizing them under the control of other reference neurons. This allows for the construction of associative memories, the reordering of memories in time, the stabilization of the memory trace, and the orchestration of different special-purpose neurons for performance of complex tasks. The reference neuron scheme is basically a Hebbian model, but with two distinctive features. The first is the hierarchical control that is introduced by the distinction between reference and primary neurons. This allows for the distributed global memory structure of a neural net, but at the same time allows for localized mechanisms of memory manipulation. The same simple mechanism acts to manipulate complex scenic information (many primaries firing) as it does simple information. The second feature is that the primary neurons may have significant internal information processing mechanisms, in accordance with the full-scale M-m scheme, and in accordance with the frequency-coded style of interneuronal communication characteristic of the brain. These mechanisms could include the second messenger and cytoskeletal dynamics discussed earlier; but we shall see, in the next section, that many other physical-dynamical processes, some at a far more microphysical level, are also candidates. The reference neuron scheme requires a supervisory system to control its operations. The supervisory system is like an operating system policy that determines whether the reference neuron signal will load primaries or activate them for firing, whether the brain will act in a content- or time-ordered mode of recall, or whether it will act to develop associative structures.

⁴ Edelman (1978) has also proposed that a Darwinian mechanism mediates learning processes in the brain, but at the level of selecting groups of neurons rather than at the level of controlling proteins within neurons.
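The loading and recall operations just described can be sketched in a toy program. All of the names, the representation of firing patterns as sets of primary indices, and the overlap rule for choosing the winning reference neuron are illustrative assumptions, not details taken from the model:

```python
# Toy sketch of the reference neuron scheme: primaries are indexed
# 0..N-1, and a reference neuron is a pointer that records which
# primaries fired at load time and can later replay them.

class ReferenceNeuron:
    def __init__(self):
        self.loaded = frozenset()   # facilitated synapses to primaries

    def load(self, firing_primaries):
        """Loading: record the currently firing primaries."""
        self.loaded = frozenset(firing_primaries)

    def fire(self):
        """Firing reconstructs the loaded primary firing pattern."""
        return set(self.loaded)

def content_recall(reference_neurons, partial_scene):
    """Content-ordered recall: primaries associated with part of a scene
    activate the reference neuron with the greatest overlap, which then
    causes all the primaries of the scene to fire."""
    best = max(reference_neurons,
               key=lambda r: len(r.loaded & partial_scene))
    return best.fire()

scene = {2, 5, 7, 11}                    # primaries firing for a scene
r = ReferenceNeuron()
r.load(scene)                            # loaded while the scene fires
recalled = content_recall([r], {5, 7})   # part of the scene as a cue
assert recalled == scene                 # the whole scene is recalled
```

Time-ordered memory would then correspond to chaining such reference neurons so that firing one activates the next in sequence.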
For orchestrating special-purpose neurons into groups that can perform coherent functions, it is useful to use imitative mechanisms, and a self-model, as indicated above, but it is also possible to use the associative mode to implement trial-and-error evolutionary learning. The supervisory policy may be implemented with inhibitory and competitive mechanisms that allow the most highly activated reference neuron to take control. It is of critical importance for the system to have a good policy for switching among different modes of operation in a manner that is sensitive to external conditions. The primary neurons must be large, since it is necessary that they have long synaptic connections. There are about 10¹¹ microneurons in the brain, an order of magnitude larger than the number of macroneurons. But the microneurons have short anatomical processes, and therefore only local connections. For the reference neuron scheme to work, it is necessary for the reference neurons to reach many primaries, and for the primaries to reach many reference neurons. The pyramidal cells are natural candidates for primaries, since they possess long (apical) dendrites that extend to a horizontal layer of cells whose axons run laterally over long distances, somewhat like telephone lines (cf. Elliot, 1969). Furthermore, they themselves have long
axons capable of carrying output in the direction of motor elements, and in addition they have a shorter tuft of dendrites capable of picking up inputs from the many surrounding neurons in the cortex. The surrounding microneurons can influence the firing of the pyramidals, and because of this modulating role are good candidates for adaptation through the evolutionary selection circuits scheme. The pyramidals could also be tuned through this scheme, but the cyclic nucleotide mechanism could realistically subserve a real-time signal processing role only in small neurons (because processing time is controlled by diffusion rates). Cytoskeletal processing, and other fast signal processing mechanisms to be discussed, would not be limited in this way. Our simulation studies of these systems have been reported elsewhere. The studies of the selection circuits scheme as applied to the cAMP reaction and diffusion dynamics in neurons have been particularly intensive and empirically correlated so far as the dynamics are concerned (Kirby and Conrad, 1984, 1986; Conrad et al., 1988). Studies with cytoskeletal dynamics have also been undertaken, as have studies of the reference neuron mechanism (Trenary and Conrad, 1987). The two systems have been integrated in a conceptual way for the purposes of refining the memory control of the evolutionary learning circuits learning algorithm (Smalz and Conrad, 1990), but have not been implemented as yet in a fully integrated fashion. Here it is sufficient to consider three questions: What kind of performance can such simulations yield, what are their computational requirements, and could they have artificial intelligence value? A typical simulation of the evolutionary learning component of the system comprises from 15 to 20 replica networks. Each is a network of four reaction-diffusion neurons, with the modeled biochemistry corresponding to that outlined in Fig. 5. Each neuron is modeled as a patch of membrane, divided up into compartments.
A 300-compartment membrane requires 300 ordinary differential equations (with diffusion terms). The equations are based on classical Michaelis-Menten biochemical kinetics, though situations with fancier nonlinear interactions have also been investigated, and all the rate constants correspond to the experimentally measured values. The system is provided with a task, typically a target-seeking task. Each neuron receives the same input, which represents the location of the target and some random forces (winds) acting on it. The output of each neuron is interpreted in terms of a motion in a particular direction. In a feedback-free network the motion is the vector sum of the motions produced by each neuron. The system is placed in several initial positions on the plane (typically one position in each quadrant), and it must learn to hit the target from any position. Evaluation is done after some number of steps. If this number corresponds to the best the system can do, the pressure on it to perform is strong, and it learns to perform efficiently; if it is given a lot of leeway, it will learn to perform the task in a sloppy way.
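The evaluation loop of this target-seeking task can be sketched as follows. This is a toy stand-in: the linear `neuron_output` function replaces the reaction-diffusion biochemistry entirely, and all names and parameter values are illustrative assumptions:

```python
import math

# Sketch of the target-seeking evaluation: each "neuron" maps the sensed
# target direction to a motion, and in a feedback-free network the
# organism's step is the (averaged) vector sum of the neurons' motions.

def neuron_output(weights, target_dx, target_dy):
    # hypothetical linear neuron: a weighted response to target direction
    wx, wy = weights
    return (wx * target_dx, wy * target_dy)

def steps_to_target(network, start, target, max_steps=50, tol=0.5):
    """Count the steps needed to reach the target from one start point."""
    x, y = start
    for step in range(max_steps):
        dx, dy = target[0] - x, target[1] - y
        if math.hypot(dx, dy) < tol:
            return step                              # target reached
        moves = [neuron_output(w, dx, dy) for w in network]
        x += sum(m[0] for m in moves) / len(moves)   # vector sum, averaged
        y += sum(m[1] for m in moves) / len(moves)
    return max_steps                                 # ran out of leeway

network = [(0.5, 0.5)] * 4       # four identical neurons (one quadrant test)
n = steps_to_target(network, start=(10.0, -10.0), target=(0.0, 0.0))
assert n < 50                    # this network homes in from that quadrant
```

A tight step budget (`max_steps` near the best achievable) corresponds to the strong selection pressure mentioned in the text; a generous budget lets sloppy solutions survive.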
The particular task described above can be learned rapidly using the evolutionary learning algorithm of the selection circuits scheme, often in as few as 10 cycles through the evolution process. The algorithm can be tuned in various ways to improve this. But the real key to success is that the dynamics are well suited to adaptation through evolution. It would be easy to choose a highly fragile dynamics that would not admit evolution. What we have done is to structure a network capable of solving a wide variety of problems (in this case learning a wide variety of motions relative to the target), and to provide it with highly evolution-friendly dynamics so that it can actually be molded to solve a desired problem through any reasonable evolutionary learning algorithm. The principle of extradimensional bypass (Section 3.4) is operative here, at least in a moderate way (since the problem, though a standard testbed, is by no means extremely demanding). Slight variants of the network can be given more difficult problems, such as having to learn to navigate through a space of different types of fuel sources and having to find each source before it runs out of that kind of fuel (Rossler, 1974b). The network is capable of learning this task at a density that requires it to move to the nearest fuel type currently most necessary, but it requires several hundred cycles through the learning algorithm (Conrad and Kirby, 1988). These particular tasks could be solved more efficiently by writing direct computer programs on a conventional machine than by building a simulation (virtual) system on top of that machine. 
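The selection-circuits learning cycle itself (rank the replica networks by fitness, transfer the parameters of the fitter replicas to the less fit ones, vary the copies, repeat) can be sketched in a few lines. The quadratic fitness function and every constant below are illustrative stand-ins for the intraneuronal rate constants and the actual task evaluation:

```python
import random

random.seed(1)

def fitness(params):
    # hypothetical task: fitness is highest when every parameter is 1.0
    return -sum((p - 1.0) ** 2 for p in params)

def learning_cycle(replicas, mutation=0.1):
    """One cycle: order replicas by fitness, copy the fitter half over
    the less fit half (the "transfer" step), then vary the copies
    (the "somatic mutation" step)."""
    replicas.sort(key=fitness, reverse=True)
    n_fit = len(replicas) // 2
    for i in range(n_fit, len(replicas)):
        source = replicas[i - n_fit]
        replicas[i] = [p + random.gauss(0.0, mutation) for p in source]
    return replicas

# 16 replica networks, each with 4 tunable parameters
replicas = [[random.uniform(-2.0, 2.0) for _ in range(4)] for _ in range(16)]
initial_best = max(fitness(r) for r in replicas)
for _ in range(10):                  # roughly the 10 cycles cited above
    replicas = learning_cycle(replicas)
final_best = max(fitness(r) for r in replicas)
assert final_best >= initial_best    # the fitter half is never lost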
The real questions are: Is this an efficient approach for biological systems; would it be possible to perform the task more efficiently with a direct fabrication of the virtual machine than with a conventional machine or with some alternate architecture (such as a connectionist neural net); and might the virtual machine provide the best means of using the conventional machine for solving some tasks for which the methods of solution are difficult to discover? The neural model described could obviously reside quite naturally in a real brain, since it is an abstraction of brain processes. The complexity of metabolic processes assumed is not a count against it in this respect, since these processes must be present for the energy and entropy transformations required for life in any case. A population of small neural networks learning to solve the task through natural evolution would be simpler than a general-purpose machine, despite the complexity of interactions within the neurons. Connectionist neural network models with simpler neurons could also solve the task in question, without the need for describing the internal interactions responsible for the input-output behavior of the neurons. But in order to implement such a system it would also be necessary to implement one of the standard learning algorithms, such as back propagation, and this requires a larger support context that must be factored in. Once the solution is learned we could dispense with the machinery required for learning, both in the case of our
intraneuronal dynamics model and in the case of conventional neural nets, at least so far as technological applications are concerned. The question then boils down to whether it is possible to discover these solutions, and, at a deeper level, whether the dynamics is in principle rich enough to support them. If we allow richer internal dynamics, and in particular if we allow the evolution process to act on the dynamics itself, it would in principle be possible to “breed” simulated neurons and neural networks that are capable of performing highly specific or unusual pattern processing tasks. Our working hypothesis is that long-term (possibly months or years) application of the evolutionary learning algorithm to the intraneuronal dynamics itself can yield networks capable of solving problems refractory to other approaches. Extreme high dimensionality might be necessary to provide enough extradimensional bypass. But as long as the dynamics of the resulting networks are simulatable with reasonable efficiency, the time for breeding a useful product can be discounted. Once a useful network is evolved, much of the mechanistically superfluous dimensionality might be stripped away. The evolved specifications could be ported to a more efficient implementation in parallel hardware (the learning algorithm is also well suited to parallelism). Finally, for the types of dynamics used in the reaction-diffusion model, it should be possible to cut special silicon devices, for example devices that utilize (rather than suppress) the electron diffusion and tunneling processes that inevitably occur as the size of components becomes smaller. A cellular automaton machine that is closely analogous to reaction-diffusion dynamics could be constructed on the basis of tunneling between densely packed components (Conrad, 1988a). But what about enzyme geometry and other features of biomolecules unique to the mesoscopic scale of phenomena? These play no role in the simulation described above. 
All the information processing power resided in the upper-level transduction dynamics; the enzyme was treated as a shapeless threshold. The reason, very simply, is that shape-based processes are too costly to compute. Some primitive simulations along this line have provided useful insights (Goel and Thompson, 1988), but it would be totally infeasible to build a virtual AI system that operated on this basis that would be acceptable from the standpoint of the time steps required. Special silicon hardware could not help here, since shape matching is not a natural feature of silicon and congener materials. The simulation can perform in a manner that should be of some applicative value for a modest class of problems, despite the fact that it does not actually utilize any of the macromolecular properties that play such a crucial role in biological cells. It is reasonable to expect that the natural system would perform much better and that an artificial system that does use these properties would as well. Lengthy runs with computational models, together with scaling, can help to ascertain how much better. But there is no possibility
that these models could actually be converted to AI applications. For this it would be necessary to directly enlist modes and mechanisms of computing unique to carbon.
5. Modes of Molecular Computing
At this point we enter the transition zone between nature and technology. The boundaries are not clear. Some of the mechanisms that are “in the air” at the moment are clearly biological and have a good basis in molecular and cellular biophysics. Quite a few are potentially biological, but it is unknown whether they actually play a role in nature. Some are clearly de novo artificial, as tenuously connected to biology as plastics are. In many instances we are dealing with “paper” mechanisms whose physical and chemical reality has not yet been demonstrated.

5.1 Classification Scheme
Let us first classify the general types of mechanisms. There are five, which we call modes of molecular computing.
State-determined. This term, coined by Ashby (1956), aptly describes the dynamics of finite automata, though here it will be used somewhat more broadly to refer to any physical system dominated by constraints. A finite automaton is a system whose next state is completely determined by its present state and input according to an arbitrary table (or transition function). “Arbitrary” here means that any table (subject to some rules of well-formedness) is possible. If the system is structurally programmable, it is possible to constrain the system in such a way that it behaves according to the table and to do so through an effective procedure (actually another finite table). If it is not structurally programmable, some kind of evolutionary procedure is necessary. In either case, energy and entropy are irrelevant. The constraints completely determine the behavior in this physical regime (Pattee, 1973). Symbolic computing belongs to this mode. The important feature of symbols is that they can be manipulated in an arbitrary (physics-independent) fashion. The state, or some feature of it, can be interpreted as a symbol (that is, as standing for something else). The physical states could be interpreted, for example, as numbers, if the tables prescribing their behavior correspond to operations on numbers. A digital computer is an example of a finite automaton and is the prime example of a purely state-determined system.

Energy-driven. This means that the computing process is guided by the minimization of free energy, somewhat like a crystallization process. The free
energy may here be taken as F = E - TS, where E is the potential energy, S is the entropy and T is the absolute temperature. At low temperatures, energy dominates the entropy. This is the regime of crystallization. Self-assembly is a free-energy minimization process that is similar to crystallization in that energy plays an important role even at physiological temperatures, due to the additive effect of large numbers of weak interactions between complementary surfaces.¹⁰

Entropy-driven. This means that the computing process is guided by entropy increase. This is the antithesis of crystallization, since the system with the largest number of degrees of freedom will predominate. This will be the case at high temperature, since the contribution of entropy to the free energy becomes more important at high temperature, and as a consequence entropy will make the major contribution to the free-energy process. But the pertinence of entropy increase to computing is far more general than suggested by its connection to minimization. This is just the equilibrium case. In the nonequilibrium case the dissipation of energy (or the export of entropy) can, in the presence of suitable nonlinearities, create dynamic spatiotemporal structures, sometimes called dissipative structures (Nicolis and Prigogine, 1977; Ebeling and Feistel, 1982). The upper-level dynamics of transduction in the M-m architecture is, in general, entropy-driven. The simple diffusional dynamics that integrates input signals in the cAMP model is clearly dissipative (though it is too linear to support a dissipative structure proper). More elaborate, nonlinear dynamics could serve to integrate signals in different ways. Chaotic dynamics may be thought of as a mixing process (Rossler, 1983) and, as a consequence, is another example of a dissipative process.

Specificity-driven. This is the shape-based, enzyme-driven form of computing already discussed at length.
It is completely unique to biomolecular systems, or to artificial systems constructed on the basis of biomolecular principles. Enzyme catalysis involves no net energy component per se, apart from the free-energy change of the reaction catalyzed. Self-assembly is a specificity-driven mode of computing, but with a significant energy and entropy component. Specificity-driven computing could also be called conformation-driven computing.

¹⁰ A significant class of computer science problems can be mapped into free-energy minimization problems. The main insight in the neural network model of Hopfield (1982) is that it is possible to assign a notational “computational energy” to a network. The computational energy is a Lyapunov function (a generalization of free energy). Its fall to a minimum yields the solution (or approximate solution) of the minimization problem mapped into the network. Bona fide free-energy minimization processes could be more efficient in terms of time and number of elements than virtual ones implemented in the signal processing strata of connectionist neural nets; but as will be indicated in the text (Section 5.3) they probably contribute to computational functions in a different manner.
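The notion of a computational energy in the footnote above can be made concrete with a minimal Hopfield-style network (the three-neuron weight matrix below is an arbitrary illustration): for symmetric weights with zero diagonal, asynchronous threshold updates never increase the energy E(s) = -1/2 Σ wij si sj, so the energy acts as a Lyapunov function whose minima encode the network's answers.

```python
# Symmetric weights, zero diagonal (arbitrary illustrative values).
W = [[0, 1, -2],
     [1, 0, 1],
     [-2, 1, 0]]

def energy(s):
    # E(s) = -1/2 * sum_ij w_ij * s_i * s_j
    return -0.5 * sum(W[i][j] * s[i] * s[j]
                      for i in range(3) for j in range(3))

def update(s, i):
    """Asynchronous threshold update of neuron i (states are +1/-1)."""
    h = sum(W[i][j] * s[j] for j in range(3))
    s = list(s)
    s[i] = 1 if h >= 0 else -1
    return s

s = [1, 1, 1]
for _ in range(10):                         # a few sweeps suffice here
    for i in range(3):
        s_new = update(s, i)
        assert energy(s_new) <= energy(s)   # energy never increases
        s = s_new
# the dynamics has settled into a local minimum of the energy
```

For this particular W the trajectory settles at s = [-1, 1, 1] with energy -2, a fixed point of all three updates.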
Signal integration. This is the mode in which signals are combined in space and time. It is a subcategory of state-determined behavior as long as the recombination of signals is the main determinant of state transitions. Serial digital computers are special cases in which the number of signals that the system is allowed to process at any given time is severely restricted, so as to prevent any unanticipated conflicts from arising. Dissipative processes, such as reaction-diffusion, dissipative structure formation, chaos and turbulence can also serve as means of signal integration. In general, the signal processing has more of an energy and entropy aspect in this case, and therefore is not completely dominated by forces of constraint. However, state-determined systems operating in a highly parallel mode (such as cellular automata) can exhibit pattern formation phenomena very similar to that exhibited by differential dynamical systems (Wolfram, 1984). This is to be expected, since cellular automata are to some extent analogs of partial differential equations, and in practice programs that compute reaction-diffusion equations are cellular automata (but with a very large number of possible states for each cell, corresponding to the numbers that can be represented in any particular machine).
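The last point can be seen in a minimal explicit finite-difference scheme: a program for du/dt = D d²u/dx² + f(u) is a cellular automaton whose "cells" hold floating-point concentrations and whose update rule looks only at nearest neighbors. The diffusion coefficient, step sizes, and logistic reaction term below are illustrative choices:

```python
# A 1-D reaction-diffusion integrator written as a cellular automaton:
# each cell's next state depends only on itself and its two neighbors.
D, dt, dx = 0.1, 0.1, 1.0

def reaction(u):
    return u * (1.0 - u)        # logistic growth toward u = 1 (illustrative)

def step(cells):
    """One synchronous CA update on a ring of cells."""
    n = len(cells)
    new = []
    for i in range(n):
        left, right = cells[i - 1], cells[(i + 1) % n]   # periodic boundary
        laplacian = (left - 2.0 * cells[i] + right) / dx ** 2
        new.append(cells[i] + dt * (D * laplacian + reaction(cells[i])))
    return new

cells = [0.0] * 20
cells[10] = 0.5                  # a local perturbation
for _ in range(500):
    cells = step(cells)
# the perturbation spreads and carries every cell toward the stable state
assert all(0.0 <= c <= 1.0 + 1e-9 for c in cells)
assert min(cells) > 0.5          # the excitation has propagated everywhere
```

The only difference from a classical cellular automaton is the size of the per-cell state set, exactly as the text notes.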
Several of these examples, in particular the last, illustrate that these modes grade into one another, and that a given mechanism may belong to several modes. Current computer science emphasizes the state-determined mode exclusively, and does so with rather severe restrictions on the signal integration mechanisms that it will admit. Biological systems mix all the modes, which means that they stand in the zone between the purely formal and the purely physical. This zone is difficult to analyze theoretically, but insofar as it is evolution friendly it should be possible to exploit it for technological purposes.

5.2 The Hierarchy of Mechanisms
I now want to sketch a representative variety of molecular functional mechanisms under current active study. A thorough review of this active topic is infeasible here. My goal is to indicate the motivation behind the mechanisms with sufficient clarity to lead into our discussion of architectures, and to ignore all details that do not fit with this goal (however relevant they are from a technical perspective). Similarly, the techniques for exploiting the various mechanisms are in principle out of order in this section; the comments made in this respect are the bare minimum necessary to put the mechanism in perspective. We will make the presentation in terms of different types of physical particles or aggregates. In principle we would like to move systematically up the scale of size, from microscopic, to mesoscopic, to macroscopic. This is impossible, however, since the most interesting mechanisms involve interactions
of processes at a number of different size scales. We will associate the mechanism with the level that is most significant for its explication. Our classification of molecular computing modes will be the motivating factor. What mode of computing a particular mechanism can contribute to is the question on the floor.

5.2.1 Electrons, Photons, and Protons
Electron mobility is of obvious importance in silicon computing. As a consequence, silicon mimicry mechanisms have probably attracted the greatest attention in the molecular electronics community. The two basic problems are to create a molecular double-well potential, and then to connect the double wells in an appropriate way. This is just the three-terminal switch previously mentioned, with the control terminal being responsible for the connection. The double-well potential is not by itself adequate. Some constraint must be imposed that localizes the electron, so that it only jumps (tunnels) from one well to another under the influence of a control signal (Aviram, 1988). By and large, electrons do not move great distances in biological systems. The reason is that most biological macromolecules have significant shape properties, a feature incompatible with mobility (Volkenstein, 1982). However, electrons do move in a stepwise manner, probably through a tunneling mechanism, and in many cases with the requisite speed, in the energy transduction processes of photosynthesis and respiration. Consequently, many investigators have proposed that the electron transfer chains that occur in mitochondrial membranes, in chloroplasts, and in the chromatophores that occur in photosynthetic bacteria could be a substrate for molecular computing (cf. Gilmanshin and Lazarev, 1987, for a recent review). The difficulty (as with all macromolecular switches) is establishing distant connections, setting up input and output, and suppressing tunneling side effects that would interfere with the realization of logical operations. The idea most often suggested to overcome this is to use the shape properties of the proteins to self-assemble them into circuits, at present a formidable task.
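The double-well potential mentioned above can be illustrated with the standard quartic form V(x) = x⁴/4 - x²/2 (the coefficients are illustrative, not molecular values): its two minima are the two switch states, and the barrier between them is what a control signal must help the electron cross.

```python
# Quartic double-well potential (illustrative coefficients):
#   V(x) = x**4 / 4 - x**2 / 2
# The two minima at x = -1 and x = +1 play the role of the two switch
# states; the hump at x = 0 is the barrier the electron must tunnel through.

def V(x):
    return x ** 4 / 4 - x ** 2 / 2

# locate the two minima numerically on a coarse grid
xs = [i / 1000.0 for i in range(-2000, 2001)]
left = min((x for x in xs if x < 0), key=V)
right = min((x for x in xs if x > 0), key=V)
barrier = V(0.0) - V(right)      # barrier height above the well bottoms

assert abs(left + 1.0) < 1e-2 and abs(right - 1.0) < 1e-2
assert abs(barrier - 0.25) < 1e-3    # V(0) - V(1) = 0 - (-1/4)
```

Localizing the electron in one well and tunneling it to the other only under a control signal is, as the text notes, the hard part.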
Gilmanshin and Lazarev (1987, 1988) have proposed that proteins of the electron transport system could serve as critical components in analog and digital circuits operating on the basis of single-electron tunneling (Likharev, 1987). The same objectives have motivated chemists to investigate organic polymers that might exhibit a switching function. A number of organic polymers that can exhibit controllable switching, in particular electron-donor complexes, have been proposed or developed (cf. Carter, 1982; Hong, 1989). The problem of linking these switches to perform a computational function is more challenging. It would be chemically intractable simply to link organic
switches to each other by attaching them to organic conductors (organic metals). The idea of linking switches to proteins that would self-assemble them into useful configurations has also been a goal of work in this area. This is probably feasible for small circuits, but, as with self-assembly of electron transfer proteins, the technological difficulties are major. The use of antibodies capable of forming highly structured antibody-antigen networks is a significant possibility here. Locating the acceptors and donors in precise locations on a membrane using membrane reconstitution techniques is yet another possibility (more on this shortly). The most plausible coupling mechanisms at the moment are optical. This means the organic or bioorganic donor-acceptor system should be pigmented, or coupled to a dye molecule. Much of the effort has focused on memory storage and retrieval. The problem is easier here since the switches need only be embedded in a film in a highly ordered way, and then accessed by optical means. These might be provided by well controlled laser optics or by a mechanically controllable multihead scanning-tunneling microscope (Schneiker et al., 1989). Organic memory storage and retrieval systems along these lines are at or near the prototype stage. The most sophisticated prototype developed to date utilizes the light energy transducing protein, bacteriorhodopsin (b-r). This purple pigmented protein, found in the “purple membrane bacteria,” is a congener molecule of the visual pigment, rhodopsin. It is primarily responsible for photosynthesis, though in recent years some sensory functions have been detected. Photosynthesis in green plants is mediated by an ultra-highly integrated complex of proteins and membranes, so it is really a significant fact that all the requisite interactions for light energy transduction can be associated with a single protein. 
In both cases (and in other biological energy transduction processes) the ultimate result is the translocation of protons across a membrane. This is the well known Mitchell hypothesis. It means that the activation of bacteriorhodopsin can produce membrane potential changes (electrochemical signals). Since the photocycle is driven by the input signals, the initial state of the system is automatically regenerated. By interfering with the slow (microsecond) component of the photocycle, it is possible to store and erase information with different wavelengths of light. Researchers at the Biophysics Institute in Pushchino, USSR, have developed a photochromic film (called biochrome) on this basis that stores and erases information on an optical disc (Vsevolodov et al., 1986). The storage time of the image depends on the temperature, which, according to reports, must be well below room temperature for a substantial storage time. The stability of the device is probably quite good, however, since b-r can be stable for up to a year in a device setting. Biochrome film is well suited for holographic memory storage. Whether its properties exceed those of other photochromic materials is arguable at this point; however, protein engineering methods applied to the b-r molecule will in all probability produce improved variants. Biochrome film, though the details of its preparation have not been published at this time, is a milestone achievement, since it demonstrates the fact that bioorganic materials can be used to make a key component of a computing system. Some investigators (Birge et al., 1989) are seeking to use the fast (picosecond rise time) component of the b-r photocycle to implement switching properties, and thereby create a bona fide molecular optoelectronic computer. As indicated above, light impinging on b-r produces membrane potential changes. Hong (1986) has shown that it is possible to predict and control the electrochemical signal in detail. A reconstituted b-r membrane system is thus a light-to-electric-potential transducer. The membrane is ordinarily an equipotential; but by reticulating it with a dielectric it should be possible to create a structure in which the potential changes diffuse in a manner analogous to the intracellular diffusion of the second-messenger molecule cAMP in model neurons. Such a device would, if fabricated, integrate the incoming light signals in space and time, thereby serving as a visual processing element (Conrad and Hong, 1985; Conrad, 1986). Alternatively, it could be viewed as a cellular automaton architecture, though probably in the future variants of b-r, or other pigments, may allow for bona fide neighbor-neighbor interactions at the molecular level that are better suited to a cellular automaton realization. All of the above mechanisms are directed to the state-determined mode of computing.

¹¹ For a thorough review of bacteriorhodopsin-membrane interactions pertinent to the design of molecular electronic devices, see Hong (1986).
All of them involve highly sequential signal integration, except for biochrome (which can operate on images) and for the b-r signal integration device mentioned above. The latter corresponds to a dynamic mode of signal integration, and since it involves diffusion it is to some extent entropy driven. A number of the mechanisms would utilize self-assembly, but for fabrication or regeneration after disturbance and not for computing per se. As a consequence we are not here dealing with either an energy-driven or specificity-driven mode of computing. However, the elaborate conformations of electron-transfer proteins and b-r, and their conformational changes, allow a degree of subtle control over electron motions that is not possible in the inorganics and the simpler organics. Many of the mechanisms described would probably be more naturally directed to dynamic, structurally nonprogrammable modes of signal integration than to structurally programmable sequential modes, or even structurally programmable parallel modes. In this case, tunneling side effects between components in a self-assembled aggregate would be harnessed for signal integration, which ultimately might be easier
MOLECULAR COMPUTING
295
than suppressing them. Repeatable fabrication of an aggregate with useful properties would be necessary, but it would not be necessary to produce the very highly and intricately constrained structures required for conventional computing. This remark extends to much of the work on conducting polymers and organic metallics as well.

5.2.2 Quasi-Particles (Solitons, Photons, Excitons, ...)
So far, the mechanisms have all been based on well-established biophysical studies. The domain of quasi-particle exchanges is more conjectural at the present time, largely because of the delicacy of the phenomena and the relative newness of some of the ideas in a biomolecular context. The general picture comes from field theory. The system of interest is viewed as a collection of oscillators, somewhat like a spring mattress. A variety of modes of oscillation occur, and because of quantum mechanics these have particle as well as wave properties. The phonon, or sound wave, is the original example. It is a quasi-particle because it is a mode of oscillation in a system of atoms and molecules, whereas the electrons, photons, and protons are accorded the status of real particles, either because they are considered to be oscillations of the vacuum (viewed as a fundamental field) or because they are in some other sense irreducible. Excitons, roughly speaking, correspond to propagating waves of electromagnetic excitation; polarons to waves of polarization, associated with polarization of the surrounding field by a charged particle. In a nonlinear medium the diversity of oscillatory exotica becomes too enormous to assign names to. Solitons have captured the most attention from the point of view of computing. These are solitary waves in a nonlinear medium that in principle could carry energy over significant distances without dissipation, though solitary waves that are somewhat dissipative could also be admitted here. The basic idea is that the nonlinear wave packet is stable due to the fact that the nonlinearity corrects for the dispersion, thereby allowing it to act as a carrier of energy or information (Ebeling and Jenssen, 1988). The interest in solitons derives from the earlier observation that electrons do not migrate over significant distances in macromolecules with functional shape features. Two types of solitons are commonly considered.
The first is the kink-type soliton (Carter, 1983), which for the present purposes may be viewed as a propagating repolarization wave (or flip) in a chain of conjugated bonds. The soliton switches proposed by Carter are stunning organic structures; but these structures do not occur in biology and are well beyond the reach of today’s organic chemistry. Their value in terms of stimulating research should not be underestimated, however. We can note that the evidence is now quite good that kink-type solitons occur in polyene chains (Su et al., 1979). Apropos
296
MICHAEL CONRAD
biological molecules, Chernavskii (1986) has proposed that kink-type solitons occur in a component of bacteriorhodopsin (in an embedded retinene structure) and that they mediate the transfer of energy to the proton that moves across the membrane. The second is the peak-type soliton proposed by Davydov (1985), which has been proposed as an energy transfer (or molecular wire) mechanism in chain systems such as the alpha helical structures of proteins. Peak-type solitons have also been proposed as an energy transfer mechanism on membranes, such as Langmuir-Blodgett films (Rambidi et al., 1987). The counterargument to these proposals is that the soliton would dissipate in a heterogeneous system (Lawrence et al., 1987). This would not, however, preclude soliton-like modes from having some role in biomolecular systems. Ebeling and Jenssen (1988) have studied the properties of a somewhat different soliton model in the presence of inhomogeneities such as might occur in an enzyme; the analysis suggests that solitons can be trapped or fused at such a site, indicating a potential role in such phenomena as allosteric shape change. Kuhn (1985) has proposed that highly integrated complexes organized on a membrane (as in chloroplasts and mitochondria) could provide the basis for a network of functionally significant exchanges of electrons, protons, phonons, solitons, excitons, and other quasi-particle forms. The Langmuir-Blodgett (L-B) technique of membrane reconstitution provides a means of fabricating such organized complexes, by laying down monolayers from an air-water interface onto a solid surface and injecting selected molecules in an ordered manner. The molecules are typically constructed by attaching an active group, such as a dye chromophore, to a long-chain hydrocarbon.
The network comprises an arrangement of donor-acceptor complexes, except that particles other than electrons may be donated when the system is activated by an external stimulus (say a photon impinging on the dye molecule). Systems of this type with functional attributes (e.g., molecular wires, photon funnels, electrode interfaces, and optical memories) have been constructed using this method (Barraud, 1985; Sagiv, 1988; Sugi, 1988), and the likelihood is that films with more significant functions will be produced in the future.
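For orientation, the balance just described, in which nonlinearity corrects for dispersion, is conventionally illustrated with the Korteweg-de Vries (KdV) equation. The following is a textbook sketch, not a model of any specific biomolecular system discussed above.

```latex
% KdV equation: the nonlinear term u u_x counteracts the
% dispersive term u_{xxx}, permitting a stable solitary wave.
\[
  u_t + 6\,u\,u_x + u_{xxx} = 0
\]
% One-soliton solution: a pulse of amplitude c/2 travelling at
% speed c without change of shape; faster solitons are taller
% and narrower.
\[
  u(x,t) = \frac{c}{2}\,
           \operatorname{sech}^{2}\!\left(\frac{\sqrt{c}}{2}\,(x - c\,t)\right)
\]
```

The exact compensation of dispersive spreading by nonlinear steepening is what allows the pulse to carry energy or information over long distances without losing its shape.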
5.2.3 Electronic-Conformational Interactions

This category includes mechanisms in which molecular shape and shape change play a role (enzyme specificity, self-assembly, allosteric shape change). The term “electronic-conformational interactions” (Volkenstein, 1982) refers to the fact that both the electronic and nuclear motions are important, and that they interact. We need not repeat our earlier comments on this topic,
except to observe that the precise controls on the particle and quasi-particle processes described above (Sections 5.2.1 and 5.2.2) have their source in the mesoscopic level of nuclear configuration. Electronic-conformational interactions obviously belong to the specificity-driven mode of computing. However, specificity can serve as a basis for (nonprogrammable) state-determined computing in chemical reaction schemes, especially those that utilize the allosteric switching property. It can serve as a basis for entropy-driven signal integration, as in cAMP reaction-diffusion dynamics. And it can serve as a basis for energy-driven signal integration in systems in which the self-assembly of macromolecules released by incoming signals is used as a tactilizing mechanism of pattern recognition (more on this later).
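The entropy-driven, cAMP-style signal integration invoked here can be caricatured numerically. The sketch below is illustrative only: the ring size, rates, and input positions are arbitrary assumptions, not parameters from the text. Input signals deposit a second messenger at two sites on a one-dimensional ring; diffusion and decay then blend the deposits into a concentration profile that integrates the input pattern in space and time.

```python
import numpy as np

def integrate_signals(inputs, n=100, D=0.1, decay=0.01, steps=500):
    """Toy 1-D reaction-diffusion integrator on a ring.

    Each (position, rate) pair is a point source depositing a
    second messenger; the messenger diffuses (coefficient D) and
    decays, so the final profile is a spatiotemporal integral of
    the input pattern.
    """
    c = np.zeros(n)
    for _ in range(steps):
        for pos, rate in inputs:
            c[pos] += rate                       # source terms (input signals)
        lap = np.roll(c, 1) - 2 * c + np.roll(c, -1)
        c += D * lap - decay * c                 # explicit Euler diffusion + decay
    return c

profile = integrate_signals([(30, 0.05), (70, 0.05)])
# The two sources blur into overlapping concentration hills; a
# threshold readout anywhere between them sees both inputs at once.
print(profile[30], profile[50])
```

A readout element sensitive to the concentration at a single site thereby responds to a weighted combination of all the inputs, which is the dynamic mode of signal integration at issue.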
5.2.4 Membrane and Cytoskeletal Dynamics
Membranes and cytoskeletal structures are the polymolecular, highly structured “fields” on which (and sometimes in which) most of the enzymatic and particle exchange events in biological cells occur. In biological cells the membrane is a lipid bilayer, with polar heads pointing outward into the aqueous phase and nonpolar side chains attracted to each other in a fatty interior (Fig. 8, cf. Singer and Nicolson, 1972). The bilayer membrane encloses the cell and extends through it in the form of multiple internal surfaces. The external membrane is heavily peppered with proteins and carbohydrates on the outside; some proteins swim laterally within the
FIG. 8. Bilayer membrane. The polar (hydrophilic) side groups of the phospholipids (represented by circles) are in contact with bound water. Wavy lines represent the fatty acid (hydrophobic) chains. Surface and integral proteins are not shown. The structure is self-organizing, basically due to the oil-water repulsion.
membrane, others penetrate through it. The inside is covered by a protein mesh. The cytoskeleton underlies the membrane and also extends throughout the cell. It may be thought of as comprising a microskeleton (structural fibers) and micromuscle (sliding fibers) and, in all probability, it can utilize these structures as a sort of micronervous system. The membrane is stretched tautly over this system of mesh and supporting poles, somewhat like a circus tent, and can be pushed or pulled into different states by the motions of this underlying structure (Kirkpatrick, 1979). The microtubules, a major component of the cytoskeleton, comprise a superpolymeric network of globular proteins that can be reconstituted outside of the cell in a highly regular manner. Microtubule-associated proteins (MAPs) can be targets of cAMP action, and can serve to alter the structure of the cytoskeleton and hence of the membrane. The important feature of membranes, for the present purposes, is that they control the flow of ions. We have already seen that channel proteins that specifically recognize and control the flow of sodium and potassium control electrochemical signals such as the nerve impulse, thereby controlling the macroscopic layer of nerve impulse processing. At this level we are dealing with parallel signal flows, and in some cases with collective excitatory behavior (as is clear in the case of heart muscle, for example). Lying behind this are the specific channel proteins. This is the classic picture. But it is now clear that there are a variety of more subtle controls that act on the channel proteins. These may include the electron transfer and proton translocation mechanisms that mediate energy transduction, and conceivably mediate information processes as well. In some cases they certainly include second-messenger signals resulting from external input to the cell or generated through purely endogenous metabolic processes.
They may include cytoskeletal motions that act on channel proteins (Liberman et al., 1985). The cytoskeletal mechanisms are more conjectural at the moment. Suggestions include: polymerization and depolymerization reactions responsible for morphogenesis of microtubules (Matsumoto et al., 1989); hypersound signals generated by input channels (Liberman, 1979); propagating allosteric shape changes, possibly based on electronic instabilities produced by shape interactions between neighboring proteins (Conrad, 1979c); metabolically driven motions of the same type used to transport materials; and coupling through dipole oscillations (Hameroff, 1987). All these mechanisms provide a basis for the signal integration mode of computing. Changes in the cytoskeletal structure arising from second-messenger effects on MAPs, or from disassembly and reassembly in response to an altered milieu, could yield a different pattern of signal integration. Evolution or evolutionary learning mechanisms acting on the MAPs could also alter the pattern processing activities of the system.
5.2.5 Collective Dynamics (Coherence, Chaos, Chemical Kinetics)

Now let us look at the membrane, the cytoskeleton, the intracellular aqueous phases, and the biochemical reaction network of the cell as fields in their own right, with collective dynamics that can serve a control or information processing function. In this context the term active media is often used. The cAMP reaction-diffusion dynamics is an example. But far more exotic phenomena can occur. The most ponderable level is that of chemical reaction and diffusion. In principle it is possible to map nonlinear enzymatic and cyclic reactions into all the elements of a finite automaton (Rossler, 1974b; Okamoto et al., 1987). Actually coupling the reactions in any large-scale way to implement pure state-determined behavior is infeasible; however, at a local level such exotic chemical kinetic reactions can exert highly milieu-sensitive controls. At a more global level we have already noted the spatiotemporal coherence (dissipative structures) that can arise in systems with at least two chemicals that travel at different rates and interact nonlinearly. The initial- (or external-) condition-sensitive chaotic regime, mentioned in conjunction with highly sensitive transduction dynamics, can occur in the presence of three or more reacting species (Rossler, 1983). When the number of reacting species becomes large, we have the potentiality for buffering the effects of perturbation on the system. The regime between nonlinear dynamics and large numbers of species is the high-dimensional domain that allows for extradimensional bypass, and hence for effective evolution. As active media, the membrane and cytoskeleton are capable of supporting the same general types of dynamical phenomena. Membrane excitability is a highly specialized example. The cytoskeleton, with its specific network structure and fast signal processing, should in principle have more dynamical potentialities than reaction kinetics in an aqueous phase.
This extra structure allows for a yet more refined level of collective behavior, involving the strata of quasi-particles. The model, due to Frohlich (1983), envisions the cell as a collection of oscillators coupled through dipole interactions, though mechanical couplings are obviously also possible in the structured phases. The system is in continual vibratory activity, corresponding to a bath of photons, optical phonons (connected with the dipole oscillations), and phonons connected with the purely mechanical vibrations. By itself this has no significance. All materials are continually emitting and absorbing photons and phonons due to the thermal agitation of the constituent molecules. Frohlich proposed that the vibratory activity could be made coherent by coupling it to energy-yielding metabolic reactions. In essence, the flow of energy in the system creates a dissipative structure in the quasi-particle medium. If this is in fact the case, it would provide a subtle underpinning of
information and control that would interact dynamically with the comparatively more visible level of macromolecular motion. Light traps would be necessary for interactions with “biophotons,” since the cell, apart from special pigments, is basically transparent to light. Many models along this line have been constructed, but the subtlety of the phenomena has so far militated against direct experimental verification (for a recent review, see Popp, 1988). So far we have kept water, the major component of biological cells, in the background. It exists in free and bound forms. The bound water is the structured phase adjacent to the hydrophilic surfaces of membranes, proteins, and nucleic acids. It constitutes a sphere of hydration that envelops individual aqueous proteins and that, with sufficient propinquity to the membrane or cytoskeleton, merges with the extended layers of bound water in the cell. The key fact about water is that it comprises a vast network of hydrogen bonds, or protons, that move (really hop) in a manner somewhat analogous to the conduction electrons in a metal. This network extends throughout the free and bound phases, but in the latter merges with the hydrogen bonds of proteins and nucleic acids to form “proton wires” (Nagle and Nagle, 1983). The structure of the network is easily influenced by external conditions since it is highly sensitive to the distribution of ions and polyelectrolytes (e.g., proteins with multiple surface charges). It thus constitutes an excellent medium for integrating input signals and influencing the dynamics of proteins that control output signals. Collective coherent dynamics of the hydrogen bond network might also be possible. This could be driven by energy transduction mechanisms that must couple to proton motions in any case. Another model (Conrad, 1987b, 1988c) is based on the analogy between mobile protons in the layer of bound water adjacent to the membrane surface and conduction electrons in a metal lattice.
The movement of the proton will attract polarizable electrons in the polar side groups of the membrane, leading to a propagating polarization (which may be called an exciton). In this way an attractive interaction between protons is set up, analogous to the phonon-mediated attractive interactions among electrons in a metal that can lead to superconductivity. If the effective mass of the protons is reduced by virtue of motion-facilitating interactions with membrane proteins, it is in principle possible for small pools and connecting channels of “supermobile” protons to condense at physiological temperatures. This model has not yet been subjected to experimental test, but it serves to illustrate how intricately processes at different levels of the microscopic-macroscopic hierarchy interdigitate in biological information processing. If a supermobile condensate developed, it would have the character of an extended unitary structure that could act as a template for input signal patterns distributed over the cell surface, somewhat in the manner that an enzyme acts as a template for a substrate with a complementary shape. Molecular electronics researchers have, in general, paid less attention to
collective processes than to individual chemical and molecular mechanisms. It should, however, be feasible to employ processes of this type in a device framework. A concrete example is provided by the Belousov-Zhabotinsky (B-Z) reaction, an inorganic oxidation-reduction process that can give rise to complex dynamic structures such as ring and spiral waves (cf. Winfree, 1972; Krinsky et al., 1986). It is a dissipative structure, sometimes referred to as an autowave process because of its self-excitatory nature. Kuhnert (1986) has shown that a light-sensitive ruthenium-catalyzed modification of the B-Z reaction gives rise to patterns in the medium in response to laser light signals, and that the periodicity and phase shifts of the reaction can be used to store and hide an image. Krinsky and Agladze (at the Biophysics Institute in Pushchino) have constructed a prototype device to store an image (such as a face), transform it to exhibit contrasts and then a skeletal image, and then restore the image. These calculations could be performed by a digital computer, or by special-purpose silicon hardware. But for visual processing on a production line, speed is all-important. Many autowave chemical and biochemical reactions similar to the B-Z reaction have been discovered, and some of these (say in films) could become useful preprocessing components of vision systems.
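The autowave behavior described above can be caricatured in a few lines with the Greenberg-Hastings cellular automaton, a standard discrete model of excitable media (it is not the B-Z chemistry itself, and the grid size and neighborhood here are illustrative assumptions). Each cell is quiescent, excited, or refractory; excitation spreads to quiescent neighbors while the refractory state blocks back-propagation, which is what permits expanding rings and, with suitably asymmetric initial conditions, spiral waves.

```python
import numpy as np

QUIESCENT, EXCITED, REFRACTORY = 0, 1, 2

def step(grid):
    """One Greenberg-Hastings update: excited -> refractory,
    refractory -> quiescent, and quiescent -> excited whenever any
    of the four nearest neighbors is excited (an excitable-medium
    caricature of autowave propagation)."""
    excited_nbr = np.zeros_like(grid, dtype=bool)
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        excited_nbr |= np.roll(grid, shift, axis=(0, 1)) == EXCITED
    return np.where(grid == EXCITED, REFRACTORY,
           np.where(grid == REFRACTORY, QUIESCENT,
           np.where(excited_nbr, EXCITED, QUIESCENT)))

grid = np.zeros((41, 41), dtype=int)
grid[20, 20] = EXCITED                  # point stimulus
for _ in range(10):
    grid = step(grid)
# The excitation travels outward as a ring (diamond-shaped under
# the 4-neighbor metric); the interior has relaxed to quiescence.
print((grid == EXCITED).sum())
```

The self-excitatory, one-way character of the wavefront is what makes such media usable for the image transformations mentioned above.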
5.3 Biosensor Design

A biosensor, like a biological cell, receives input from the external environment, processes it, and either emits an output signal or takes an action. It is thus an elementary example of an M-m processor. Biosensor technology is a field under active development, with existing applications to sensing substances and physical properties such as glucose, urea, protein, fermentation, oxygen, and pH. A number of the better established mechanisms, from various levels of the mechanistic hierarchy, are integrated in these devices. Biosensors should dispel any doubt that molecular and biomolecular mechanisms can be used for information processing. A typical biosensor consists of enzymes immobilized on a membrane and a readout transducer that produces an electrical or optical signal in response to the effects of the enzyme action (Moriizumi, 1985; Tien, 1988). These enzymes thus serve as first-stage detector molecules (corresponding to the receptors on a cell membrane). Antigens and antibodies have also been used as detector molecules. The physical and chemical changes brought about by these first-stage detectors include potential changes, temperature changes, changes in spectroscopic properties, and electrochemical reactions. This corresponds to what we previously called the tactilizing transduction dynamics in the cAMP system. The readout transducing device (which might be an ion-selective field effect transistor or a film electrode) serves as the second-stage detector (corresponding to the kinase enzyme in the cAMP scheme). The important
point is that a specially selected protein conformation must be integrated into a matrix with dynamical properties. Artificial biosensors have not yet been elaborated into pattern processors. This would require the dynamical process intervening between the first- and second-stage detectors to have signal integration capabilities, or it might require that the second-stage detector be more sensitive to the signal integration capabilities that exist. Also, of course, the first-stage detector would have to be replaced by a battery of receptors, to allow for multiple input lines. The (paper) b-r “cellular automaton” device described in Section 5.2.1 would have this property. But perhaps the most illuminating Gedanken example is provided by the self-assembly mechanism (Section 5.2.3). The process of signal integration is schematically illustrated in Fig. 9. The pattern of signals impinging on the device causes distinct macromolecules to be released. The released molecules self-assemble to form a mosaic-like structure. The shape features of the mosaic are interpreted by readout enzymes
FIG. 9. Self-assembly model of pattern recognition. Input signals lead to the release of specifically shaped macromolecules, which then self-assemble to form a higher-level aggregate (or mosaic). Different input patterns are represented by different shape features of the mosaic. Not shown is the adaptor molecule that must recognize these shape features and control the output of the device. The model illustrates how a self-organizing process can serve to represent a symbolic pattern recognition problem as a free-energy minimization process.
that control the actions of effector enzymes. The crucial point is that these readout enzymes are adaptor molecules with two distinct specificities, one directed to the mosaic and one directed to the effector enzyme. The situation is to some extent analogous to transfer RNA. This is also an adaptor molecule, with one specificity directed to the m-RNA codon and one directed to the appropriate amino acid. The physical assumption is that the self-assembly process serves to break the symmetry of the input signal patterns in a manner somewhat analogous to the conversion of the linear sequence of amino acids in a protein to a folded three-dimensional structure. If the collection of molecular shapes released is sufficiently rich, the self-assembled mosaic will have a wide variety of potential shape features, each naturally grouping different input signal patterns by virtue of the physics of self-assembly. The power of the self-assembly mode of molecular computing is that it converts a symbolic pattern recognition problem to a free-energy minimization problem. The self-assembly of the mosaic is a self-organizing (hence nonprogrammable) property. If emergence of the self-assembled structure is controlled in part by the electronic wave function, we can expect that our earlier arguments (Section 3.6) about the extra search power inherent in quantum systems would be relevant. The self-assembly mechanism is hypothetical as far as nerve cells are concerned. But in immune system cells, receptors are known to cluster when activated (Menen et al., 1984). Here speed of processing, which would be limited by diffusion or transport, is much less important than a correct decision. The time required for processing would be greatly reduced if signals impinging on widely different locations on the cell surface were brought together by fast propagating signals in the cytoskeleton. 
Cascades of fast, localized assembly and disassembly processes in the membrane or on the cytoskeleton could mediate substantial pattern processing in the cell. Antigen-antibody interactions and membrane reconstitution techniques could be used to implement the self-assembly mode of computing in artificial devices.
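The conversion of symbolic pattern recognition into free-energy minimization (Fig. 9) can be illustrated with a deliberately tiny toy model. Everything here is hypothetical: the shape names, the pairwise binding-energy table, and the exhaustive pairing search stand in for real macromolecular shapes, interaction physics, and self-assembly kinetics.

```python
import itertools

# Hypothetical binding energies between released molecular "shapes".
# Lower (more negative) means a stronger complementary fit.
BINDING = {
    frozenset(("wedge", "notch")): -3.0,
    frozenset(("wedge", "wedge")): 1.0,
    frozenset(("notch", "notch")): 1.0,
    frozenset(("rod", "rod")): -2.0,
    frozenset(("rod", "wedge")): 0.5,
    frozenset(("rod", "notch")): 0.5,
}

def mosaic_energy(pairing):
    return sum(BINDING[frozenset(p)] for p in pairing)

def assemble(shapes):
    """Exhaustively pair the released shapes and return the
    minimum-free-energy mosaic: the symbolic recognition problem
    recast as energy minimization (cf. Fig. 9)."""
    best = None
    for perm in itertools.permutations(shapes):
        pairing = [tuple(perm[i:i + 2]) for i in range(0, len(perm), 2)]
        e = mosaic_energy(pairing)
        if best is None or e < best[0]:
            best = (e, pairing)
    return best

# Two input signal patterns release different shape populations;
# the mosaics they settle into have distinguishable features.
e1, _ = assemble(["wedge", "notch", "rod", "rod"])
e2, _ = assemble(["wedge", "wedge", "notch", "notch"])
print(e1, e2)
```

Different input patterns release different shape populations and therefore relax into mosaics with different minimum energies and pairings, which is the kind of emergent feature a readout adaptor molecule could discriminate.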
6. The Molecular Computer Factory
Now let us consider the enabling technologies that could mold the above mechanisms into functional devices and architectures. We will conceptualize the situation in terms of a factory, or, really, an infrastructure of development and production. Our objective is not to explicate the details of these technologies, but rather to indicate how they must work together and, more generally, to make clear what is new in the type of engineering thinking that is required. What is new is the essentially evolutionary nature of the factory. The major elements of the factory are illustrated in Fig. 10. At the center
[Figure 10 diagram: enabling technologies (enzyme immobilization, membrane reconstitution, biochemical reaction-diffusion, cytoskeletal reconstitution, immunological techniques, and others) feed an integrated system, with computer simulation of the integrated system as a design aid; an evolutionary design process (performance evaluation, selection and reproduction, with variation through gene manipulation, e.g., recombinant DNA) acts on a culturing system, supported by computer simulation of the culturing system, new proteins, and mass production techniques; products include information processing devices, biosensors, energy transduction devices, bioreactors, and diagnostic or therapeutic agents.]

FIG. 10. Design for a molecular computer factory. The key enabling technologies and methodologies required for the development and production of molecular functional systems are pictured. It is really a technological infrastructure rather than a factory per se.
is the device. It performs a function. It could be sensing, pattern processing, memory, energy conversion, control, measurement, or any of the modes of computing discussed. The function is specified by an “evolutionary engineer.” One main task is to spell out the criteria for evaluating the device. If the device were created by modifying an extant organism, the engineer might concentrate exclusively on the specification of function and the mechanism of evolution, ignoring completely how the system works. It would be a pure “artificial selection” system. But if it is composed by integrating macromolecular components, an initial act of human creativity is necessary to produce a protodevice on which the selection scheme can operate. The design process thus divides into two broad stages. The first, that of conceptualizing a scheme for performing the function, emphasizes mechanism. The level of performance achieved at this stage is marginal. The second stage uses variation and selection (evolutionary programming) to transform the mechanism and tune it for excellent performance. Since the device, or the component of it that is being developed, is in general structurally nonprogrammable, a clear (machine-like) understanding of the mechanism is generally lost at this stage, though a general grasp of it can certainly be of heuristic value. This is especially true if the performance of the device is powered by the unpicturability of the microphysical domain. If the biotechnologist insists on having the same kind of clear understanding of the structure-function relation as the digital computer engineer, he will lock himself into the structurally programmable, inefficient, classical, and nonevolvable domain of function. The stage of initial conception may in part be inspired by the analysis of actual biological systems, or, as already indicated, it could be de novo inventive. Computer simulation can help as an aid for testing the value of the concept.
Trial-and-error experimentation with plausible combinations of components might in some instances lead, and in other instances follow, the simulation studies. If the combination yields a primitive functional capability, this could be amplified through the variation-selection procedure. The initial experiments and subsequent evolution require an initial trial set of components and subsequent variations on this trial set. Tools for creating these components include: gene manipulation (e.g., recombinant DNA); protein engineering and self-assembly; monoclonal antibodies and antigen-antibody networks; immobilized enzymes; membrane reconstitution (such as the L-B technique); cytoskeletal (specifically microtubule) reconstitution; bioreactors; and conventional analytic and synthetic methods. We have already mentioned most of these, and have emphasized the central role of DNA and protein engineering, first because of its generative power and second because of the ubiquitous structural and enzymatic importance of proteins. But variations on which selection can act may also be introduced through membrane reconstitution and antibody
techniques. In the latter case the monoclonal antibody technique can be used to produce specifically selected antibodies in large quantities. We recall the importance of high dimensionality for the evolution process; the successful evolutionary engineer will deliberately introduce mechanistically superfluous (mutation-buffering) features into the components with which he works, in order to increase the efficiency of the search process. The whole factory is, in its essential structure, an evolutionary bioreactor. The bioreactor may also be one of the device types at the center of the scheme. In principle the scheme can become entirely recursive. The device at the center of Fig. 10 then plays a second role as an agent for the highly specific synthesis of all sorts of variant materials. An inevitable problem is that evolutionary systems tend to define their own goals. It is quite possible for traits to emerge that are epiphenomenal, or to proliferate by virtue of hitchhiking along with traits that are of clear selective value. This is why it is important for the evolutionary engineer to specify with great care the criteria for selection. It is necessary to organize the culture system in the development stage so that evolutionary discovery and optimization are facilitated. This means using effective recombination and speciation strategies. But the opposite side of the coin is that it is necessary to prevent evolutionary change from occurring in the production phase. Here mutation and other genetic variation must be reduced to a minimum. The factors responsible for selection must be stabilized. The evolutionary factory, in short, is an ecosystem. Evolutionary and ecological theory, far removed from the floor of the ordinary factory and at first sight secondary to the physical, chemical, and preparative aspects of new materials, thus makes its relevance known.
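The variation-selection stage described above reduces, in the simplest software caricature, to an evolutionary-programming loop: evaluate performance, select, and reproduce with mutation. The bit-string genome, population sizes, and rates below are arbitrary placeholders, not a model of any particular wet process, and the fitness criterion is a stand-in for the engineer's device-evaluation criteria.

```python
import random

def evolve(fitness, genome_len=20, pop_size=30, generations=60,
           mutation_rate=0.05, seed=1):
    """Minimal variation-selection loop: evaluate, keep the best
    performers, and reproduce them with bit-flip mutation. The
    'device' is abstracted to a bit-string genome and a black-box
    fitness criterion supplied by the evolutionary engineer."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)        # performance evaluation
        parents = pop[:pop_size // 2]              # selection (truncation)
        pop = parents + [
            [1 - g if rng.random() < mutation_rate else g for g in p]
            for p in parents                       # reproduction + variation
        ]
    return max(pop, key=fitness)

# Placeholder criterion: maximize the number of 1-bits.
best = evolve(fitness=sum)
print(sum(best))
```

Keeping the parents alongside their mutated offspring (elitism) ensures the best design found so far is never lost, a software analog of stabilizing the factors responsible for selection.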
For operating the molecular computer factory, predictive simulation models would be most helpful (Rizki and Conrad, 1985). According to the tradeoff principle it is impossible for such models to keep pace with the real system. Here, as with the materials themselves, we cannot expect a precise predictive tool. We can expect a form of CAC (computer-aided cultivation) that is an analog of computer-aided design and manufacturing. Production would be too expensive and the production process too uncontrolled without a flexible machine line. Modern robotry is already capable of many of the manipulations associated with gene sequencing and protein engineering, membrane reconstitution, mixing of chemicals and extraction of samples, culturing and environmental control. A robot line can be reprogrammed for different regimes of operation, including alterations in component production (such as controlled alterations in the laying down of L-B films) and different regimes for integrating components, and for switching between those culturing regimes directed to evolution of new products and those directed to production per se. One of the contributions of the molecular computer factory could be the development of molecular devices that increase the efficacy of robotry, and thereby increase the efficacy of the factory.
MOLECULAR COMPUTING
307
As flexible mass-production techniques become available, the inventory of components will increase. The molecular computer designer will less and less be required to go through the arduous task of mutually adapting each piece of a molecular jigsaw puzzle. This is the great bottleneck today. The practicality of electronic, mechanical, and conventional materials technologies is based on a mature infrastructure for development and production. With the advance of molecular biotechnology, however, the time should eventually arrive when it will be possible for the molecular computer engineer to go to a specialty house for a required component, or even to a component bank, just as it is now possible to do so in the case of conventional computer engineering.
7. Molecular Computer Architectures
Now let us suppose that the device at the center of the factory’s activities is in fact a molecular computer. Table I lists a series of conceivable architectures. All are subcategories of the M-m model, except for the evolutionary architecture (which in general is a supercategory). The classification begins with the more conventional designs and, roughly speaking, moves on to those that are increasingly biological in style. Actually there is no entirely natural order, due to the fact that some of the emerging molecular electronic technologies offer new approaches to conventional design. Our very quick characterization of each architectural type has two goals. The first is to provide more of a sense of the possibilities for functional organization in the large than could be included in the discussion of mechanisms. The second is to elicit the computational characteristics of each architectural type. Our argument will be that as we remove features of the evolutionary architecture we recede from true cognitive computation. Most of the designs considered (biosensors and biochrome excepted) have an aspect of futurity. Many factors bear on their actual feasibility, in particular the maturity of the technological structure, the development of competitive technologies, the insightful recognition of application areas, and unanticipated developments. We cannot analyze each architecture from this point of view here. However, the discussion should help to form some overall impression in this regard.

7.1 Conventional (von Neumann) Architectures
The tradeoff principle and our analysis of biomolecular mechanisms suggest that the conventional (von Neumann) digital computer is quite alien to the natural mode of operation of bioorganic materials. But at the same time we have seen a number of approaches to tricking biomolecules (or biomimetic
308
MICHAEL CONRAD
TABLE I

ARCHITECTURAL TYPES

Conventional (von Neumann) architectures
    Carbon-silicon interface
    Optomolecular electronic
    Molecular memory augmented
Parallel architectures
    Structurally programmable
    Structurally nonprogrammable
Neural architectures
    Neuromolecular
    Optomolecular
Optical architectures
    Integrated
    Fiber optic
    Mirror-based open designs
Memory-manipulation architectures
Conformation-driven designs
    Enzyme-driven
    Reaction-diffusion
    Self-assembly-driven
Analog and dynamics-based designs
Hybrid systems
    Integrated networks of processor types
    Heterogeneous distributed networks
    Molecular-device-augmented robots
Evolutionary architectures
    Intrinsic
    Extrinsic
Other?
molecules) into behaving like memory switches or logic elements. There is also the organic chemistry approach, which attempts direct silicon mimicry. The main difficulties of the pure organic or bioorganic approach are establishing connections between switches and eliminating side effects. Reliability may be an issue, but fault-tolerance schemes exist that can mask both switching failures and incorrect connections (Winograd and Cowan, 1963; Dal Cin, 1979). The difficulty of fabricating a purely organic version of a von Neumann machine at this time should not obscure the possibilities inherent in semiorganic approaches. There are at least three approaches. The first involves silicon-organic interfaces, either between proteins and silicon, membranes and silicon, or protein-membrane complexes and silicon. Electron flow in dry protein films interstitially incorporated into a silicon chip could conceivably allow for greater miniaturization, though there is no definite evidence for this
at the present time. The Lazarev-Gilmanshin effort to utilize electron-transport proteins as elements in circuits that take advantage of single-electron tunneling could also fit into this category (cf. Section 5.2). The second approach is optical. Pigmented organics or bioorganics could be interfaced with, or incorporated into, a chip in order to establish optical links. The dry protein films would in this case be attached to an optically active group. Such optical linkages would probably be more significant for unconventional architectures (see below). Nonlinear organic materials with optical switching properties would eliminate the need for the silicon interface, but this would obviously call for more sophisticated optics. The Birge program of using the bacteriorhodopsin photocycle is a bioorganic version of this approach (cf. Section 5.2). The third approach is to restrict the organic component to memory storage and retrieval. Most of the current attempts in this direction use optical techniques (including electron optics) for storage and access. The potential advantage is the high density of memory switches that could be obtained. A fault-tolerant design would obviously be necessary, both because of the molecular fluctuations (cf. Lawrence et al., 1987) and because of inevitable imprecision in the optics. The use of bioorganic molecules with jigsaw-like self-assembly properties would allow for more precise configurations on the memory disc, and for some intrinsic reliability (based on self-reassembly). The architecture of the conventional machine, since it is structurally programmable, is too delicate for evolutionary methods of improvement. The evolutionary character of the molecular computer factory is relevant to sculpting the molecular components, however. Enzymes could be developed in the factory that would make it possible to synthesize organic molecules that presently can only be contemplated on paper.
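The fault-tolerance idea invoked above, masking unreliable molecular switches rather than perfecting them, can be sketched with the classical triple-modular-redundancy scheme of the Winograd-Cowan type. The sketch below is purely illustrative: the failure model (independent bit flips with probability p) and all parameters are assumptions, not details from the text.

```python
import random

def unreliable_switch(x, p_fail=0.1):
    """A switch (e.g., a molecular memory element) that inverts its
    output with probability p_fail -- the assumed fault model."""
    return (not x) if random.random() < p_fail else x

def tmr(x, p_fail=0.1):
    """Triple modular redundancy: run three independent copies of the
    switch and take a majority vote, masking any single failure."""
    votes = [unreliable_switch(x, p_fail) for _ in range(3)]
    return sum(votes) >= 2

# A single switch errs with probability p; the voted triple errs only
# when two or more copies fail: 3p^2(1-p) + p^3, about 0.028 for p = 0.1.
trials = 100_000
rate = sum(not tmr(True, 0.1) for _ in range(trials)) / trials
print(rate)  # typically close to 0.028
```

The same voting idea extends to masking miswired connections, at the cost of a constant factor in component count, which is exactly the regime where cheap mass-produced molecular components would pay off.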
The most significant obstacle to the approaches indicated above is the advanced state of the silicon industry, and the economies of scale afforded by the existing infrastructure. The molecular biotechnology infrastructure will have to undergo considerable maturation before substantial use of organics becomes possible. The selective introduction of organic components to improve performance of basically silicon architectures, or to improve their performance for special-purpose applications, is a nearer-term possibility.

7.2 Parallel (including Neural) Designs
These fall into two types: structurally programmable and structurally nonprogrammable. The structurally programmable systems would be built out of the same components as the conventional serial designs discussed above. Each component would require definitive specifications (as nearly as possible), and the system would have to run on a clocked basis (except for small
circuits, in which the timing could be controlled by choosing lengths of the wires). The difference, as with silicon parallel architectures, is that either a number of processes share the same processors or a given process utilizes a large fraction of the processors at any given time. This is why conventional (effective) programmability generally breaks down. Structurally programmable molecular computers run in the parallel mode would clearly incur all the same technical fabrication problems as those run in the sequential mode. The situation with structurally nonprogrammable designs is different. Many of the synthesis, reconstitution, and assembly techniques under intensive current development in molecular electronics lead to rather intricate (in some cases tangled-appearing) molecular complexes. Some of the techniques (such as the L-B technique, the antigen-antibody network method, the protein self-assembly approaches, and some of the organic chemistry methods of polymer synthesis) allow for control over the resulting structures, and for a high degree of repeatability that is within the bounds set by fault-tolerant designs in at least some cases. Such structures can serve as a medium for signal integration through neighbor-neighbor interactions, in the fashion of cellular automata. For example, optical input could be transduced to a flow of electrons in an organic metal, such as polyacetylene, or in a network of self-assembled electron-transfer proteins. Dye molecules or membrane electrochemical activity could provide the output. This is a variant of the pattern processing biosensor. It would serve either to recognize patterns or transduce them to more manageable forms. It is necessary to tune the structures and the locations of output elements through an evolutionary process. The evolutionary procedures of the molecular computer factory are well suited to developing such signal integration devices.
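The cellular-automaton style of signal integration described above can be caricatured in a few lines: an input pattern excites sites on a one-dimensional medium, excitation spreads to neighbors for a few steps, and threshold "output elements" at tuned locations fire when local activity is high enough. Everything here (the averaging rule, the site locations, the threshold) is an illustrative assumption, standing in for structures that would in practice be tuned evolutionarily.

```python
def integrate(pattern, steps=3):
    """Neighbor-neighbor signal spreading: each site repeatedly
    averages with its two neighbors (edges clamped)."""
    state = [float(x) for x in pattern]
    n = len(state)
    for _ in range(steps):
        state = [
            (state[max(i - 1, 0)] + state[i] + state[min(i + 1, n - 1)]) / 3.0
            for i in range(n)
        ]
    return state

def readout(state, sites, threshold=0.2):
    """Output elements at chosen locations; each fires iff the
    integrated local activity exceeds its threshold."""
    return [state[i] > threshold for i in sites]

signal = [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
print(readout(integrate(signal), sites=[3, 8]))  # -> [True, False]
```

Site 3 sits inside the excited region and fires; site 8 is too far away for the signal to reach it in three steps. Evolving the medium and the readout locations against example patterns, rather than prescribing them, is the programming mode the text has in mind.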
A great deal of time would be required to specify preparation procedures for useful devices. But this should be a viable approach once automation procedures for varying and selecting structures are set up. Recently a great deal of interest has focused on connectionist neural nets as examples of parallel computing distributed over a collection of simple processors. Earlier (Section 4.3) we viewed connectionist neural networks as a special case of neuromolecular nets in which all the internal information processing dynamics are turned off. This is, of course, not the practical point of view if the objective is to fabricate a network of elements with simple input-output dynamics. Today’s neural computers are virtual machines built on top of conventional architectures. Many of the suggestions for actual implementation involve optical techniques. Here the excellent optical properties of molecular materials could help out. For example, suppose that each neuron is a liposome (spherical membrane) layered on the inside with a number of species of pigment molecules, each of which is sensitive to a particular frequency, and that these are linked to an
output pigment that emits at a frequency characteristic of the neuron. The linkage could be through a membrane potential change if the (homogeneous) ensemble of output pigments is voltage sensitive. Neuron 1 would be connected to neuron 2 if it contains an input pigment that responds to the frequency broadcast by neuron 2. A threshold property could be introduced by using output molecules with nonlinear properties. The connection pattern could be modified by adding or deleting input pigment species, or, if the whole repertoire of pigments is initially present, by establishing direct links (e.g., covalent bonds) between input and output pigments. This could be facilitated optically or enzymatically. In order to ensure that the output signal is broadcast sufficiently widely, it would have to trigger an externally powered laser. The signal could be weaker if a system of mirrors or a diffraction grating were used to direct light to addressable neurons. The different frequencies would not be necessary in this case. This design can be elaborated on paper, though it is obviously somewhat artificial even for small networks. It does point to the fact that if an optical design is possible, it becomes more possible when augmented with engineered organic and bioorganic materials.

7.3 Optical Architectures (including Memory-Based Designs)
It is probably a fair judgement that optical computing has, in general, been extremely attractive on paper, but, except for special applications, very tentative as a technology. Optical methods should therefore not be viewed as providing the route to molecular computing. However, as indicated above, wherever optical technology can work, the use of molecular electronic materials and techniques should provide substantial help. Optical designs are typically divided into three classes: integrated, fiber optic, and mirror-based open designs. The Langmuir-Blodgett film method pioneered by Kuhn (Section 5.2.2) is an ultra-integrated optical design insofar as the exchange particles are photons. As indicated above, it could be evolutionarily engineered to perform a signal combination function. L-B films and other membrane reconstitution techniques suffer from instability problems and require a very controlled milieu. In the absence of improved preparation techniques, some fault-tolerant scheme requiring redundant units would be necessary. Self-assembly of proteins with optically active groups on a membrane or in an antigen-antibody network would allow for molecular networks that are more elaborate, precise, and regenerative. Fiber optics would be amenable to more conventional computing. Here the interfacing of optically sensitive organic materials with a silicon chip could play a useful role. Mirror-based designs are extremely elegant, but suffer from dissipation of the light signal. Here new organic and bioorganic pigments
could make a major contribution. Such systems would most plausibly be directed towards special applications requiring fast signal integration at a macro scale. The reason for including memory manipulation architectures under optical computing is that optomolecular memory storage systems are probably the nearest-term commercial possibility. We have already mentioned that biochrome film (consisting of the photosynthetic protein, bacteriorhodopsin) has been used as the basis for a prototype image storing and erasing system, and has properties that make it suitable to a holographic memory (Section 5.2.1). But in actual fact the most interesting memory structures do not fit this type of architecture or any conventional architecture, but instead belong to bona fide neuromolecular systems.

7.4 Conformation- and Dynamics-Driven Designs
This category, of which the biosensor is the prototype, has already been discussed extensively. Enzyme-driven processors, reaction-diffusion processors that use enzymatic readout, and the self-assembly mechanism of integrating input signals are all examples. The term Brownian computing can also be used, because signal integration depends on a combination of Brownian motion and shape-based specificity. Conformation-driven processors, because of their close connection to biosensors, are a near-term prospect. They should serve as sensors and preprocessors for conventional computers. Banks of conformation-driven pattern processors could be arranged to act independently on different segments of an input pattern, or arranged in layers connected by macroscopic optical or electrochemical signal links. Another useful and feasible arrangement is to have processors with different pattern processing capabilities act on the same input patterns to detect different features of them or to interpret them differently to control different degrees of freedom of a robot. The dynamics-based designs utilize transduction dynamics, but not enzymatic readout. The term “analog” is sometimes used here to capture the sense of continuity. Strictly speaking, though, it should be used only if the dynamics are isomorphic or homomorphic to some other system, and if this fact is utilized for problem solving. The image processing system based on the B-Z reaction is an example of transduction dynamics used for image transformation rather than pattern recognition per se (Section 5.2.5). If probes less sensitive than readout enzymes are used, bona fide processing can be obtained, but it is no longer specificity-driven in the conformational sense. Such systems could, of course, also be used as analogs for slower or larger-scale systems with the same dynamics. Pattern transformation or processing based on chemical dynamics, such as afforded by the B-Z reaction, differs from molecular signal
integration in membrane and protein matrices (Section 7.2) or optical signal integration (Section 7.3) by virtue of being a collective property of large numbers of molecules. It is, in general, less microscopic than the former and less macroscopic than the latter.

7.5 Hybrid Systems
These fall into three broad categories: integrated networks of processor types, heterogeneous distributed networks, and molecular-device-augmented robots. The integrated-network category would comprise combinations of conventional processors and conformation-driven and dynamics-based processors in continual communication, with each performing computing functions natural to them. The attempt to develop silicon-carbon interfaces is a step in this direction, but it does not necessarily entail integration of processors operating in different computing modes. Heterogeneous distributed networks would, in general, involve larger collections of processors in more infrequent communication and perhaps without physical proximity. A specific example would be a conventional computer controlling the manner in which conformation-driven pattern processors exchange information, or controlling the evolution process as applied to these processors (e.g., Akingbehin and Conrad, 1989). Systems augmented with molecular devices would be conventional computers or robots enhanced with biosensors or bioactuators capable of controlling chemical, medical, or biotechnological processes. Future computing systems directed to process-control and similar specific applications are plausible candidates for enhancement by molecular functional devices.

7.6 Evolutionary Architectures
The evolutionary architecture may be the extrinsic infrastructure used for development and production (the factory discussed in Section 6). In this case it stands above the M-m architecture. The evolutionary mechanism could, in principle, also be embedded in the M-m architecture itself, as in the evolutionary selection circuits scheme (Section 4.4). The mechanisms that would have to be implemented internally are a subset of those described for the molecular computer factory. This would afford an important adaptive capability to stand-alone systems. But the technological feasibility of systems of this type is clearly remote at the present time. We need not reiterate here the various design features either of the selection circuits model or of the molecular computer factory. The pertinent question is how the evolutionary process relates to each of the architectures described above, and what this signifies for the issue of control.
We have two situations. The first is that of the structurally programmable system with structurally nonprogrammable molecular components. The components are evolved, but then used for precisely defined switching functions. The advantage is one of size, speed, coupling to photons or other signal modalities, or subtlety of control over electronic motions. This is how evolution relates to the conventional (serial) architectures, the structurally programmable parallel designs, some of the optical designs, and some features of hybrid systems. It is not absolutely different from conventional computer engineering, where the engineer must in practice call on trial and error to tool up the various components. Once the system is made, the user need not worry about the underlying physics; if the user runs the machine in a sequential mode, he can exert what appears to be complete control. The second situation is where the architecture is structurally nonprogrammable, and therefore itself an object of the evolutionary process. This may be because the system is an individual processor that is molded for the task at hand (as in the conformation-driven and dynamics-based designs) or because it is a network built from an indefinitely large repertoire of processor types. There is no possibility for the user to have the kind of prescriptive control that is familiar from serial digital computers (unless the system embeds a high-level interpreter). Evolutionary programming becomes necessary; the user must specify goals and selection criteria, and then essentially breed the system to perform in a desired fashion. The objection often raised is that the loss of prescriptive control is too disturbing to accept. One can never be sure that the system will behave as desired. This is in fact the case. But it must also be admitted that conventional programs, when they become sufficiently large, are not guaranteed to behave as desired. 
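Evolutionary programming in the sense used here, specifying only goals and selection criteria and then breeding the system, can be sketched as a minimal mutation-selection loop. The quadratic fitness criterion, the population size, and the mutation scale below are all illustrative assumptions; the parameter vector merely stands in for whatever structurally nonprogrammable system is being bred.

```python
import random

def evolve(goal, pop_size=20, generations=200, seed=0):
    """Breed a population of parameter vectors toward a goal. The user
    supplies only the goal and the selection criterion (fitness); no
    program prescribing how to reach the goal is ever written."""
    rng = random.Random(seed)
    fitness = lambda v: -sum((a - b) ** 2 for a, b in zip(v, goal))
    population = [[rng.uniform(-1, 1) for _ in goal] for _ in range(pop_size)]
    for _ in range(generations):
        # selection: keep the better half, judged only by the criterion
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # variation: each survivor spawns a slightly mutated offspring
        offspring = [[x + rng.gauss(0, 0.05) for x in v] for v in survivors]
        population = survivors + offspring
    return max(population, key=fitness)

best = evolve(goal=[0.3, -0.7, 0.5])
# the best individual ends up close to the goal, though nothing
# prescribed the path it took to get there
```

The point of the sketch is the division of labor: the criterion is explicit and carefully chosen (as Section 6 insists), while the solution itself remains unprescribed, which is precisely why no guarantee of desired behavior can be given.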
More to the point, structurally programmable systems operated in a parallel mode generally do not admit prescriptive programming. The programmer does not in general know in advance what program is actually communicated to the system when he sets the states and connectivity among the components. The move to structurally nonprogrammable designs is not a loss in this respect; it is a gain in that it increases the chance that the system can be molded for a desired task through the evolutionary process. Furthermore, the tradeoff principle leaves no choice. If one demands the ability to prescribe in advance what program a system will follow, it is necessary to accept as well the compromise of efficiency, and accept the fact that certain problem domains that are commonly managed by biological organisms will be refractory to automation.

7.7 Towards Cognitive Computation
Now let us return to our claim that the more we remove features of evolution from computational systems the more we recede from modes of
computation that can reasonably be associated with cognition. If the architecture is structurally programmable, all evolutionary features are extirpated from it so far as its operation is concerned, even if the components are molded through evolution. The only possibility for evolution is at the virtual, or simulation, level. Thus if our thesis is correct, we must be dealing with features of the evolutionary architecture as applied to the structurally nonprogrammable domain. We can elicit three major features. The first is connected with interpretation, the second with intrinsic ambiguity, and the third with measurement. Interpretation is connected with semantics. We have already noted (Section 4.2) the connection between semantics and functional significance in the context of evolution. Formal symbols have no intrinsic meaning. This feature of arbitrariness is graphically illustrated by Wittgenstein’s example of a pointing finger (Wittgenstein, 1953). Adults almost universally interpret this as a directional symbol (if not as an accusation). But a child might interpret it quite differently, not atypically as an instruction to look at the finger. Formal computer programs, and structurally programmable systems as realizations of formal programs, are patterns of symbols none of which have intrinsic meaning. Any meaning in terms of reference or function that they can assume must be provided by an extrinsic source. If our universe is confined to a gigantic simulation on a digital machine, there is no possibility for the symbols to possess an interpretation. This is not the case when physical forces are pertinent; there is no arbitrariness as to direction in a gravitational field. The system operates not simply with symbols, but with physical processes and actions that are driven in a nonarbitrary way by principles of energy and entropy. 
The recognition of a substrate by an enzyme is an example; once the shapes are specified it cannot be said that the interaction is in any way arbitrary. Functionality of the molecules is inherent in their molecular structures. If our system is a (structurally nonprogrammable) molecular computer, we could not say that the strata of computationally relevant elements and processes in it have no intrinsic relations with one another, and therefore no intrinsic interpretation. The actual specification of the function would, of course, still depend on the broader context of performance. The second feature is the indefiniteness of the functional specification of any structure or process in an evolutionary system (the use of the term "indefinite" in connection with the evolution process is due to Matsuno, 1989). The key point is that structurally nonprogrammable systems not only allow for interpretation, but for multiple and transformable interpretation. This bears on the issue of fuzziness and ambiguity. Any given physical structure could perform different functions in different contexts. A rock could be used as a weapon, as a
paperweight, as a building material, to produce fire, and so forth. Similarly the functional description assigned to the emergent shape of a protein could change significantly in different contexts. Researchers in artificial intelligence have gone to great lengths to introduce this type of fuzziness, imprecision, and ambiguity into their programs in order to create the possibility of function change. The rationale is that such features could abet learning, pattern recognition in ambiguous environments, judgement, intuition, and, in general, processes involving analogy and metaphor (cf. Hofstadter, 1987). But conventional computers are in principle precise, and it is extremely difficult to build virtual machines on top of them that are in any way indefinite or ambiguous. Structurally nonprogrammable molecular computers naturally provide this feature, which is essential to the transformations of function that occur in the course of evolution. The connection between intrinsically ambiguous computing and evolvability is implicit in the tradeoff principle. Programmable systems are organized to preclude transformation of function, and as a consequence an enormous amount of computation must be expended to simulate the ambiguous mode of computing. The advantage is prescriptive programmability; the disadvantage is that cognitive capacities that draw on the multiple functionality of self-organizing structures in evolution are suppressed. The third feature is measurement. The M-m architecture is very similar to a quantum measurement system. The downward flow of signals from macroscopic to microscopic form corresponds to state preparation, and the upward flow to measurement proper. The important feature of a quantum measurement system is that the unpicturable microdescription must be converted (projected) into a classical description picturable in terms of definite positions and momenta. This is sometimes called the collapse of the wave function.
It raises severe conceptual difficulties, since the equations of motion of quantum mechanics are time reversible, whereas the collapse process is not. We cannot explore this complicated issue here. Suffice it to say that many authors have noted the analogy between the transition from indefiniteness to definiteness that occurs in quantum measurement and the transition from indefiniteness to definiteness that occurs in cognitive decision making (Pattee, 1973; Josephson, 1974; Stapp, 1985). This transition must occur if an M-m system is to exploit the unpicturability of its microphysical dynamics to control its macroscopic actions (Section 3.6). Many current models of cognition are deeply influenced by a computational paradigm. The assumption is that the digital computer should provide the paradigm. The above considerations suggest that a broader paradigm, incorporating features of intrinsic interpretation, intrinsic ambiguity, and measurement, may be more suitable. To what extent can these features actually be incorporated in practical molecular computer architectures? Macroscopic collective dynamics captures
the interpretation and ambiguity features, but not the measurement aspect. This is also true for macroscopic optical signal integration. The nonprogrammable molecular signal integration modes have a microscopic aspect; the readout that makes this macroscopic supplies the ingredient necessary to capture the measurement aspect. Conformation-driven computing supplies all these aspects in the most complete way. It does not mean that measurement in the sense of wave function reduction actually occurs; this is an open question. It means that all the necessary ingredients for the features of intrinsic interpretation, functional ambiguity, and measurement are present. A biosensor could, in principle, incorporate all these features. The technical difficulty is not overwhelming. But this does not mean that the types of cognitive capacities that we associate with intelligence would manifest in the dramatic way that they do in highly evolved biological organisms. We recall our argument (in Section 4.4) that intelligence depends on the interplay of microscopic dynamics and macroscopic architecture. The neuromolecular architecture initially used to illustrate the full M-m architecture (Fig. 7) is obviously much more elaborate than a biosensor. It is a connectionist collection of conformation-driven and dynamic cellular processors, with elaborate memory manipulation and learning mechanisms that make it possible to mold internal molecular dynamics of the cells and orchestrate the repertoire of cells for coherent function. Models along these lines (of which many are possible) are better viewed as proto-brain models rather than as architectures to be implemented as a whole. They provide features that can be abstracted in virtual machines or in more fragmentary synthetic molecular computers, and they provide a basis for comparing the capability of natural and artificial systems. This is not simply because of the technological difficulties that they entail. 
The time required to evolve all the required pattern processing capabilities, even if facilitated by the intelligent choices of the evolutionary engineer, could be on the order of hundreds of thousands or millions of years.
8. Conclusions and Prospects
Molecular computing is both a science and a technology. These two faces are highly synergetic. The attempt to synthesize biomimetic or de novo molecular computing devices is an outgrowth of fundamental research in molecular and cellular biophysics, condensed-matter physics, polymer chemistry, neurophysiology, and computer science. But it is likely to lead to new insights into mechanisms and materials that bear importantly on these areas as well. Let us briefly summarize some of the main applications that molecular computing may have to the production of new science. We can include the
contribution of microphysical processes to information processing and control in biological cells, the dynamics of evolutionary processes (particularly in relation to biological organization), the development of new classes of chemical materials, new nonlinear aspects of physics and chemistry connected with the flow of information between microscopic and macroscopic forms, new insights into the quantum theory of measurement, new models of brain function and biological information processing, a more precise concept of computation and the linkage between the structure and function of computing systems, and a broader concept of cognitive computation. The question inevitably arises as to the technological prospects. The judgement of this author is that the field could achieve concrete technological results by following its present lines of development, even in the absence of any major breakthroughs in the areas of materials, techniques, or architectures. A large number of avid investigators have entered the field worldwide, and the distinct impression is that the general pace of development is increasing. The infrastructural support and the development of multidisciplinary teams is all important for the technological development. How fast matters progress will depend on these factors. There is no fundamental reason why primitive versions of all the basic architectural types cannot eventually be fabricated and applied with advantage to suitably chosen problems. However, the unexpected is more likely; a significant advance could facilitate some particular direction of development and, if care is not taken, retard others. A recent report on the state of computer science (Denning et al., 1989) asserted that the key question in this field is: What can be efficiently automated? The science and technology of molecular computing has a bearing on this.
According to the tradeoff principle, many processes that biological systems perform well, such as pattern recognition and learning, cannot be efficiently automated in structurally programmable machines. To the extent that structurally nonprogrammable systems, in particular molecular computers, can be fabricated, it should be possible to make new inroads into the class of problems that can be efficiently automated. If they cannot be fabricated, the clear implication is that some human computational functions are not good candidates for automation. In either case the theoretical, experimental, and technical sides of molecular computing provide a new and more comprehensive framework for the comparative computational analysis of brain and machine.
ACKNOWLEDGMENTS

The preparation of this paper was supported in part by the U.S. National Science Foundation (Grant IRI 87-02600) and by the Digital Equipment Corporation. The author is deeply indebted to numerous colleagues throughout the world for valuable discussion.
MOLECULAR COMPUTING
REFERENCES

Akingbehin, K., and Conrad, M. (1989). A hybrid architecture for programmable computing and evolutionary learning. J. Parallel and Distrib. Computing 6, 245-263.
Ashby, W. R. (1956). "An Introduction to Cybernetics." Wiley, New York.
Aviram, A. (1988). Molecular components for electronics-concept and theory. In "Bioelectronic and Molecular Electronic Devices," pp. 9-12. Research & Development Association for Future Electron Devices, Tokyo.
Barraud, A. (1985). Langmuir-Blodgett active molecular assemblies designed for a specific function. In "Bioelectronic and Molecular Electronic Devices," pp. 7-13. Research & Development Association for Future Electron Devices, Tokyo.
Benioff, P. (1982). Quantum mechanical Hamiltonian models of Turing machines. J. Stat. Phys. 29 (3), 515-546.
Bennett, C. H. (1973). Logical reversibility of computation. IBM J. Res. Dev. 17, 525-532.
Birge, R. R., Zhang, A. F., and Lawrence, A. F. (1989). Optical random access memory based on bacteriorhodopsin. In "Molecular Electronics: Biosensors and Biocomputers" (F. T. Hong, ed.), pp. 369-379. Plenum Press, New York.
Boden, M. A. (1988). "Computer Models of Mind." Cambridge University Press, Cambridge, England.
Bohm, D. (1951). "Quantum Theory." Prentice-Hall, Englewood Cliffs, New Jersey.
Bohm, D. (1981). "Wholeness and the Implicate Order." Routledge and Kegan Paul, London.
Bremermann, H. J. (1962). Optimization through evolution and recombination. In "Self-Organizing Systems" (Yovits, Jacobi, and Goldstein, eds.), pp. 93-106. Spartan Books, Washington, D.C.
Brillouin, L. (1962). "Science and Information Theory." Academic Press, New York.
Carter, F. L., ed. (1982). "Molecular Electronic Devices." Marcel Dekker Inc., New York.
Carter, F. L. (1983). The chemistry in future molecular computers. In "Computer Applications in Chemistry" (S. R. Heller and R. Potenzone, Jr., eds.), pp. 225-261. Elsevier Science Publishers, Amsterdam.
Carter, F. L., Siatkowski, R. E., and Wohltjen, H., eds. (1988). "Molecular Electronic Devices." North-Holland, Amsterdam.
Chernavskii, D. S. (1986). Solitons on bacteriorhodopsin. Preprint 295, P. N. Lebedev Physical Institute, USSR Academy of Sciences, Moscow.
Conrad, M. (1972). Information processing in molecular systems. Currents in Modern Biology (now BioSystems) 5 (1), 1-14.
Conrad, M. (1974a). Molecular automata. In "Physics and Mathematics of the Nervous System" (M. Conrad, W. Güttinger, and M. Dal Cin, eds.), pp. 419-430. Springer-Verlag, Heidelberg.
Conrad, M. (1974b). Molecular information processing in the central nervous system, parts I and II. In "Physics and Mathematics of the Nervous System" (M. Conrad, W. Güttinger, and M. Dal Cin, eds.), pp. 82-127. Springer-Verlag, Heidelberg.
Conrad, M. (1974c). Evolutionary learning circuits. J. Theoret. Biol. 46, 167-188.
Conrad, M. (1976a). Complementary molecular models of learning and memory. BioSystems 8, 119-138.
Conrad, M. (1976b). Molecular information structures in the brain. J. Neurosci. Res. 2, 233-254.
Conrad, M. (1979a). Bootstrapping on the adaptive landscape. BioSystems 11, 167-182.
Conrad, M. (1979b). Mutation-absorption model of the enzyme. Bull. Math. Biol. 41, 387-405.
Conrad, M. (1979c). Unstable electron pairing and the energy loan model of enzyme catalysis. J. Theor. Biol. 79, 137-156.
Conrad, M. (1983). "Adaptability." Plenum Press, New York.
Conrad, M. (1984). Microscopic-macroscopic interface in biological information processing. BioSystems 16, 345-363.
Conrad, M. (1985). On design principles for a molecular computer. Comm. ACM 28, 464-480.
Conrad, M. (1986). The lure of molecular computing. IEEE Spectrum 23, 55-60.
Conrad, M. (1987a). Molecular computer design: a synthetic approach to brain theory. In "Real Brains, Artificial Minds" (J. Casti and A. Karlqvist, eds.), pp. 197-226. North-Holland, Amsterdam.
Conrad, M. (1987b). The water-membrane interface as a substrate for H+-H+ superflow. Int. J. Quantum Chem.: Quantum Biol. Symp. 14, 167-188.
Conrad, M. (1988a). The price of programmability. In "The Universal Turing Machine: A Half-Century Survey" (R. Herken, ed.), pp. 285-307. Oxford University Press, New York.
Conrad, M. (1988b). Quantum mechanics and molecular computing: mutual implications. Int. J. Quantum Chem.: Quantum Biol. Symp. 15, 287-301.
Conrad, M. (1988c). Proton supermobility: a mechanism for coherent dynamic computing. J. Molec. Electronics 4, 57-65.
Conrad, M., and Hong, F. T. (1985). Molecular computer design and biological information processing: an electrochemical and membrane reconstitution approach to the synthesis of a cellular automaton. In "Bioelectronic and Molecular Electronic Devices," pp. 89-94. Research & Development Association for Future Electron Devices, Tokyo.
Conrad, M., and Kirby, K. G. (1988). Harnessing the inner dynamics of neurons for the performance of complex tasks. In "Bioelectronic and Molecular Electronic Devices," pp. 1-7. Research & Development Association for Future Electron Devices, Tokyo.
Conrad, M., and Rosenthal, A. (1980). Limits on the computing power of biological systems. Bull. Math. Biol. 43, 59-67.
Conrad, M., Kampfner, R. R., and Kirby, K. G. (1988). Neuronal dynamics and evolutionary learning. In "Advances in Cognitive Science" (M. Kochen and H. M. Hastings, eds.), pp. 169-189. Westview Press, Boulder, Colorado.
Dal Cin, M. (1979). "Fehlertolerante Systeme." Teubner, Stuttgart.
Davydov, A. S. (1985). "Solitons in Molecular Systems" (E. S. Kryachko, trans.). D. Reidel, Dordrecht.
Denning, P. J., Comer, D. E., Gries, D., Mulder, M. C., Tucker, A., Turner, A. J., and Young, P. R. (1989). Computing as a discipline. Comm. ACM 32 (1), 9-23.
Dirac, P. A. M. (1958). "The Principles of Quantum Mechanics," 4th ed. Oxford University Press, Oxford.
Drummond, G. I. (1983). Cyclic nucleotides in the nervous system. In "Advances in Cyclic Nucleotide Research," vol. 15 (P. Greengard and G. A. Robinson, eds.), pp. 373-494. Raven Press, New York.
Ebeling, W., and Feistel, R. (1982). "Physik der Selbstorganisation und Evolution." Akademie Verlag, Berlin.
Ebeling, W., and Jenssen, M. (1988). Trapping and fusion of solitons in a nonuniform Toda lattice. Physica D 32, 183-193.
Edelman, G. M. (1978). Group selection and phasic reentrant signalling: a theory of higher brain function. In "The Mindful Brain" (G. M. Edelman and V. B. Mountcastle, eds.), pp. 51-100. MIT Press, Cambridge, Massachusetts.
Elliot, H. C. (1969). "Textbook of Neuroanatomy." J. B. Lippincott, Philadelphia.
Feynman, R. P. (1986). Quantum mechanical computers. Found. Phys. 16 (6), 507-531.
Fong, P. (1962). "Elementary Quantum Mechanics." Addison-Wesley, Reading, Massachusetts.
Friend, R. H. (1988). Optical investigations of conjugated polymers. J. Molec. Electronics 4, 37-46.
Fröhlich, H. (1983). Evidence for coherent excitation in biological systems. Int. J. Quantum Chem. 23, 1589-1595.
Gardner, M. R., and Ashby, W. R. (1970). Connectance of large dynamical (cybernetic) systems: critical values for stability. Nature 228, 784.
Gibson, J. J. (1966). "The Senses Considered as Perceptual Systems." Houghton-Mifflin, Boston.
Gilmanshin, R. I., and Lazarev, P. I. (1987). Biotechnology as a source of materials for electronics. Biotekhnologiya 3 (4), 421-432.
Gilmanshin, R. I., and Lazarev, P. I. (1988). Molecular monoelectronics. J. Molec. Electronics 4, 583-590.
Goel, N. S., and Thompson, R. L. (1988). "Computer Simulations of Self-Organization in Biological Systems." Croom Helm, London and Sydney.
Goldstein, H. (1950). "Classical Mechanics." Addison-Wesley, Reading, Massachusetts.
Gould, S. J., and Eldredge, N. (1977). Punctuated equilibria: the tempo and mode of evolution reconsidered. Paleobiology 3, 115-151.
Greengard, P. C. (1978). "Cyclic Nucleotides, Phosphorylated Proteins and Neuronal Function." Raven Press, New York.
Gulyaev, Y. V., Sandomirskii, V. B., Sukhanov, A. A., and Tkach, Y. Y. (1984). Physical limitations on miniaturization in microelectronics. Sov. Phys. Usp. 27 (11), 868-880.
Hameroff, S. R. (1987). "Ultimate Computing." North-Holland, Amsterdam.
Hastings, H. M. (1982). The May-Wigner stability theorem. J. Theor. Biol. 97, 155-166.
Hastings, H. M., and Waner, S. (1984). Low dissipation computing in biological systems. BioSystems 17, 241-244.
Hebb, D. O. (1949). "The Organization of Behavior." Wiley, New York.
Hofstadter, D. (1979). "Gödel, Escher, Bach: An Eternal Golden Braid." Basic Books, New York.
Hofstadter, D. (1987). "Ambigrammi." Hopeful Monster editore, Firenze.
Holland, J. H. (1975). "Adaptation in Natural and Artificial Systems." University of Michigan Press, Ann Arbor, Michigan.
Hong, F. T. (1986). The bacteriorhodopsin model membrane system as a prototype molecular computing element. BioSystems 19, 223-236.
Hong, F. T., ed. (1989). "Molecular Electronics: Biosensors and Biocomputers." Plenum Press, New York.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proc. Nat. Acad. Sci. 79, 2554-2558.
Josephson, B. D. (1974). The artificial intelligence/psychology approach to the study of the brain and nervous system. In "Physics and Mathematics of the Nervous System" (M. Conrad, W. Güttinger, and M. Dal Cin, eds.), pp. 370-375. Springer-Verlag, Heidelberg.
Kirby, K. G., and Conrad, M. (1984). The enzymatic neuron as a reaction-diffusion network of cyclic nucleotides. Bull. Math. Biol. 46, 765-782.
Kirkpatrick, F. H. (1979). New models of cellular control: membrane cytoskeletons, membrane curvature potential, and possible interactions. BioSystems 11 (2, 3), 85-92.
Koruga, D. (1986). Microtubule screw symmetry: packing of spheres as a latent bioinformation code. Ann. N.Y. Acad. Sci. 466, 953-955.
Krinsky, V. I., Medvinskii, A. B., and Panfilov, A. V. (1986). Evolution of autowave vortices. Mathematics/Cybernetics 8, 3-48 (in Russian).
Kuhn, H. (1985). Molecular engineering-a begin and an endeavor. In "Bioelectronic and Molecular Electronic Devices," pp. 1-6. Research & Development Association for Future Electron Devices, Tokyo.
Kuhnert, L. (1986). Photochemische Manipulation von chemischen Wellen. Naturwissenschaften 73, 96-97.
Landauer, R. (1982). Uncertainty principle and minimal energy dissipation in the computer. Int. J. Theoret. Phys. 21 (3-4), 283-297.
Lawrence, A. F., McDaniel, J. C., Chang, D. B., and Birge, R. R. (1987). The nature of phonons and solitary waves in alpha-helical proteins. Biophys. J. 51, 785-793.
Liberman, E. A. (1979). Analog-digital molecular cell computer. BioSystems 11 (2, 3), 111-124.
Liberman, E. A., Minina, S. V., and Golubtsov, K. V. (1975). The study of the metabolic synapse II: comparison of cyclic 3',5'-AMP and cyclic 3',5'-GMP effects. Biophysics 22, 75-81.
Liberman, E. A., Minina, S. V., Shklovsky-Kordy, N. E., and Conrad, M. (1982). Microinjection of cyclic nucleotides provides evidence for a diffusional mechanism of intraneuronal control. BioSystems 15, 127-132.
Liberman, E. A., Minina, S. V., Mjakotina, O. L., Shklovsky-Kordy, N. E., and Conrad, M. (1985). Neuron generator potentials evoked by intracellular injection of cyclic nucleotides and mechanical distension. Brain Res. 338, 33-44.
Likharev, K. K. (1987). Possibility of creating analog and digital integrated circuits using the discrete, one-electron tunneling effect. Sov. Microelectr. 16 (3), 109-120.
Matsumoto, G., Tsukita, S., and Arai, T. (1989). Organization of the axonal cytoskeleton: differentiation of the microtubule and actin filament arrays. In "Cell Movement, Volume 2: Kinesin, Dynein, and Microtubule Dynamics," pp. 335-356. Alan R. Liss, Inc., New York.
Matsuno, K. (1989). "Protobiology: Physical Basis of Biology." CRC Press, Boca Raton, Florida.
May, R. M. (1973). "Stability and Complexity in Model Ecosystems." Princeton University Press, Princeton, New Jersey.
Maynard-Smith, J. (1970). Natural selection and the concept of a protein space. Nature 225, 563-564.
Mayr, E. (1963). "Animal Species and Evolution." Harvard University Press, Cambridge, Massachusetts.
McCulloch, W. S., and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5, 115-133.
Menon, A. K., Holowka, D., and Baird, B. (1984). Small oligomers of immunoglobulin E (IgE) cause large-scale clustering of IgE receptors on the surface of rat basophilic leukemia cells. J. Cell Biol. 98, 577-583.
Minsky, M. L. (1961). Steps toward artificial intelligence. Proc. Inst. Radio Engineers 49, 8-30.
Minsky, M. L., and Papert, S. (1969). "Perceptrons: An Introduction to Computational Geometry." MIT Press, Cambridge, Massachusetts.
Moriizumi, T. (1985). Solid state biosensors. In "Bioelectronic and Molecular Electronic Devices," pp. 73-80. Research & Development Association for Future Electron Devices, Tokyo.
Nagle, J. F., and Nagle, T. (1983). Hydrogen-bonded chain mechanisms for proton condensation and proton pumping. J. Membrane Biol. 74, 1-14.
Nicolis, G., and Prigogine, I. (1977). "Self-Organization in Nonequilibrium Systems." Wiley-Interscience, New York.
Okamoto, M., Sakai, T., and Hayashi, K. (1987). Switching mechanism of cyclic enzyme system: role as a "chemical diode." BioSystems 21, 1-11.
Pattee, H. H. (1973). Physical problems of decision-making constraints. In "Physical Principles of Neuronal and Organismic Behavior" (M. Conrad and M. Magar, eds.), pp. 217-225. Gordon and Breach, New York.
Popp, F. A. (1988). Biophoton emission. Experientia 44 (7), 543-544.
Potember, R. S., Hoffman, R. C., Kim, S. H., Speck, K. R., and Stetyick, K. A. (1988). Molecular optical devices. J. Molec. Electronics 4 (1), 5-16.
Rambidi, N. G., Zamalin, V. M., Sandler, Y. M., Todua, K. S., and Kholmanskii, A. S. (1987). "A Molecular-Elemental Basis for Promising Information-Logic Devices." VINITI 22, Moscow.
Rescigno, A., and Richardson, I. W. (1973). The deterministic theory of population dynamics. In "Foundations of Mathematical Biology" (R. Rosen, ed.), pp. 283-359. Academic Press, New York.
Rizki, M. M., and Conrad, M. (1985). EVOLVE III: a discrete events model of an evolutionary ecosystem. BioSystems 18, 121-133.
Robison, G. A., Butcher, R. W., and Sutherland, E. W. (1971). "Cyclic AMP." Academic Press, New York.
Rosen, R. (1970). "Dynamical System Theory in Biology." John Wiley and Sons, New York.
Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psych. Rev. 65, 386-407.
Rosenblatt, F. (1962). "Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms." Spartan Books, Washington, D.C.
Rössler, O. E. (1974a). Adequate locomotion strategies for an abstract organism in an abstract environment: a relational approach to brain function. In "Physics and Mathematics of the Nervous System" (M. Conrad, W. Güttinger, and M. Dal Cin, eds.), pp. 342-369. Springer-Verlag, Heidelberg.
Rössler, O. E. (1974b). Chemical automata in homogeneous and reaction-diffusion kinetics. In "Physics and Mathematics of the Nervous System" (M. Conrad, W. Güttinger, and M. Dal Cin, eds.), pp. 399-418. Springer-Verlag, Heidelberg.
Rössler, O. E. (1983). The chaotic hierarchy. Z. Naturforsch. 38a, 788-801.
Rumelhart, D. E., and McClelland, J. L., eds. (1986). "Parallel Distributed Processing: Explorations in the Microstructure of Cognition." MIT Press, Cambridge, Massachusetts.
Sagiv, J. (1988). Progress in the synthesis of planned layered organizates of organic molecules via chemically controlled self-assembly. In "Bioelectronic and Molecular Electronic Devices," pp. 13-14. Research & Development Association for Future Electron Devices, Tokyo.
Schneiker, C., Hameroff, S., Voelker, M., He, J., Dereniak, E., and McCuskey, R. (1989). Nanoelectronics and scanning tunneling engineering. In "Molecular Electronics: Biosensors and Biocomputers" (F. T. Hong, ed.). Plenum Press, New York.
Shannon, C. E. (1948). The mathematical theory of communication. Bell System Tech. J. 27, 379-423; 623-656.
Singer, S. J., and Nicolson, G. L. (1972). The fluid mosaic model of the structure of cell membranes. Science 175, 720-731.
Smalz, R., and Conrad, M. (1990). A credit apportionment algorithm for evolutionary learning with neural networks. To appear in Neurocomputers and Attention. Manchester University Press.
Stapp, H. (1985). Consciousness and values in the quantum universe. Foundations of Physics 15, 35-47.
Stebbins, G. L. (1950). "Variation and Evolution in Plants." Columbia University Press, New York.
Street, G. B., and Clarke, T. C. (1981). Conducting polymers: a review of recent work. IBM J. Res. Dev. 25, 51-57.
Su, W. P., Schrieffer, J. R., and Heeger, A. J. (1979). Solitons in polyacetylene. Phys. Rev. Lett. 42, 1698-1701.
Sugi, M. (1988). Langmuir-Blodgett films for molecular electronics: recent trends in Japan. In "Molecular Electronic Devices" (F. L. Carter, R. E. Siatkowski, and H. Wohltjen, eds.), pp. 4 4 481. North-Holland, Amsterdam.
Thom, R. (1970). Topological models in biology. In "Towards a Theoretical Biology," vol. 3 (C. H. Waddington, ed.), pp. 89-116. Edinburgh University Press, Edinburgh.
Tien, H. T. (1988). Ultrathin bilayer film: an experimental approach to biomolecular electronic devices. In "Molecular Electronic Devices" (F. L. Carter, R. E. Siatkowski, and H. Wohltjen, eds.), pp. 209-226. North-Holland, Amsterdam.
Tien, H. T., Salamon, Z., Kutnik, J., Krysinski, P., Kotowski, J., Ledermann, D., and Janas, T. (1988). Bilayer lipid membranes (BLM): an experimental system for biomolecular electronic device development. J. Molec. Electronics 4.
Trenary, R., and Conrad, M. (1987). A neuron model of a memory system for autonomous
exploration of an environment. In "Intelligent Autonomous Systems" (L. O. Hertzberger and F. C. A. Groen, eds.), pp. 601-609. North-Holland, Amsterdam.
Treistman, S. N., and Levitan, I. B. (1976). Alteration of electrical activity in molluscan neurons by cyclic nucleotides and peptide factors. Nature 261, 62-64.
Volkenstein, M. V. (1982). "Physics and Biology." Academic Press, New York.
von Foerster, H., and Zopf, G. W., eds. (1962). "Principles of Self-Organization: Transactions of the University of Illinois Symposium on Self-Organization." Pergamon Press, New York.
von Neumann, J. (1966). "Theory of Self-Reproducing Automata" (A. W. Burks, ed.). University of Illinois Press, Urbana.
Vsevolodov, N. N., Ivanitskii, G. R., Soskin, M. S., and Taranenko, V. B. (1986). Biochrome films: reversible media for optical recording. Optoelect. Instrum. Data Proc. 2, 41-48.
Winfree, A. T. (1972). Spiral waves of chemical activity. Science 175, 634-636.
Winograd, S., and Cowan, J. D. (1963). "Reliable Computation in the Presence of Noise." MIT Press, Cambridge, Massachusetts.
Wittgenstein, L. (1953). "Philosophical Investigations." Macmillan & Co., London.
Wolfram, S. (1984). Cellular automata as models of complexity. Nature 311, 419-424.
Wolpert, L. (1969). Positional information and the spatial pattern of cellular differentiation. J. Theoret. Biol. 25, 1-47.
Wright, S. (1932). The roles of mutation, inbreeding, cross-breeding, and selection in evolution. Proc. Sixth Int. Cong. Genet. 1, 356-366.
Yates, F. E., ed. (1984). "Report on the Conference on Chemically Based Computer Design." Crump Institute for Medical Engineering Report CIME TR/84/1, University of California, Los Angeles.
Yovits, M. C., and Cameron, S., eds. (1960). "Self-Organizing Systems: Proceedings of the Interdisciplinary Conference." Pergamon Press, New York.
Yovits, M. C., Jacobi, G. T., and Goldstein, G. D., eds. (1962). "Self-Organizing Systems." Spartan Books, Washington, D.C.
Foundations of Information Science

ANTHONY DEBONS
Robert Morris College, Coraopolis, Pennsylvania
and University of Pittsburgh, Pittsburgh, Pennsylvania

Prologue
1. Introduction
2. Essences: The Nature of Information
   2.1 Measurement
3. Structure: The Science of Information
   3.1 Historical Perspective
   3.2 Surveys
   3.3 Academic Development
   3.4 Literary Sources
   3.5 Research Activities
   3.6 Summary
4. Synthesis: On a Theory of Foundations
   4.1 Counting
   4.2 Recording
   4.3 Data-Information-Knowledge Systems
5. Overview
Acknowledgments
References
Prologue
A perennial question posed by individuals both inside and outside the field of information science concerns its nature: What is it? What are its essences, its structures, its boundaries? (Borko, 1989). The study of information can be traced to antiquity: to the Egyptians, and to the Greek and Roman philosophers and scholars concerned with the nature of knowledge. Contemporary information science, however, arose from a distinct set of circumstances created by the scientific renaissance of the present century and spurred by the launching of Sputnik. Advances in electronics, referred to as the "communication revolution," increased the ability to transmit data about an event quickly and over greater distances for processing. Meanwhile, advances in computers provided a greater capacity for accessing, storing, retrieving, or in general processing the data or the account
(record) of the event. These advances opened the possibility of new approaches to the marshalling of "knowledge power" (human understanding and meaning), which caught the attention of certain scholars. These scholars pioneered in developing concepts and mechanisms for dealing with such power. Studies were initiated on the practices then used in generating and processing the human record of experience (documentation). The objective of these studies was to determine the environment best able to meet the needs of a diverse constituency of users of the resource. These studies have been characterized as engineering, and this orientation persists to the present. Thus the basic nature or essence of information, or for that matter of information science, remains secondary to the practical matters involved in the management of the knowledge resource. We see this emphasis reflected in the research activity, in the technical publications representing the field, and in the educational programs established to prepare individuals for careers in the science and profession.

Advances in computers have also served to bolster the recognition of the importance of information and knowledge in the management of organizations and institutions at large. More and more, information scientists are sought to develop, design, and implement information systems that aid organizations in their planning, operating, and control functions. Management Information Systems, Decision Support Systems, and Expert Systems have become part of the professional landscape of information scientists and technologists.

This evolution in the development of information science is compatible with the holistic philosophy represented, in part, through the construct of a system. The notion that the whole is greater than the sum of its parts is fundamental to the system construct. In essence, the system construct provokes a search for those principles, theories, and laws that conceivably govern form.
From the idea that information flow enables the interdependent functioning of the structures or components within a system, information scientists can begin to establish fundamental principles that govern the dynamics of that flow. From this activity, principles for the design of information systems will emerge. Through these design principles, a synthesis of the counting and recording functions that have been alleged to be the foundations of the science could be achieved. Through this synthesis, the science could conceivably contribute to our presently meager understanding of form and, ultimately, to a better understanding of human awareness or information.

1. Introduction
The charge of this report is to explicate the foundations of information science, the essence of which remains the subject of continued dialogue and debate. Several essays have been written on the subject (Shera and Cleveland,
1977; Debons, 1980; Herner, 1984; Heilprin, 1989; and others). Several volumes of a publication titled Foundations in Library and Information Science (Stuart, 1978) exist, containing many articles that are intended to represent the subject. Of course, the "foundations" of anything can be a matter of perspective and, at times, semantics. For example, the term "foundation" may refer to a benevolent organization, as in the novel Foundation by the noted science fiction writer Isaac Asimov. On the other hand, "foundation" can refer to the theories, laws, and principles that are part of a particular field or discipline. Foundations could also include beliefs, conceptual schemes, or paradigms as referred to by Kuhn (1962). At times, the history of a field is considered to be its foundations. In this report, a statement of the foundations of information science is presented as one perspective among a number of possible alternatives.

In pursuing this goal, the plan is to account first for those efforts that have been applied toward determining the nature of the term "information," before the question is raised as to whether information can or cannot be a matter of science. Thus, Section 2 is directed toward estimating the intellectual boundaries of the science and includes matters of definition and measurement of information. Section 3 deals directly with information science as a discipline. As with issues on the nature of information, the answers to the question of whether information science is art, science, or engineering can be quite diverse and extensive. The development of the field from a historical perspective is presented first, followed by the findings of a number of surveys that have attempted to determine whether a consensus exists among the membership of the science on the nature of their science or field. The formal preparation of those who practice in the field can be a source for identifying the foundation of a science.
Core courses of educational programs in information science are examined for this purpose. In addition, published texts, essays, and key papers are examined, as well as the research activities of scholars identified with the field. Finally, in Section 4, an attempt is made to synthesize the respective aspects of the science as treated in this paper.
2. Essences: The Nature of Information
The terms “information” and “knowledge” are at times used interchangeably. In this sense one can say that interest in the nature of information has a long history. The slogan “information is power” shows the synonymity of the terms, in that the expression can be traced back to Francis Bacon’s dictum that “knowledge is power,” made in 1597. Although Bacon’s perception of knowledge as power has remained unchallenged, concern with knowledge
per se has occupied the attention of philosophers, particularly epistemologists, for centuries. Concern with the meaning of information, on the other hand, is of more recent origin, traced essentially to the post-World War era. There have been several attempts at defining information and related terminology.¹ Efforts at standardizing the terminology were made by professional associations (Hopper, 1954) and by international standards organizations such as the British Standards Institution, which provided a glossary of terms used in automatic data processing. These terms were further elaborated in the International Federation for Information Processing/International Computation Center (IFIP/ICC) vocabularies of 1966. In 1970, the American National Standard Vocabulary for Information Processing of the American National Standards Institute was adopted by the International Standards Organization (ISO), which differentiated data from information and provided the following definitions (Theiss, 1983):

Data. A representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.

Information (in data processing). The meaning that a human assigns to data by means of the human conventions used in their representation.
Prior to these efforts at establishing definitions, Shannon and Weaver (1949) had published their seminal work, The Mathematical Theory of Communication, which postulated that information represents a reduction in the uncertainty of the receiver. Information was considered in the context of energy across a specific transmission channel; the treatment excluded the meaning of the transmitted message from the equation measuring information. Despite the attempted standardization of terminology and the conceptual structuring that Shannon's theory provided, surveys of professionals who identified themselves with the field did not suggest that such efforts promoted a consensus as to what information meant (Wellisch, 1972; Houser, 1988). On the other hand, if the technical literature on information and information science is taken into account, then Shannon's definition of information as a reduction of uncertainty in the receiver of a message has fared much better than others in achieving consensus.
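Shannon's definition can be made concrete with a short calculation (an illustrative sketch added here; the example and its numbers are not from the original text): a message conveys information equal to the reduction it produces in the receiver's uncertainty, where uncertainty is the entropy H = -Σ p log₂ p of the receiver's probability distribution over outcomes.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Receiver's prior uncertainty over four equally likely outcomes (2 bits).
prior = [0.25, 0.25, 0.25, 0.25]

# A message rules out two outcomes; two remain equally likely (1 bit).
posterior = [0.5, 0.5]

# Information conveyed = reduction in uncertainty.
gained = entropy(prior) - entropy(posterior)
print(gained)  # 1.0 bit
```

Note that, in keeping with Shannon's formulation as described above, nothing in this computation refers to what the message means, only to how much it narrows the receiver's distribution.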
¹ Many definitions of information have been proposed that precede those cited here. For example, the philosophical literature is replete with definitional issues (e.g., Morris, 1947; Carnap and Bar-Hillel, 1964). Linguists such as Bar-Hillel (1957), Winograd (1972), and others have applied linguistic assessment to terms such as "data" and "information." The literature on communication can also be cited as relevant (Cherry, 1966).
Meanwhile, the striving to understand the nature of information has continued unabated. Numerous international conferences (Schrader, 1984; Debons, 1974; Debons and Cameron, 1975; Debons and Larson, 1983) have addressed the subject, leading some to assert that the pursuit of an ontological definition of information should be abandoned. It was even suggested that the word "information" be obliterated from the language (Fairthorne, 1975). Nevertheless, the general attention to the subject has led to certain perspectives: that information is an abstraction similar to matter or energy (Wright, 1979); that information can best be understood through an understanding of language, particularly signs and symbols (Zunde and Gehl, 1979); and that information is a matter of the mind, so that its nature can be understood through an understanding of how the mind processes inputs from the external world. These perspectives are not mutually exclusive, and variations exist in expressing the principles that relate to them. There are those who claim that information is a term that refers to both mind (process) and substance (commodity) (Debons and Cameron, 1975). Despite the variations, however, the appeal is to dissect whatever information means into an irreducible element, one that can be tested and measured.

The position that information is similar to matter and energy (Yovits, 1969; Otten and Debons, 1970; Tribus and McIrvine, 1971; Hoffman, 1980; Harmon, 1986) has its genesis in the second law of thermodynamics. This position renders information and energy analogous constructs to the extent that both represent abstractions. In this sense, it should be noted that science is replete with abstractions and in certain disciplines is structured around them (e.g., life, mind) (Beniger, 1986). Such abstractions defy definition in the common way that definitions are considered.
Yet the intrinsic nature of information and energy, so the argument goes, can be postulated from the outcomes generated by the quantification of certain operations (rules). For example, we understand the nature of energy when we consider the amount of energy needed to lift a pound of cement: the ability to lift implies the amount of energy needed. Likewise, the ability to make a decision, given certain options, implies the amount of information needed. The fundamental point of this position is that the state of nature can be expressed through a set of rules (algorithms) directed at achieving certain specified outcomes. In this connection, Otten and Debons (1970) pose the following questions: Does information represent a fundamental and universal phenomenon similar to matter and energy? And are the various operations performed on information based on fundamental phenomena, and hence on different forms of some fundamental relations? As evidence for the fundamental nature of information and information processing, Otten and Debons refer to the ability of the computer to translate complex information processing tasks into sequences of elementary operations.
ANTHONY DEBONS
Only an increment away from the matter-energy postulates is the view that information can be understood by understanding language, particularly signs and symbols (the science of semiotics) (Pearson and Slamecka, 1977). Conforming to Shannon's view that information can be expressed as the probability of occurrence of signs and symbols in messages, quantitative functions can be derived (through what is referred to as semiotic analysis) that establish and support theories about information. The various orderings (the sequencings of signs and symbols that constitute "software") allow computer simulation applications in an attempt to establish formal relations between the rules (the structure of the software) and the outcomes of the communicative process as predicted from information theory. "[W]e must see to understand. Unlike the physical world, information structures and software are invisible: to be seen they must first be represented in some 'notation'" (Davis and Zunde, 1984, pp. 1-18). This view of software (rule) structure and communication enables the generation of a number of "laws," laws that are now part of the information science literature. These "laws" deal essentially with the generation, dissemination, and use of information (Zipf, 1949; Bradford, 1950; Lotka, 1976). They can be subjected to semiotic analysis and thus are considered basic for an understanding of information (Zunde, 1981).

Another application of linguistic analysis to the question of the nature of information is more philosophical. The initial discourses and contemplations on information have given birth to expressions that characterize information as "process" and "commodity" (Debons, 1975), "environment" (Rathswohl, 1983), "image" and "event" (Pratt, 1975), and "thesaurus" (Schreider, 1965). Each of these concepts has been studied by Fox (1983) on the grounds that any understanding of information must conform to how the term is commonly used.
Fox applied the concept of semantic ascent (Wittgenstein, 1953; Quine, 1961; Austin, 1962), wherein the intrinsic properties of words and the objects they represent can be understood through their use in particular sentence structures. Because language is considered to be a "rule governed activity" (Fox, 1983, p. 22), a determination can be made as to the appropriateness of the term "information" as commonly used in a sentence (sentential analysis). Through the use of deduction, the various definitions and expressions of information can be tested for logical consistency and validity. Fox's analysis of the respective terms (e.g., image, process, event, etc.) led him to conclude that none could be defended against the criterion of common usage. Fox proposes that information be considered a proposition, an assertion about the state of the world. When subjected to sentential analysis, this proposal too was found wanting (Deer, 1985). This assessment led to support of the earlier (matter-energy) claims that information constitutes "an abstract phenomena and that it is concerned with the resolution of uncertainty in a sense that it is a record of resolved uncertainty" (Deer, 1985, p. 498).
Whether the nature of information analogically conforms to principles that can be applied to matter and energy, or to that which is embodied in symbols used to identify states of reality, or, for that matter, to the intrinsic properties of how these symbols are used in communication, there seem to be certain basic "givens." Information, whatever its nature, is a requirement of all organisms. It can be equated with a state of awareness; it is synonymous with organismic consciousness. It is this consciousness that provides for the organism's development and survival. With the evolution of the species, this consciousness is extended by the organism's ability to see meaning in events. Fox (1983) perceives this to be the crucial problem of information science: "Without an adequate theory of meaning, we lack the means to determine accurately what information is carried by a given sentence or set of sentences. Although rarely recognized as such, the problem of meaning is thus one of the central obstacles to progress in information science" (p. 97).
Brookes (1980) expresses these considerations in a discussion of the foundations of information science. Brookes, a bibliometrician, mathematician, and information scientist, refers to the philosopher Popper (1972), whose ontological scheme conceptualizes three worlds of reality: the physical world (the environment in the global sense); the world of human subjective knowledge, or mental states; and the world of objective knowledge, which constitutes the product of the mind.* These three worlds, Brookes claims, are independent, but they also interact. This postulation allows information to be a multifaceted phenomenon in which matters of mind are considered in relation to the recording and preserving of the product of the mind (knowledge), which exists in the physical world. From this perspective, the integration of our understanding of the properties of the mind (i.e., meaning) with our understanding of how best to deal with the products of those properties seems basic to any understanding of information (Debons, 1974; Wright, 1979; Fox, 1983).

From the preceding excursion, what can be concluded about the term "information"? It seems to offer no problem for the average lay person, who uses the term loosely, in conformity with the vernacular, to mean some fact or news, whether on paper, in books, or in someone's head. Information is taken to be part of living and doing one's business. At the professional level, however, the term demands greater precision if some understanding of the phenomenon is to be achieved. It is this demand for precision that presents a challenge to the information scientist. Thus, the attempt to define information is relevant to
* Other philosophers and scholars have alluded to the subjective-objective dichotomy (Furth, 1969; Monasterio, 1975; Piaget, 1971; Polanyi, 1958; Wright, 1979; and many others).
our understanding of the foundations of the science. If information is claimed to be basic (if not synonymous) to life (Beniger, 1986; Miller, 1988; Debons, 1986), then the foundations of the science could rest on many disciplines, depending on the issues that are raised.

2.1 Measurement
Definition and measurement of phenomena are inextricably related: measurement of a phenomenon depends, in general, on its definition. To measure means that an a priori concept of the phenomenon exists. Two schools represent the interests of information scientists in measurement. There are those who see the study of information as following the path, or model, of the physical sciences: a matter of identifying critical variables of the phenomenon that can be quantified and subjected to experimentation. And there are those who are concerned with the day-to-day field or logistical problems that arise when dealing with information. Information (fact, news, account, etc.) in this case is considered to be something contained in a conveyer. The conveyers can be objects such as paper packages (records, books, documents, etc.), film, or electronic devices (displays); conveyers can even be humans who hold such facts, news, etc. in their heads! Measurement is conceived in terms of the efficiency and effectiveness with which access to these conveyers of information provides some sort of value (Taylor, 1986) to those who use them when needed.

2.1.1 The Physical Science Model
As has been discussed, some perceive information in physical terms, as matter, energy, and meta-energy (Yovits, 1969; Otten and Debons, 1970; Harmon, 1971), and thus arrive at a concept of measurement that corresponds to that of the physical sciences. The following quotation is an example:

There would also be a physically reliable method of knowing when a given quantity of observed information is greater than, equal to, or less than a fixed point on a continuous scale. ... The International System (SI) of units can be mapped onto information phenomena, to the end that information measures might be developed which are truly commensurable with SI units. SI units can be expressed in terms of the meter-kilogram-second (MKS) system or the centimeter-gram-second (cgs) system and thus applied to the measuring of differences in the behavior of systems as a consequence of information use. Once these SI units are used they can be transplanted into a fully commensurable set of other basic or derived SI units (Harmon, 1986, p. 103).
Perhaps the most influential contribution to measurement in information science is that provided by Claude Shannon (1948), referred to previously. Building on earlier mathematical work (Nyquist, 1924; Hartley, 1928), Shannon considered information to be the logarithm of the number of symbols available to the sender and subsequently acquired by a receiver. Shannon formalized this definition by postulating a measure of information based on the probability of the occurrence of a particular signal from a number of possible signals over a particular transmission channel having a certain capacity and noise potential. Shannon linked this concept to the state of the receiver, expressed as the extent of uncertainty that the message reduced in the receiver. The degree of reduction of uncertainty as the result of the nature of the content (code structure, expressed in bits) of the message is the measure (amount) of information. The now-classical equation expressing this relationship is

H = Σ p_i log₂(1/p_i)

where H is the amount of information and p_i the probability of the ith event. Each event's contribution ranges from zero to log₂(1/p). When one event is certain to occur, the amount of information is zero; when all events are equally likely to occur, the information is maximized.

In this work, the cognitive (thinking, meaning) nature of uncertainty was not initially considered. Later, Shannon and Weaver (1949) attempted to expand the concept to include meaning. Nevertheless, Shannon's information theory remains applicable to the transmission of a coded signal over a particular channel. Despite its focus on transmission engineering, Shannon and Weaver's work has evoked applications in a number of fields (psychology, economics, etc.). In information science, Shannon's influence is expressed in numerous ways, ranging from discourses that relate the nature of information to states of certainty (Deer, 1985; Brookes, 1980), to communication (Heilprin, 1985), and others (Belkin, 1978). The works of Slamecka (1980) and Zunde (1981) at the Georgia Institute of Technology and of Yovits (1981), formerly at the Ohio State University and presently at Purdue University, represent a sample of efforts illustrating and extending Shannon's measure of information to specific applications. In referring to information theory, Zunde (1981) states that "although the postulates of the theory can be demonstrated for certain physical properties, these same postulates may not be directly applied to information, inasmuch as there is no empirical law of any kind relating information associated with an event to its probability."
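Shannon's measure behaves exactly as described: zero for a certain event, maximal when all events are equally likely. A minimal numerical sketch (the function name is mine, not from the literature):

```python
import math

def amount_of_information(probabilities):
    """Shannon's H = sum(p * log2(1/p)) over the possible events, in bits."""
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

# A certain event (p = 1) carries zero information.
print(amount_of_information([1.0]))        # 0.0

# Four equally likely events maximize H at log2(4) = 2 bits.
print(amount_of_information([0.25] * 4))   # 2.0
```

Any unequal distribution over the same four events yields an H strictly between these extremes.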
Nevertheless, the postulates of Shannon and Weaver's theory can be applied to the physical conveyers of information, namely signs, sign formations, and codes. These conveyers are elements of communication to which physical laws can be applied. Thus Zunde interprets Shannon's theory as a theory of the "information potential of sign vehicles." This in turn provides the basis for the semiotic analysis of information potential and its measurement. Through this reasoning, Zunde refers to certain prescriptions that are considered foundational to the study of information as an empirical science. Each of these prescriptions (referred to as "regularities" or "laws") yields quantitative indicators for the specific phenomena in question. Zunde's specific references include such phenomena as the relation between the number of letters in a word and the amount of (potential) information conveyed (Zipf, 1949; Mandelbrot, 1966); the number of periodicals that publish articles on a specific subject (Bradford, 1950) and the number of scientists who publish a number of papers in a particular field (Lotka, 1976); word association tests (Skinner, 1937); human reaction time; and the indexing of documents, among others. These approaches in no way exhaust the extensive research efforts at the Georgia Institute of Technology by Slamecka, whose research program has extended over a number of decades and who presently is applying efforts to establish quantitative measures to guide computer software development (Slamecka, 1980; Zunde and Gehl, 1979).

Like Zunde, Yovits bases his work on a strong conviction that information can best be understood by experimentation after quantifiable parameters have been established. Again, Shannon's information theory provides a base for this quantification through its concept of uncertainty, which links information to decision-making.
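Two of the regularities cited above are simple enough to state directly: Zipf's law predicts that a word's frequency falls off inversely with its frequency rank, and Lotka's law predicts that the number of authors producing n papers falls off as 1/n². A toy sketch (the counts are invented for illustration):

```python
def zipf_frequency(rank, top_frequency):
    """Zipf's law: the r-th most frequent word occurs about f(1)/r times."""
    return top_frequency / rank

def lotka_authors(n_papers, single_paper_authors):
    """Lotka's law: authors with n papers ~ (authors with 1 paper) / n^2."""
    return single_paper_authors / n_papers ** 2

# If the most common word appears 1200 times, Zipf predicts for ranks 1-4:
print([zipf_frequency(r, 1200) for r in (1, 2, 3, 4)])   # [1200.0, 600.0, 400.0, 300.0]

# If 100 authors wrote exactly one paper, Lotka predicts for 1, 2, 5 papers:
print([lotka_authors(n, 100) for n in (1, 2, 5)])        # [100.0, 25.0, 4.0]
```

Both are empirical regularities, so real corpora and bibliographies only approximate these idealized curves.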
Yovits (1969) claims that information is a property of the entire environment and that decision making is the overt act that deals with this uncertainty. Yovits (1969) provides two concepts that are important to the measurement and understanding of information. First, Yovits introduces the term "informon" as a unit of information: some element of energy that is manifested in some observable action that can be measured (resulting from a decision).* The second concept is the detailing of a model of a "generalized information system." Yovits's model represents a closed system wherein the various components provide feedback to each other, thus maintaining the equilibrium of the entire system. Yovits's research attempts to show the relationship between the decision maker's effectiveness (DME) as a

* Yovits and his colleagues define an informon as the amount of information required to change a decision (a discrete process). Their more recent research considers the probabilities of making different decisions (a continuous process); thus, in his more recent work, Yovits has not used the informon concept.
function of a number of factors that influence the flow of information through the system. Yovits links decision-making effectiveness with the value achieved through the use of data. Learning and a state of confidence are attained through feedback, which results from the use of data in making a decision when confronted with several possible courses of action. The importance of this research lies in its objective of establishing functions that are quantitative in nature and that can be replicated either through simulation or in the field when decision making is operational.

Meanwhile, there have been other applications of Shannon's uncertainty concepts, specifically to the problems of information distribution and dissemination (Wersig and Neveling, 1975). Others, again basing their formulation on information theory, have attempted to quantify information as a cognitive state. For example, Schreider (1965), applying Shannon's message concept, proposes that the information that is transmitted acts on a "belief state" (thesaurus, Θ). As explicated by Fox (1983, p. 70), "a statement conveys information if I(Θ, T) > 0, that is, when the statement has some effect, however slight, on the thesaurus. If the thesaurus already contains T (i.e., the receiver already believes that T) or T is not understood, then I(Θ, T) = 0 and T contains no information. If two statements T1 and T2 are such that I(Θ, T1) = I(Θ, T2), that is, they result in changes of Θ in just the same degree, then they contain the same amount of information, but not necessarily the same information."
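Schreider's thesaurus idea can be caricatured with sets: treat Θ as the set of statements the receiver believes, and I(Θ, T) as the change T induces. The sketch below is my own simplification for illustration, not Schreider's actual formalism:

```python
def information_conveyed(thesaurus, statement, understood=True):
    """I(theta, T): zero if T is not understood or already believed,
    otherwise a unit change to the belief state."""
    if not understood or statement in thesaurus:
        return 0
    return 1  # some positive effect, however slight, on the thesaurus

theta = {"the sky is blue"}
print(information_conveyed(theta, "the sky is blue"))          # 0: already in theta
print(information_conveyed(theta, "water boils at 100 C"))     # 1: changes theta
print(information_conveyed(theta, "xyzzy", understood=False))  # 0: not understood
```

In Schreider's richer formulation the magnitude of the change, not just its presence, determines the amount of information conveyed.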
2.1.2 The Field Model
The field model refers to the engineering (or design) of environments that help individuals more efficiently acquire, use, and pass on to others the information they need to meet day-to-day demands. Usually these demands for information are undefined (Brittain, 1970). From this perspective, the measurement of information can take a number of approaches, ranging from attempts to gauge how well what a person receives from a service source matches the person's needs (Saracevic, 1971), the cost of information services (Mason, 1978), and the cost of producing and distributing information in containers (books, records, etc.) (Machlup, 1962), to the value that can be reaped by facilitating storage of and access to information material (Taylor, 1986; Kent, 1971). Much work has been accomplished and reported on the question of how information retrieval systems, both manual and automated, can be designed to improve their efficiency and effectiveness. Commencing with the early work of Perry and Kent (1958), measures such as recall and precision have received continuing attention in information retrieval research. Recall refers to the capability of the system to provide documents
relevant to the needs of the user, while precision refers to the capability of the system to limit what is retrieved from what is available, so as not to clutter the user with irrelevant documents. The recall measure is the ratio of the number of relevant documents retrieved (times 100) to the total number of relevant documents in the collection. Precision is the ratio of the number of relevant documents retrieved (times 100) to the total number of documents retrieved. Figure 1 shows the anticipated relationship between the two measures (Kent and Lancour, 1973). According to F. W. Lancaster, the figure "represents the average of the recall and precision ratios for all 50 searches, with each search being conducted at four different 'levels' ... when searches are conducted generally (point A), a recall of around 90% is achieved; the precision, however, is very low. When, on the other hand, the searches are made specific, a high precision, low recall (point D) is achieved. The points B and C represent compromise strategies between these two extremes."

The subjects of indexing and classification have also received attention in attempts to identify measures that could establish their effectiveness in serving the needs of users. Indexing, for example, has been studied to determine the level of generality and specificity of content material required for an index to be effective in representing the content of a document. The art of indexing requires the availability of expert indexers who can pass judgment on the exhaustiveness of the index in relating the material at hand. Another area of interest to information scientists, related to indexing, has been the classification of subject material, a concern dating back to before Aristotle and more recently to Dewey and others. Generally, the techniques or measures used to determine the relative effectiveness of various means of
FIG. 1. Relation between recall and precision.
classifying materials have been based on the comparison of varying classification systems. More recently, computer-aided classification has assisted this process.

Lastly, bibliometrics deserves mention as a program of statistical bibliography through which hypotheses can be established about various properties of documents, their producers, and their users. The growing volume of documents, and the need to understand the nature and conservation of this important reservoir of potential knowledge, has stimulated the work done under the name of bibliometrics (Brookes, 1973). For example, the obsolescence of professional literature has been a matter of concern. Medical literature is a case in point: it has a half-life of about four and a half years. The scientific literature is also growing exponentially, while being discarded at the same rate. Lukasiewicz (1972) writes about the "ignorance explosion," referring to the obsolescence of engineering education. Measurement of obsolescence is difficult and continues to be a matter of interest to bibliometricians. The various efforts cited by Zunde and discussed previously in this section (Bradford, 1950; Zipf, 1949; Lotka, 1976) fall within the province of bibliometrics.

These applications do not exhaust the wide spectrum of possibilities providing a base upon which measurement of information can be implied. For example, neurology, metrology, and genetics have been suggested (Harmon, 1986). A case in point is Brookes's conceptualization of information in relation to the concept of knowledge. Brookes expresses this relationship in the following equation:

K(S) + ΔI = K(S + ΔS)

where K(S) is the knowledge structure, ΔI is the increment of information, and ΔS is the effect of the modification. Of course, much depends on our ability to establish the measures and methods to validate this equation.
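Returning to the retrieval measures defined earlier, recall and precision reduce to two simple ratios. A minimal sketch (the function names and search counts are mine, invented for illustration):

```python
def recall(relevant_retrieved, total_relevant_in_collection):
    """Recall: percentage of the collection's relevant documents retrieved."""
    return 100.0 * relevant_retrieved / total_relevant_in_collection

def precision(relevant_retrieved, total_retrieved):
    """Precision: percentage of retrieved documents that are relevant."""
    return 100.0 * relevant_retrieved / total_retrieved

# A broad search (cf. point A in Fig. 1): high recall, low precision.
print(recall(45, 50), precision(45, 900))   # 90.0 5.0

# A narrow search (cf. point D): high precision, low recall.
print(recall(10, 50), precision(10, 10))    # 20.0 100.0
```

The two invented searches illustrate the trade-off Lancaster describes: widening a search raises recall at the expense of precision, and vice versa.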
It is important to conclude this segment on measurement of information by emphasizing that much depends upon the definition of information. Depending upon how information is conceived, a wide spectrum of related measures for it can be recruited. For example, if information is seen as energy that stimulates or initiates a receiver to awareness, then the concept of minimal amount of energy (light, sound, etc.) for response is relevant. This concept of minimal amount of energy required for response is referred to as threshold and has been subject to considerable study in the literature (Stevens, 1974, 1988; Swets et al., 1961). If information, on the other hand, is seen as some object (stimulus) that needs to be regulated and managed for an adequate response, then concepts related to logistics and accounting can be applied. It is also important to acknowledge that technology permits measures to be established and used. It is reasonable to expect that, with increasing
sophistication in data processing technologies, new approaches to the measurement of information will evolve and a better understanding of the phenomenon will be achieved. Perhaps more importantly, if the nature of information is indeed a metaphenomenon, or perhaps an epiphenomenon (Beniger, 1986), then the need to establish measurement for such concepts remains a challenge of considerable proportions.
3. Structure: The Science of Information
Although the yearning to understand the nature of information and its measurement can be said to be generic to information science, other aspects are also germane. Since the coining of the phrase "information science" in the late 1960s, the perennial quest has centered on determining whether the study of information can be the basis upon which a science of information can be structured. The argument can be advanced that all sciences study information, which leads to the question of what differentiates information professionals from others concerned with information (Debons and Larson, 1983, p. 25). A wide range of efforts reported in the information science literature have attempted to provide a basis for understanding the nature and foundations of the science and its struggle for definition as a discipline. These efforts can be construed as attempts to elucidate the fundamental structure of the science, whether theoretical or applied (Shera and Cleveland, 1977; Zunde and Gehl, 1979, 1984). Proposed foundations have been included in edited compilations of essays (Weiss, 1977; Machlup and Mansfield, 1983; Heilprin, 1985, 1989) and in published volumes of key papers in the field (Griffiths, 1980; King, 1978). The NATO Advanced Study Institutes in Information Science, supported by NATO's Science Division, suggested that the foundations center on the nature of information, the nature of information systems, technology and its impact on society, and resources (personnel and education) (Debons, 1974; Debons and Cameron, 1975; Debons and Larson, 1983). More recently, the question of structure has received attention from several leaders of the field (Williams, 1988; King, 1988; Bearman, 1986; and others).

3.1 Historical Perspective
Despite the claim that a detailed historical account of the origins of information science does not exist (Shera and Cleveland, 1977), there are hints as to the beginnings of the science (see Wooster, 1988). Historically, the foundations of information science can be traced to the accessing and
handling of documents (Shera and Cleveland, 1977; Herner, 1984). Concepts regarding the classification and cataloguing of documents date back to Aristotle and Plato, and to the later work of Mill, Dewey, and, more recently, Ranganathan (see Machlup and Mansfield, 1983, p. 382). There are two primary theses about the foundations of information science from the historical perspective. One thesis claims that information science has its beginnings in the documentation movement, a movement with its genesis in a number of events within library science in the later part of the nineteenth century, culminating in the emergence of the American Documentation Institute (ADI), which was redesignated the American Society for Information Science in 1968 at the suggestion of Eugene Garfield. The other thesis traces the origins of information science to the development of computer technology, the aim of which was to enhance the ability of individuals to solve problems and make decisions (primarily military). This thesis is advanced by those who see information science as a basic science conforming to the canons of science suggested by Bacon, Descartes, Mill, and others. The two theses are not mutually exclusive, in that problem-solving and decision-making depend on the effective acquisition, change, and accessing of data, information, and knowledge.

The thrust of the documentation movement was to improve the access and retrieval of information, a thrust later intensified by the needs generated by the launching of Sputnik, which increased scientific productivity in critical areas (particularly space research). Emanating primarily from Europe, and particularly from the pioneering work of Paul Otlet and Henri LaFontaine in Brussels (Rayward, 1983), the basic concepts of documentation became formalized through the establishment of the International Federation for Documentation (FID) and, later in the United States, the American Documentation Institute.
Documentation was defined as "the creation, transmission, collection, classification, and use of 'documents'; documents may be broadly defined as recorded knowledge in any format" (Tate, 1950). Documentation became an essential part of the modern system of graphic communication within the world of scholarship during the later part of the nineteenth century. The movement stressed the importance of the flow of information among scientists and area specialists (Shera and Cleveland, 1977, p. 251). Documentation as a specialization was not concerned directly with the human need for information in dealing with day-to-day preoccupations. Rather, documentation referred to the principles directed at facilitating the processing and logistics of packaged, graphic material through automation. Basically, this concept of documentation remains in Europe and India to the present; "informatique" and "informatics" are terms that refer to the study of documentation in this sense.
The desire to increase the capability of libraries to deal with the demands of scholarship expanded the role of documentation from ordering and accessing information (knowledge) objects to determining the best way to present the information. Since the onset of printing (the Gutenberg press), the hardcover record (or book) had been the standard packaging concept. Photo-imaging techniques (microfilm) offered different options for the recording of information, particularly with respect to size. Aided by governmental interest and funds, mainly from the National Science Foundation, expanded technological applications provided new vistas for knowledge handling in the increased support of research and development (Herner, 1984). Spurred by Vannevar Bush's essay "As We May Think" (Bush, 1945), which emphasized the potential of varied applications of electronics, computers, and photo-imaging, documentation and information dissemination in general enjoyed renewed attention and emphasis. One implication of Bush's "Memex machine" was that information could be delivered to the user's (scientist's) doorstep through the computer on the desk, making information available at the push of a key. Ralph Shaw (1957), a pioneer information scientist, expressed this specifically through his reference to the need to compress distance in the access of information and knowledge (basically through mechanical means). It is in this direction that the work by Casey et al. (1958) at Western Reserve University in the 1950s and 1960s culminated in a formal concept tying automation to document processing. This work can be seen as an extension of the early developments in mechanizing the operations of textile looms, in Charles Babbage's Analytical Engine, in statistical punch cards, and ultimately in H. Hollerith's card sorter developed for the U.S. Census Bureau.
In 1950, Calvin Mooers coined the term "information retrieval," with which many of the pioneers of information science can be identified (Taube, Casey, Perry, Kent, Berry, Belze, Shaw, and others). Hans Peter Luhn has been called the Thomas Edison of information science (Herner, 1984). Luhn captured the potential made possible by the mechanical and electronic technology advances of the time, extending the intellectual discourse on documentation of the era. Luhn's work on coding, multiple-attribute search techniques, the development of a thesaurus of indexing and retrieval terms, and machine-readable book and catalog records can be considered the predecessor of the MARC (MAchine-Readable Cataloging) program. These activities made it feasible to apply the power of the computer to automating the Library of Congress (LC) collection (Buckland, 1965; Avram, 1969). They can also be considered the predecessor of Eugene Garfield's indexing of citations from the work of scientists in numerous fields (Garfield, 1977). The expansion of these efforts is further seen in the pioneering work of Roger Summit (1972), in the development of on-line
FOUNDATIONS OF INFORMATION SCIENCE
systems (DIALOG, ERIC, etc.) and in the achievements applied to the processing of medical knowledge for the National Library of Medicine (the Medical Literature Analysis and Retrieval System, MEDLARS). On October 4, 1957, “information science” as a phrase in the literature did not exist, or, if it did exist, was not much heard of. Yet the launching of Sputnik that day by the Russians set into motion a number of forces in the scientific community that would give added impetus to the documentation movement. The challenges defined by that event stirred all the sciences (physical, behavioral, and social), and the humanities as well, to the need for a closer examination of the way that knowledge is generated, created, produced, disseminated, and used. In 1963, the Weinberg report, initiated by the government’s interest in determining how knowledge use and distribution could be improved, critically assessed the habits exercised by scholars in the production and distribution of knowledge and the infrastructure that supported that production. This spirit for the revitalization of knowledge production and distribution systems exemplified the scientific renaissance emerging at the time. As a result of the “Sputnik scientific renaissance,” libraries were challenged to improve the quality of their services. These services were to be retailored to the demands of the scientists (Havelock, 1971). Those more technologically oriented in the library community, and those particularly identified with the documentation movement, saw their interests in automated data processing directed more to the personalization of these resources and capabilities for the users of information, in contrast to the documents themselves. Documentation became part of the total revolution in data processing, which was highly influenced by space exploration and defense.
It is in this direction that the National Science Foundation, the National Library of Medicine, and other governmental agencies, as well as industrial institutions, directed their attention and funding. While the Sputnik affair led to a more focused and critical attitude toward scientific information and the community directly attributable to it, the ballistic missile threat in the early 1970s added a new and important dimension that is often overlooked in discourses on the foundations of information science. Because of the threat implicit in the launching of ballistic missiles, data processing technology in support of strategic, tactical, and logistical planning and operations became critical. The design of command, control, and communications systems to support such functions became a significant part of the research and development enterprises of the military establishments (Debons, 1971). Human engineering (later called human factors, or ergonomics) was pioneered by the Cambridge Laboratories in Great Britain and the Aeromedical Laboratory at Wright-Patterson Air Force Base in Dayton, Ohio, and supported by the Office of Naval Research, the Air Force
ANTHONY DEBONS
Office of Scientific Research, and other quasi-military and industrial institutions. All of these elements contributed to the increased attention given to the use of data processing technology in improving the problem-solving and decision-making functions of military personnel. The design of information systems stressed the role and use of electronic displays and computers in such functions. As a result of the military’s interest in generating a research community outside of its in-house capabilities, professional societies like the IEEE, the ACM, and the Society for Information Display were created or re-oriented and strengthened. The significant increase in governmental funding of research enabled the recruitment of many scholars from diverse fields and the creation (and strengthening) of many scientific laboratories, both public and private (e.g., System Development Corporation, Mitre Corporation, and others). As a matter of fact, many of the areas that are now generally associated with information science (such as artificial intelligence, cybernetics, and bionics) owe their emergence to these developments. As one example among many, the reports of the First, Second, and Third International Congresses in Information System Sciences (Walker, 1967), sponsored by the Electronic Systems Division of the United States Air Force, provided the first effort at exploring the breadth of issues related to information science. The Congress in Information System Sciences held in Hot Springs, Virginia, provided a dialogue among some of the foremost scientists of the day. Engineering disciplines and topics, ranging over linguistics, computer design (hardware) and language (software), system analysis and design, human information processing and decision making, and numerous other areas, were part of the Congress’s agenda. The Congress made explicit that information system science was interdisciplinary in character and focused on improved data processing.
It is through these movements that the early work by Slamecka and Zunde on language (and, more recently, on software engineering), together with the experimental work by M. Yovits on decision making and problem solving, can be understood as constituting part of the foundation of information science. These efforts are also foundational to the work of Wise and Debons (1987), who are attempting to determine whether information system failures (e.g., Three Mile Island, the Challenger accident) can be investigated so as to isolate those factors in information system analysis and design that are critical to such failures.
3.2 Surveys

Studies to determine the general areas of interest of information scientists and the terms used to express these interests reveal extensive breadth of subject-area interest. They also seem to indicate a lack of consensus as to the
meaning of the terms that are used to express these interests (Wellisch, 1972; Schrader, 1984; Houser, 1988). Data on areas of interest and consensus on terminology have been obtained through questionnaires. The findings suggest a wide range of interests and presumably diverse overlaps in the meaning and use of terminology. These surveys, and the commentaries that accompany their interpretation, support the view that the study of information may not have reached the stage appropriate to a science and a discipline (Vagianos, 1972). A recent survey led to the following conclusion: “. . . is there justification for the notion of a branch of science named Information Science? If JASIS, the official journal of the American Society for Information Science, is the representative of the field then there is no such justification” (Houser, 1988). Unfortunately, these studies do not provide sufficient background on the formal training (education) and work experience of the respondents to permit the conclusions they draw. Background data on individuals considered to be information professionals are not available, although it may be expected that such individuals range from librarians and computer scientists to members of other specialized fields (e.g., mathematicians, statisticians, psychologists) working in a number of different institutions and capacities (Debons et al., 1980).
3.2.1 Special Interest Groups

A source for understanding the nature and composition of a science is the professional groups and organizations to which the members of the field belong. In this sense, the generation of special interest groups over time provides a tangible indicator of the field’s foundations. Table I presents a listing of the current (1989) special interest groups of the American Society for Information Science (ASIS), as indicated in the official publication of the Society. It is interesting to note the similarity of the current range of subject areas reflected in Table I with that obtained two decades earlier and presented in Table II (Donohue and Karioth, 1966, pp. 117-118). The nature, character, objectives, and goals of professional societies evolve as the interests they represent become more defined. Special interest groups can be considered tangible indicators of this development. The roster of special interests of information scientists suggests that such interests are founded basically on prevailing problems in dealing with information, whether in texts (documents) or in computers (databases). Most of these problems are institutional in the sense that they focus primarily on the management and logistics of information (in whatever form). From a more specific perspective, and relevant to the objective of this report, are the subject areas that have concerned the Special Interest Group (SIG) that
TABLE I
SPECIAL INTEREST GROUPS OF ASIS (1989)

Arts and Humanities
Automated Language Processing
Behavioral and Social Sciences
Biological and Chemical Information Systems
Classification Research
Computerized Retrieval Services
Education for Information Science
Foundations of Information Science
Information Analysis and Evaluation
Information Generation and Publishing
International Information Issues
Law and Information Technology
Library Automation and Networks
Management
Medical Information Systems
Numeric Data Bases
Office Information Systems
Personal Computers
Storage and Retrieval Technology
Technology, Information and Society
User On Line Interaction
TABLE II
SPECIAL AREAS OF INTEREST (DONOHUE AND KARIOTH, 1966)

Professional Aspects of Information Science
Technology
Information Needs and Uses
Content Analysis, Specialization and Control for Documents
Retrieval Systems
File Organization and Search Techniques
Automated Language Processing
Evaluation of Indexing Systems
New Hardware Developments
Man Machine Communications
Information Systems Applications
Library Automation
Information Centers and Services
National Information Issues and Trends

Reprinted with permission from John Wiley and Sons; Donohue, J. C., and Karioth, N. E., “Coming of Age in Academe - Information Science at 21,” American Documentation, 17(3), 117-119 (1966).
TABLE III
AREAS OF ASIS SPECIAL INTEREST GROUP/FOUNDATIONS OF INFORMATION SCIENCE, 1977-1982*

Classification Categories
Entity Process Combinations
Information Theory
Current Trends in Information Science Research
Linguistics and Information Science
Pragmatics
Homomorphisms in Information Science and AI
Basic Concepts of Signs
Dichotomy Issues
Theory of Signs
Epistemology
Models of Observations
Effects of Biology on Information Science Concepts
Theoretical Aspects of Simulations

[The original table assigned these subjects to the years 1977, 1979, 1980, 1981, and 1982; the year-by-year assignment is not recoverable in this copy.]

* For certain years the contributions of SIG/FIS were not identified separately in the proceedings of the annual program of ASIS.

addressed the Foundations of Information Science (FIS). Table III is a compilation of subject areas covered at the annual meetings of this SIG as part of the program of the parent organization. The emphasis is on fundamental issues of language, measurement, and, in general, epistemology. As a matter of fact, one could conclude from this array of subject areas that information science could be another name for applied epistemology.

3.3 Academic Development
We can also get insight into the foundations of a field by looking at the curricula, course requirements, and textbooks used in educational programs in the field.

3.3.1 Curriculum Development
Efforts at developing a cohesive program for educating and training information professionals have been extensive and, on the whole, inconclusive (Belzer, 1971; Jackson and Wyllys, 1976). This is to be expected of a developing field. Academic programs are the product of social institutions that are at times guided (if not governed) by economic and political forces. There are also philosophical issues that prevail among some information
scientists who, for example, differentiate education from training (Harmon, 1970; Brittain, 1987; White, 1983). Central to the present interest is the content of foundational courses now being offered by academic institutions. Allen and Lancaster (1981) have provided an outline of an information science program in which the foundational aspects are included (Fig. 2). The focus of the course is on the mechanisms of the services that the institution of the library can provide. What is needed is a detailed study of a number of foundational courses offered by the information science departments of several academic institutions. This would help in determining what topics are included that could be considered fundamental to the science. Meanwhile, the extensive essay by Jackson and Wyllys published in 1976 is a comprehensive account of the substance and trends of professional education in information science to that time. It examined the role and position of the introductory course in both accredited

A. Foundations (10)
   1. The Library in Society (1)
   2. The Interface Function of the Library
   3. Concepts of Communication (1)
   4. The Profession (2)
   5. Types of Libraries (2)
   6. History of Libraries and Librarians (3)
B. Materials (12)
   1. The Physical Item (2)
   2. Bibliographic Tools (3)
   3. Forms of Material (1)
   4. Building Collections (4)
   5. Publishing Industry (2)
C. Methods (37)
   1. Circulation Systems (2)
   2. Reference Services (16)
   3. Information Retrieval (3)
   4. Library Cooperation (2)
   5. Acquisitions/Ordering (4)
   6. Cataloging and Classification (10)
D. Management (7)
   1. Measurement and Evaluation (2)
   2. Personnel (1)
   3. Library Buildings (2)
   4. Public Relations (1)
   5. Finances (1)
E. Technology (4)
   1. Library Automation (2)
   2. Communications Technology (1)
   3. Reprography (1)
FIG. 2: Outline of an information science program. Reprinted with permission from Allen, W. C., and Lancaster, F. W. (1981). “A Directed Independent Study Approach to a Foundations Course,” J. Education for Librarianship, 21(4), 313-326.
I. The Profession
   A. Library and Society
   B. The Library Profession
II. Concepts
   C. Communication Models
   D. Interface Functions
   E. History of Libraries
   F. Types of Libraries
III. Materials
   G. The Physical
   H. Bibliographic Tools
   I. Collection Development
IV. Services and Operations
   J. Acquisition
   K. Cataloguing and Classification
   L. Document Delivery
   M. Reference
   N. Information Retrieval
   O. Library Automation
V. Management
   P. The Library Manager
   Q. Personnel
   R. Public Relations
   S. Library Buildings
   T. Finance
VI. Summary
   U. Cooperation
   V. Intellectual Freedom

FIG. 3: Foundations course outline. Reprinted with permission from Allen, W. C., and Lancaster, F. W. (1981). “A Directed Independent Study Approach to a Foundations Course,” J. Education for Librarianship, 21(4), 313-326.
and non-accredited institutions. (A summary is presented in Table IV.) At the time there were only 14 courses in accredited programs specifically identified as an introductory course in information science, and only one such course in non-accredited programs.

TABLE IV
PLACE OF INTRODUCTORY COURSE IN INFORMATION SCIENCE PROGRAMS

                                        Accredited   Non-Accredited   Total
As an optional separate course              22             13           35
Integrated in other courses                 21              7           28
Part of block or foundations course         16              4           20
As a required separate course               14              1           15
Other                                        2              0            2
Totals                                      75             25          100

Reprinted with permission from Davis, C. H., and Shaw, D. (1981). “A Brief Look at Introductory Information Science in Library Schools,” J. Education for Librarianship, 21(4), 341-343.
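The counts in Table IV are internally consistent: each row’s accredited and non-accredited counts sum to its total, and the columns sum to the Totals row of 75, 25, and 100. A small illustrative check (the row labels are shortened from the table; this sketch is not part of the original source):

```python
# Rows of Table IV as (accredited, non-accredited, total).
rows = {
    "optional separate course": (22, 13, 35),
    "integrated in other courses": (21, 7, 28),
    "part of block/foundations course": (16, 4, 20),
    "required separate course": (14, 1, 15),
    "other": (2, 0, 2),
}

# Each row's accredited + non-accredited count equals its total...
assert all(a + n == t for a, n, t in rows.values())

# ...and the three column sums match the Totals row.
col_sums = tuple(sum(col) for col in zip(*rows.values()))
assert col_sums == (75, 25, 100)
```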
Table V is a summary by the present writer of the courses listed in information science programs in school catalogs. In 1988, again, only 13 colleges or universities provided an introductory course in information science. Presumably much of what could be included in an introduction to the field is embedded in other courses, particularly in library and information services and management. In general, the curriculum suggested by the various information scientists conforms to that suggested by Taylor (1973), namely, to include 1) the nature of the science; 2) the engineering of information systems (analysis and design); and 3) the structuring and management of services. In a more recent article on information science education, Hayes (1986) provides a more extensive curriculum domain comprising six major areas: core courses (e.g., Introduction to Information Science, Information System Design), formal disciplines (e.g., mathematics and linguistics), applied disciplines (e.g., statistics and operations research), computer-oriented courses (e.g., database management and computer retrieval), management-oriented courses (e.g., accounting and organizational theory), and information organization and service courses (e.g., cataloging, referencing, etc.). This wide range of courses supports the interdisciplinary character of information science. The present educational programs in information science are pivoted toward principles of service based on technologies that provide new vistas in the management (logistics) of information (knowledge) conveyors. They do not support the claim of “helter-skelter” or “ubiquity” that some say characterizes information science educational programs (Vagianos, 1972; Wellisch, 1972; Houser, 1988).
A more recent survey of the education of information professionals by the American Society for Information Science (Richards, 1988) reveals that there is no substantial agreement as to the basic core (foundations) for the education of information professionals. Accreditation of academic programs in information science remains under consideration, although there is a sense that establishing standards should precede such action (ASIS, 1988). In contrast to this area of uncertainty, the work by the American Association for the Advancement of Science (Project 2061: Science for All Americans) can be seen as providing guidelines for the conceptualization and standardization of educational programs in information science. As indicated by the American Society for Information Science bulletin on the subject (Vol. 15, No. 2, June/July 1989, p. 21), “Project 2061 is an effort to establish a conceptual base for educational reform by spelling out the knowledge, skills, and attitudes all students should acquire as a consequence of their total school experience from pre-school through high school.” The report Physical and Information Sciences and Engineering cites four concepts that are fundamental to all aspects of the teaching of the sciences in interdisciplinary fields, namely:

1. the concept of materials
2. the concept of energy
3. the concept of information
4. the concept of systems.
The report further suggests the following topics on the concept of information:

1. The formal definition of information and its relation to probability (for example, the more surprising a message is, the more information it carries).
2. The concepts and units by which we endeavor to quantify information and information flow (for example, words, bits, and bits per second).
3. Errors and redundancies and their use in reducing errors (for example, by repeating a message).
4. The way in which information can be elaborated (for example, by computation or pattern recognition), carried (messages from transmitter to recipient), and stored (for example, in data banks).
5. The physical means of conveying and storing information, the limits imposed by nature (how much can we convey? How fast?), and energy and materials requirements (for example, the electric energy needed to convey a telephone message).
6. Examples of information processes in a social system (for example, learning), in a biological system (for example, in the genes, the nervous system, or the brain), and in a human-engineered system (for example, in computers, writing systems, and radar).
7. Strategies for acquiring and using information (for example, getting the most information from a deep-space probe with a low-power transmitter, or even the very simple problem of weighing an unknown object in a minimum number of steps, given a set of known weights) and energy savings achieved through information (for example, avoiding wasted motion or other activities that ultimately, through their efficiency, determine the economic competitiveness and wise utilization of a society’s resources, human and otherwise).
8. Information paucity and information overload (an overload is a function of time, that is, of the speed with which information is processed).

The following are key concepts pertaining to both information and computer science, the first five directly related to information science.

1. Information is the meaning attributed to data.
2. Different kinds of information can be derived from the same data.
TABLE V
COURSES LISTED IN INFORMATION SCIENCE PROGRAMS, BY SCHOOL CATALOG**

[The body of this table, a grid marking which course areas appear in each school’s catalog, is not recoverable from this copy. The course-area columns include: Inf. Science, Retrieval, Services, Lib. Mgt. & Adm., Research, Inf. Prof., Bib. Control, College Dev., Sys. Analysis, Comp. Fund., Communications, Design, User Nd. Asses., Inf. Agencies, Reference, Inf. Tech., Resources, Media, Intro. Lib., Inf. Environ., Inf. Proc., Inf. Systems, Catalog. Class, Online Inf. Ser., Automation, Inf. Use, Inf. Organ., Lib. Inf. Center, Organiz. Cont., and Perfor. Eval. The schools surveyed are: Rosary, Loughborough U., Louisiana St., U. of MD, N. Texas St., Polytech Inst. N. London, U. of Okla.,* McGill, UCLA, U. College, London,* U. of Pgh.,* Queens Coll., Pratt, U. of Sheffield, U. of SC, Syracuse, U. of Toronto, U. of Tenn., and U. of Texas, Austin.]

* No core courses were identified as such. The subject areas included could be speculated to represent a core from the language used in describing the representative program.
** The schools reported cannot be considered representative of all schools offering information science programs. The schools listed are those for which catalogs were available in the library used at the time. Nor does the sample represent data from the latest catalog for each school listed.
3. Information can be expressed in many forms and can be represented in analog (continuous) or digital (discrete) formats.
4. Information generally degrades during transmission or storage.
5. All systems, both natural and human-made, are internally coordinated by processes that convey information.
6. Information is more useful when it is represented by orderly collections of symbols called data structures.
7. Procedures can be formalized as algorithms.
8. Computing machines are constructed from simple components.
9. All general-purpose computing machines are fundamentally equivalent.
10. To ensure that an information system will be successful in the real world, the design must include both logical rigor and an understanding of social forces, cultural beliefs, and economic realities.

Preceding lists reprinted with permission from John Wiley and Sons; Donohue, J. C., “A Bibliometric Analysis of Certain Information Science Literature,” J. ASIS, 23, 313-319 (1972).
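The quantitative notions behind topics 1-3 of the Project 2061 list above (surprisal and probability, quantifying information in bits, and redundancy as protection against error) can be made concrete with a short sketch. The function names are illustrative, not drawn from the report:

```python
import math

def surprisal_bits(p):
    """Information content, in bits, of an event with probability p:
    the more surprising a message, the more information it carries."""
    return -math.log2(p)

def majority_decode(copies):
    """Decode a repetition code by majority vote over repeated copies."""
    return 1 if sum(copies) > len(copies) / 2 else 0

def triple_repetition_error(eps):
    """Probability that majority voting over three copies still fails,
    when each copy is independently corrupted with probability eps."""
    return 3 * eps**2 * (1 - eps) + eps**3

# Topic 1: a fair coin flip carries 1 bit; a 1-in-8 message carries 3 bits.
assert surprisal_bits(0.5) == 1.0
assert surprisal_bits(0.125) == 3.0

# Topic 3: repeating a message reduces errors. With a 10% corruption rate
# per copy, sending each bit three times cuts the error rate to 2.8%.
assert majority_decode([1, 0, 1]) == 1   # one corrupted copy is outvoted
assert round(triple_repetition_error(0.1), 3) == 0.028
```

The same logarithmic measure, averaged over a message source, gives Shannon entropy, and the bits-per-second of topic 2 is simply this measure taken per unit time.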
3.4 Literary Sources

3.4.1 Texts
Another source for understanding the foundation and development of a field or discipline is the instructional material generated for the education and training of its scholars and professionals. Table VI is an account of the several publications from 1962 to the present that directly or indirectly have served as texts introducing information science. Perhaps the earliest textbook in information science was Kent’s Textbook on Mechanized Information Retrieval (Kent, 1962). This textbook shows the early emphasis on the role of technology in the retrieval of information, but it is also significant in relating the human aspects of searching that are part of the retrieval process. Considering the importance of the documentation movement during this period in the history of information science, this text represented a landmark attempt to provide a systems view of the factors (design) important to the field. Information Storage and Retrieval, by Becker and Hayes (1963), is an extension of this emphasis. Although not specifically citing “information science,” the authors imply that the science is identified with information storage and retrieval, an interdisciplinary activity. Thus, Becker and Hayes considered the foundational concepts of information science to rest on the work of librarians, aided by advances in technology (automation). Tefko Saracevic’s 1970 Introduction to Information Science consists of an
TABLE VI
COMPILATION OF PUBLICATIONS USED IN SUPPORT OF ACADEMIC INSTRUCTION*

[The table compares, chapter by chapter, the subject contents of seven texts: Kent (1962), Textbook on Mechanized Information Retrieval; Becker-Hayes (1963), Information Storage and Retrieval: Tools, Elements, Theories; Saracevic (1970), Introduction to Information Science; Becker (1975), The First Book in Information Science; Davis-Rush (1979), Guide to Information Science; Flynn (1987), Introduction to Information Science; and Debons, Horne and Cronenweth (1988), Information Science: An Integrated View. The chapter-by-chapter alignment of topics across the seven columns is not recoverable from this copy.]

* With the exception of Becker-Hayes, where the subject chapters are listed in sequence as indicated in the contents of the publication, the ordering of subject contents in the other texts is the present writer’s.
organized compilation of articles that represent the basic framework of the field. The intention of the text was made explicit in the preface:

For teaching purposes, we have also desperately needed texts summarizing, synthesizing, interpreting, and connecting the results of investigations widely dispersed over time, over fields, over approaches, and over literature. Although, in general, we have sensed that there is unity, the unifying structure has yet to be explicitly stated. Although we also have felt that there are significant relationships among a number of philosophical statements, theories and experiments and a variety of practical information systems, such relationships have not been formally or practically worked out. (Saracevic, 1970, p. 410)
Saracevic’s treatment of information science mirrors the prevailing view at the time (and perhaps at present) that the substance of information science is centered around information storage and retrieval, thus preserving the Becker-Hayes formulation. Saracevic, however, extends this formulation through his reference to Goffman’s (1960) epidemic theory of communication, a theory that sees the dissemination of information as following the metaphor of the medical transmission of disease. Saracevic views Goffman’s theory as synthesizing the various interests of information scientists. The communication process is central to Becker’s 1973 The First Book of Information Science. In this book information and communication enjoy a common base for theory development and praxis. The technical detail of information systems provided both by Becker and Hayes (1963) and by Saracevic (1970) is lacking in this brief text by Becker. The text follows the tradition that the focus of attention of information science is very much applied, very much related to how we obtain and disseminate what we know to others and how technology can help the process. Davis and Rush’s Guide to Information Science (1979) extends the applied emphasis, focusing again directly on information storage and retrieval systems, particularly on matters relating to the use of computers in document processing. Davis and Rush join their predecessors in asserting the importance of the human as a matter of interest and concern to information science. The discussion of human factors pertinent to the operation of information systems in this and in previous texts supports the view that the human element is increasingly considered fundamental in the field. Flynn’s omnibus Introduction to Information Science (1987) is an attempt to emphasize the technological framework of information systems without denying the importance of the human reference.
The emphasis is on data processing, the methods and technologies that enable and facilitate it. Flynn’s book considers information science to be concerned with questioning and answering. Undoubtedly, Flynn has been influenced by Churchman’s concept of information science as an inquiring system. Churchman considers
five of the principal Western epistemological systems (such as those of Locke and Leibniz) as ways of inquiring about the nature of knowledge. By relating data processing systems to inquiring systems, Flynn has suggested a foundational notion of information science that is related to the province of knowledge itself. Information Science: An Integrated View, by Debons et al., attempts to show the broad range of subject areas related to information and the field’s interdisciplinary character. It rests its case on the complex, diverse nature of an information system. It postulates that all organisms are information systems and that the science of information concerns the principles that govern the design of such systems. Technologically augmented information systems increase the capability of humans to be aware, or conscious, of their own state within them. In achieving this purpose, information science serves to synthesize the contributions of a number of disciplines (both scientific and engineering). In addition, the science, as all science does, plays a social role by addressing relevant social issues in which information plays an important part. In summary, the introductory texts that exist cover the basic principles and issues of the field. These texts basically stress service. Service includes the processing (acquiring, storing, and delivering) of the human record of experience in physical form, whether on parchment, film, or another medium, usually in an institutional setting (e.g., a library or an information analysis center). Among these texts, some reference is made to mathematics, linguistics, philosophy, and the behavioral sciences. By and large, however, they are marked by the absence of an in-depth application of the principles, theories, and laws of these disciplines to specific issues relevant to information science.

3.4.2 Key Papers, Essays and References
The most extensive treatment of the theoretical and technical substance that can be subsumed as information science is that provided by Machlup and Mansfield in The Study of Information: Interdisciplinary Messages (1983). This eclectic work attempts to "analyze the logical (or methodological) and pragmatic relations among the disciplines and subject areas that are centered on information" (Machlup and Mansfield, 1983, p. 3). Through diverse explorations of many and various subjects by eminent scholars in the field, the editors provide support for the thesis that the foundations of information science lie in the several paradigms of the physical and social sciences. Again, the field is not unbounded, even though it is interdisciplinary. There are specific areas that are stressed, namely the nature of information, the use of technology, and the management of services. Across this core are subject matters relating to human cognition (including language and semantics), artificial
358
ANTHONY DEBONS
intelligence, communications and control (cybernetics), information theory, and systems theory.

Key Papers. Since the establishment of the American Society for Information Science, several attempts have been made to identify critical (key) library contributions to the science. Under the aegis of the American Documentation Institute, Carlos A. Cuadra in 1964 reported his assessment. Cuadra identifies a number of key scholars in the field and the number of times that they or their work were cited in various publications of the science (Cuadra, 1964). In 1971, A. W. Elias presented what was to be the first of a series of key papers. The key papers were organized by the following areas of interest:
Background and philosophy (relating to information as a phenomenon)
Information needs and systems
Organization and dissemination of information
Other areas of interest, including:
    Information storage and retrieval
    The development and administration of automated systems in academic libraries
    Language and machines
    Analysis and design of systems
    Technology and the future of the copyright principle

Although the 1971 compilation of key papers in information science was guided by the ASIS education committee, the compilation of key papers in the design and evaluation of information systems (King, 1978) was guided by an editorial advisory board consisting of a number of prominent information scientists. The key papers included three major areas, namely Evaluation Methods, pertaining to information retrieval, documentation, library services, and economics; Evaluation Applications, relating to large collections, data bases, management of libraries, and information needs; and Systems Design, specifically information retrieval systems, scientific and technical communication, bibliographic search systems, and computer networking, among others. In the same year, King (1978) published his Key Papers in the Economics of Information, which included papers on 1) the cost of information products and services, 2) the pricing of information products and services, and 3) information value. In 1980, Belver C. Griffith edited the last version of key papers. Griffith described the selection of the papers as based on the following criterion (Griffith, 1980, pp. 5-6): "citations of articles written by authors who had been mentioned in the first seven volumes of the Annual Review of Information Science and Technology (ARIST)." The volume includes five parts:
structure and dynamics of science information flow
information in innovation: required flow of knowledge
structure of literature and documents
information retrieval and analysis
tools and ideas: the interface of information science and librarianship

The foregoing array of publications on key papers in information science supports the thesis that the foundations of information science are based on the logistics of information-knowledge products and the services that make them renderable to individuals and institutions efficiently and effectively. Further, the close approximation of information science interests with those of librarianship leads one to postulate that the foundations of these two enterprises are closely shared.

Encyclopedias. The most prominent attempt at integrating the intellectual composition of information science and related sciences has been undertaken by Professor Allen Kent and his colleagues at the University of Pittsburgh. Their pioneering effort has resulted in three major encyclopedias. The Encyclopedia of Library and Information Science includes 25 volumes addressing a diversity of terms, subject areas, and concepts. This encyclopedia has included as co-editors a number of highly respected colleagues, including Dr. William Nasri, Harold Lancour, Dean of the Graduate School of Library and Information Science at the University of Pittsburgh, and others. An allied publication, The Encyclopedia of Microcomputers, is now in its fifth volume. This publication has been edited by Allen Kent, James Williams, and the late Albert Holzman, all from the University of Pittsburgh. It covers a broad spectrum of knowledge about microcomputers in such a way that the novice will gain easy access to current information while obtaining at the same time a sense of the future impact of such technology.
Lastly, The Encyclopedia of Computer Science and Technology is now in its 21st volume, edited initially by Jack Belzer, Albert Holzman, and Allen Kent, and later by Allen Kent and James Williams. These encyclopedias, together with the available textbooks that have been discussed in this report, should tend to quiet the most skeptical individuals who are concerned with the intellectual breadth of the field.

3.5 Research Activities
The kinds of questions asked and pursued by information scientists can be an important reference point to suggest the underpinnings of the structure of the science. In 1979 a conference was held at the School of Information Science of the Georgia Institute of Technology to discuss the status of research plans of the various nations represented at the roundtable. The conference was
supported by the National Science Foundation and included programs of the Federal Republic of Germany, France, Japan, the Soviet Union, the United Kingdom, and the United States (Slamecka, 1980). The interest in user needs for information and the language used for retrieval of documents were considered generic. In reporting the content of the various national research plans, the following was stated:

Research in information science may have a number of motivations: particular paradigms of a science of information, the study of information phenomena occurring in nature, empirical problems arising in the handling and use of information, the exploration of the utility of a particular researcher's methods and tools, etc. In contrast with these motivations of individual researchers, the surveyed national research plans in information science are motivated, with one exception, by the utilitarian objective to support the development and operation of large, nationwide information systems whose purpose is to manage recorded knowledge resources and provide them to diverse clienteles (Slamecka, 1980, p. 253).
This report implies that the foundations of the science are now structured vis-à-vis the research that is being undertaken or projected. The character of the research deals with practical problems: problems that are grounded primarily in logistics (the ordering and handling of objects, the nature of services), secondarily in the development of new tools (measurement) for dealing with such problems, and thirdly in the development of an understanding of the nature of information, whether by discourse or experimentation. This perspective seems to represent the international orientation as far as the basic essence of information science is concerned. (In a more recent statement on the intellectual foundations of information science, Buckland (1987) cites three processes that are basic, namely cognitive and epistemological processes, retrieval processes, and the provision and use of information systems.)
3.6 Summary

3.6.1 Themes
If the historical perspective, interests, and attempts to develop a pedagogical structure are taken into account, there are basically three recurring themes, expressed directly or indirectly, that comprise the formal intellectual and operational activity of information science. These themes pertain to the need to understand the nature of information and human information processing; the determination of principles for the greater storage, access, and use of
information; and lastly, the sociology of information, namely the political, economic, and other properties of information that reflect social significance. These themes, of course, are not mutually exclusive. Each of these themes is reviewed briefly.

Nature of Information. As previously discussed, the discourses as to what is meant by information can be broad in perspective (Machlup and Mansfield, 1983), but essentially the direction is towards developing a concept of information that can be subject to quantification and experimental verification. To date, the closest approximation to an objective analysis of information is that which is available through logic in general (Brookes, 1980) and semantic (language) analysis in particular (Fox, 1983). Other efforts have been based on Shannon's theory, which provides a concept of measurement of information (Zunde, 1981). Yovits et al. (1981) have extended these principles to the situation where it is the use and value of the information that are important. The value is determined by the change in the decision maker's effectiveness caused by the new information. Yovits explicitly defines the relevant quantities in measurable terms. Admittedly, Shannon's concept is restricted in range and perspective, particularly in its inability to bridge the sensory and cognitive dimensions (human information processing) that are implied by the word "information" (Suppe, 1985). Furthermore, some have proposed that the axioms and practices implied in the term "science" may be inappropriate for deriving an understanding of information (Wright, 1979). My view of information is that the term can represent either the state of being informed (e.g., John is aware that he has lost the game) or the object that instigates the state of being informed (e.g., the report states that John lost the game).
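Shannon's measure, mentioned above, can be made concrete: for a source emitting symbols with probabilities p_i, the average information content is H = -Σ p_i log₂ p_i bits. The following sketch is illustrative only (the example distributions are invented, not drawn from the literature discussed here):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H = -sum(p * log2(p)), in bits per symbol.
    Symbols with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin is maximally uncertain: exactly 1 bit per toss.
fair = shannon_entropy([0.5, 0.5])
# A heavily biased coin is more predictable, hence less informative.
biased = shannon_entropy([0.9, 0.1])
```

As the text observes, this quantifies only the statistical surprise of a signal; it says nothing about the meaning or the decision-making value of the information, which is the gap Yovits and others sought to address.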
As a state, the result of a process, the basic nature of information rests on how the organism responds physically and behaviorally to the energy s/he receives through the senses. In the case of information as an object (book, document, film), logistical (acquiring, storing, retrieving) principles apply.

Logistics of Information. Because the field at present is largely centered on practical rather than theoretical concerns, some have viewed information science as an applied science, perhaps closer to engineering (Gorn, 1967; Williams and Kim, 1975). How information (however defined) is generated, used, and given to others is the main question that consumes the interest and activities of information scientists. Some information scientists identify themselves with an interest in automation, primarily as applied to the library. By and large, the thrust of activity of these professionals is largely custodial. Their charge is to acquire knowledge resources that are of direct
value to their constituencies. The value of the objects in their custody rests primarily on the accessibility of these objects and on the ability to deliver such objects to the user quickly. What is not explicit is whether access to and retrieval of a record alone are sufficient to provide the user with the cognitive support that many tasks demand. Thus, in the past two decades or so, the cognitive needs of the user have received more focused attention. Nevertheless, attention to the power of data processing technologies in facilitating access to the record, rather than to the intellectual requirements of the user, dominates. Much of the present education in information science reflects this trend. The accompanying research literature is consumed with an understanding of the practices that can integrate the access, storage, and retrieval of records, thus improving service to users.

Sociology of Information. Information and communication are directly linked in the information science literature and also in people's minds. We often think of information as something given to us and as something we give to others. Distribution or dissemination of what is known is assumed to be part of what can be called the "information transfer" process. Many conceive of information as power, and concerns about the information rich and the information poor, whether individuals or nations, prevail. More directly, the sociological dimensions of information are reflected in the following activities and interests of information professionals:
Understanding the impact of information on individuals and institutions (Toffler, 1970), on the composition of the workforce as to individuals who do information work (Debons, 1981), and on the changing character of the workplace and workforce (Porat, 1977).

Determining the cost incurred in the production and distribution of information and knowledge in terms of both human and technological resources (Machlup, 1962; King, 1978).

Enhancing the value provided by those services that are available through institutions (e.g., libraries, information analysis centers) that serve human needs (Taylor, 1986).

Establishing information systems that assist in the management of resources, both human and technological, in decision-making and problem-solving functions (Nesbit, 1982).

Formulating national policies and standards that govern the access to and distribution of information and knowledge to the individual and society at large (Bearman, 1986, 1987), related to questions of privacy, confidentiality, and censorship, although not limited to them.
4. Synthesis: On a Theory of Foundations
The ideology, interests, and activities of information scientists as sketched in this paper point to several concerns. How to process and manage the records (documents) that account for human experience and insight, how technology can serve this aim, and how institutions can provide services that meet these needs are primary. The foundations of these needs rest in the principles of counting and recording operations which are part of most human enterprise.
4.1 Counting
Counting is the act or process of acknowledging a condition, state, element, or force in nature, occurring in space or time, through some established symbolization (e.g., meter, second). Counting coincides with the development of a numbering system, leading to the formal system of mathematics (e.g., arithmetic, geometry). From early in history, counting was used in activities ranging from barter and the exchange of money to the study of phenomena in general. The genesis of computer science lies in counting and the technologies that provide for its facilitation. From the time of pebbles, to the emergence of the Chinese abacus, to Babbage's "analytical engine," the practice of counting, whether by hand or machine, made possible developments in automata theory (Turing, 1950), which is the basic theory of computer science. Later, the thinking of the mathematician John von Neumann (1958) extended the theory to a wide range of applications which are now included in artificial intelligence.

4.2 Recording
While counting is the acknowledging of events through symbols, recording is the preserving of the outcome of the counting process, again through symbols. It is the fixing of events, states, or conditions in space and time through some symbolic (language) system expressed physically through light, sound, chemistry, or other means. In addition to preserving the experience, recording enables the dissemination of the experience to others. Clay, for example, has been used in the production of pottery and other products which served as a means to record states, conditions, and events of cultures and civilizations existing prior to known history. Pebbles, rocks, and gravel also serve as direct or indirect recording mechanisms, reflecting shifts in geological patterns. Writing has been said to have existed 5,000 years ago (Miller, 1988; Gelb, 1952). Two centuries before Christ, discoveries related to lead and ink led to printing with the use of parchment. These advances made possible the process of
printing first established in China. Printing was instrumental in advancing the dissemination of ideas and is foundational to the concept of the library (and the allied concept of a document), which represents "a collection of written, printed, or other graphic material (including files, photographs, tapes, disks, microforms, and computer programs) organized and maintained for reading and calculations" (Shera and Cleveland, 1977, p. 251). The institutionalization of the library, referring to a building, a room, or a number of rooms, goes back to the fifteenth century, but the library actually has a longer history, emerging from Babylonian times in the first half of the third millennium B.C., extending to Egypt and Greece, and conceivably earlier. The library founded by Aristotle in the fourth century B.C. provided the groundwork for the establishment of the library at Alexandria, a precursor of famous libraries existing now throughout the world. The creation of public libraries is attributed to Julius Caesar, who recognized the power of providing individuals with recorded knowledge. The library as an institution enjoyed further development in the Byzantine and Islamic movements, later in the development of the monastic libraries, and culminated in several great libraries, such as the Vatican Library in Rome, the Manchester Free Library in England, the Bibliothèque Nationale in Paris, the British Museum in London, the Library of Congress, and the V.I. Lenin State Library in Moscow.
4.3 Data-Information-Knowledge Systems

Most, if not all, living organisms are data-information-knowledge systems. Living systems can be differentiated in their capacity to be aware. They are aware when they receive an energy source from the environment. Theoretically, this capacity is extended by the organism's need to cope with increasing environmental complexity. This capacity is correlated with an evolving brain structure and functioning. It is this evolution that makes possible the process of inquiry (Churchman, 1971), a process conceivably linked to several cognitive functions (understanding, application, analysis, synthesis, and evaluation) associated with knowledge and learning (Bloom, 1956).

4.3.1 Components of a Data-Information-Knowledge System
The various functions (structures) of organisms have been well established, although the principles relating to them continue to be studied. These functions can be briefly alluded to as sensory (data-information-driven) and cognitive (knowledge-driven) (Lindsay and Norman, 1977). Events represent several energy systems (light, sound, etc.) existing in space and time. The acknowledgement of such energy systems is through what is called a signal (symbol). Sensor systems (eyes, ears, radars, binoculars, etc.) acquire (A) the
signal and transmit (T) the signal to the processing (P) component. Symbols and codes facilitate changes in energy states. The formalization of such symbols denotes a datum. The datum enables the processing component to store and manipulate the representation of the event states. This manipulation facilitates the cognitive functions that are necessary for utilization (U). Utilization refers to the evaluative, judgmental processes that are part of the decision-making and problem-solving functions demanded by the event in question. The actions (or lack of action) generated by such functions directly influence the event. This influence on the event is part of the transfer (Tr) component. The transfer component consists of those mechanisms, both organismic and technological (voice, motor action, etc.), that provide the capability of the entire system to influence the event, either positively or negatively. These functions are included in almost all models of information systems, the variation lying in their ordering and the sequences (flow) of functions. The dynamic properties of these functions preclude their consideration as linear; rather, they are recursive in their relation to each other. The laws and principles related to them are included in a number of disciplines, physical, social, and humanistic (see Table VII).
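The component flow just described can be caricatured as a chain of functions. The sketch below is purely illustrative: the event, the handlers, and their names are invented here, and the linear chain deliberately omits the feedback and recursion that the text insists real systems exhibit.

```python
def acquire(event):
    # A: a sensor registers the event as a signal
    return {"signal": event}

def transmit(signal):
    # T: the signal is carried to the processing component
    return signal

def process(signal):
    # P: the signal is formalized as a datum (here, a trivial encoding)
    return {"datum": signal["signal"].upper()}

def utilize(datum):
    # U: evaluative, judgmental functions produce a decision
    return {"decision": "respond to " + datum["datum"]}

def transfer(decision):
    # Tr: the decision is enacted back upon the event world
    return decision["decision"]

def eatputr(event):
    """One pass of an event through the A-T-P-U-Tr components.
    Real systems are recursive, each stage feeding back on the others;
    this one-way pipeline is only a didactic simplification."""
    return transfer(utilize(process(transmit(acquire(event)))))

print(eatputr("storm warning"))  # respond to STORM WARNING
```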
4.3.2 Componential Integration
The matter of combining the various components of a system to serve specific functions is a question of analysis and design. The manner in which this can and should occur is covered in a vast body of literature (Martin, 1976; Friedman, 1975; Zimmerman, 1983; Debons and Montgomery, 1974). Analysis represents a careful scrutiny of the needs to be served by the system (e.g., retrieval, decision making, planning) given the event contingencies to which the system is expected to respond. Design constitutes the matching of the requirements of each component of the system to the overall requirement of the total system components in unison. This entails an acknowledgment of the limits of human capabilities and an understanding of the technological state of the art in serving to extend these capabilities. Simon (1981), in discussing the substance of a science of the design of artifacts (the science of the artificial), makes clear the distinction between the knowledge goals of sciences that attend to how things are, which are the goals of the natural sciences, and those that are concerned with how things ought to be, which is the province of a science of designing artifacts. The science of design is grounded in procedures that enable the interfacing of several components that could differ in complexity. Optimization of function is possible to meet specific user needs, given the available resources, both human and machine. The best solution, Simon claims, is almost always "satisficing," a compromise between what you can get and what you possibly cannot get (Simon, 1981, p. 36). The task is to combine the capabilities inherent in the human information system with those that are possible through the technology that complements it. The matter of design is to determine connections (bridges) between existing capabilities that define the components of the system and those that are possible through existing tools, or by extending the state of the art of such tools, to achieve specific ends. It is the definition and arrangement of the interlinks between components that comprise the form of the complex structures which data-information-knowledge systems represent.

TABLE VII
EATPUTr Cross-Matrix (Information System Matrix)

Components: E = Event World; A = Acquisition; T = Transmission; P = Processing; U = Utilization; Tr = Transfer.

Foundational principles, theories, and general concepts underlying information system (EATPUTr) components and their interaction (completeness of the concept inclusion cannot be assumed):

a. Artificial intelligence; automata theory; ambience
c. Classification theory; categorization theory; cybernetics (control); communication theory; code theory
d. Decision theory (decision support); problem solving; display capabilities; data base structure; data management
e. Hardware standardization; hardware thresholds; hardware calibration
f. Feedback (control)
g. Change dynamics
h. Human information processing; human factors application
i. Information theory (mathematical); information theory (communications)
k. Knowledge representation
l. Linguistics; learning theory; literary science
m. Management theory
n. Noise; network
o. Operations analysis
p. Perception; personality style dynamics; signal prioritization; human physiology (sensory, neurological)
r. Reaction time; reliability measurement
s. Signal detection theory; sociological parameters; software design; semantics; semiotics
t. Time-space measurement; threshold (physical-sensory)
x. Expert system formulations

Reprinted with permission from Elsevier Science Publishing Co., from William B. Rouse and Kenneth R. Boff, eds., System Design: Behavioral Perspectives on Designers, Tools, and Organizations, New York (1987).

Figure 4 shows the relationships that can exist in an integrated data-information-knowledge system.
• The human as a data-information-knowledge system is central to the idea. The human organism consists of sensors (A), a central nervous system (T), and a brain (P) allowing the processing of signals received from the sensors and transmitted to it. Based on the processing of such signals (data), interpretive and judgmental processes are exercised (U) prior to the execution of action (Tr). All of these components, as subsystems, act in unison, each interacting with the others through feedback (arrows).

• To the extent that the human responds to events in the external world, the inputs to the human consist of data which are formalized through some code system (language). This input is transformed into some state of awareness (information) which serves as a stimulus for the derivation of understanding and meaning (knowledge).

• Each of the foregoing elements of the data-information-knowledge system is augmented either through an auxiliary human support system or a technologically augmented (artificial) system which is derived through analysis and design.

• The data-information-knowledge system, whether human or augmented, supports individuals who are responding to events by establishing new insights and understanding (generation, use, and dissemination of knowledge). Services and institutions (libraries, information centers, etc.) are established to aid such individuals in these functions. The services and institutions are in turn served by humans, augmented by data-information-knowledge systems, who possess specific capacities to do so.

• Formal institutions of the culture that are directly involved in the production of human needs, or in the management of affairs, require data-information-knowledge systems to respond to events. These organizations or institutions, public or private, require planning, operating, and control of operations. These functions are served by data-information-knowledge systems, whether organic or augmented.
FIG. 4. An integrated data-information-knowledge (D-I-K) system. The human organism, with its Acquisition-Transmission-Processing-Utilization-Transfer (A-T-P-U-Tr) components (sensors, neural transmission, brain, judgment and interpretation, motor function and language), is embedded in a physical and socio-cultural environment. Support interlinks couple it to technological augmentation (radars, sonar, telephones, cables, computers, satellites, film) and to institutional services (libraries, brokers, consultants, counselors, journalists, lawyers) in the generation, use, and dissemination of D-I-K toward individual and collective goals, problem solving, and decision making.
To restate the above for clarity: individuals in their day-to-day commerce are in the process of either generating, using, and providing others with awareness of existing conditions, or instilling in others the meaning and significance of this awareness. Individuals are aided in this process by technologies that sharpen and quicken the collection and processing of data (counting). They are also aided by institutions that serve as custodians of the outcome (the record) of the data processing function (librarians), or by agencies or individuals that can apply this outcome in creating insights and new significance (e.g., information analysis centers, brokers, consultants, counselors). Thus, the organismic data-information-knowledge system, augmented by the artificial data-information-knowledge system, provides capacities that are essential in aiding the human to achieve goals and objectives. The increase of awareness and intellect (knowledge) made possible through services and technological augmentation is not restricted to individuals but is also available to institutions, public and private, which are confronted with managerial decision-making and problem-solving requirements in their responsibility to plan, operate, and control operations. These functions are heavily dependent on data collection and on the meaning of the data gathered on events that concern them. All of the constituents in this configuration, personal or institutional, are governed by the physical and socio-cultural forces that shape the events of which they need awareness and understanding if they are to grow and survive. Thus, it is in the development of principles regarding the interlinking of the elements of this mapping (indicated by arrows) that the nature of information science as a discipline can be proposed. To a large extent, the principles will be relegated to those that deal with the logistics of record generation, use, and dissemination.
In this connection, advances in counting and recording technologies tend both to exacerbate and to facilitate the logistical requirement. The growth of knowledge will continue, and its access, storage, use, and dissemination will remain a serious problem. The issue is seen as a matter of design. Design is an engineering activity that presupposes certain fundamentals about form, an integration and synthesis of process and content. If this is indeed the case, then the foundations of information science rest in the principles, theories, and laws governing the creation of form (whole and parts in unity) and the application of such principles to design (the science of the artificial).
5. Overview
Information scientists continue to struggle in their attempt to understand the nature of information and the boundaries of their science. The inclination is to refer to the physical sciences in their attempt to understand the
phenomena, leaving the question of appropriateness at the subliminal level, if even at that. In general, however, determining the essence of information is secondary to more practical concerns. These concerns involve the determination of the most efficient and effective way to access, store, use, and disseminate the record of human experience. Dealing with the logistics of such commodities permeates the writings and interests of many scholars and professionals in the field. These concerns also represent the focus for the education of future scholars and professionals. Meanwhile, these concerns continue to be influenced by advances in technology, particularly computers, the use of which has permeated a wide range of public and private institutions. These advances have tended to highlight the important role that access to information and knowledge has in institutional operations, particularly in problem solving and decision making. This influence is particularly marked in the emergence of decision support, management information, and expert systems during the past three or so decades. It is in the structuring (design) of such systems that information science will continue to find definition. As such, it will continue to reflect its interdisciplinary character. The design of data-information-knowledge systems will empower us to integrate and apply our intellectual endowment to the challenges of the present together with those of the future. The task is formidable. Implicit in this task is our understanding of form, the orchestration of the human potential with the tools that extend it. Toward this goal, information scientists are fortunate in their predilection to consider themselves interdisciplinary, a predisposing condition for the achievement of synthesis: an amalgamation or coalescence of process and function in achieving specific goals.
As it now stands, information science is a field whose basic principles, theories, and laws, even its foundations, lie in many disciplines, both applied and theoretical. In its attempt to derive a definition of form, namely, the principles governing information and knowledge that marry the human and technological potentials, a metascience could emerge with the power to serve all disciplines.

ACKNOWLEDGMENTS

Many people have contributed to the production of this report. Particular appreciation is extended to Dr. Glynn Harmon of the University of Texas, who provided several relevant publications, and to Professor Allen Kent, Distinguished Service Professor at the University of Pittsburgh, who reviewed the draft, directed attention to several omissions in the initial draft, and offered other valuable comments. Thanks go as well to Dr. Esther Horne (University of North Texas), Dr. Robert Skovira (Robert Morris College), Dr. John Wise (Embry-Riddle University), Dr. Keith Stirling (Brigham Young University), and Dr. Harold Borko (UCLA), who also reviewed the initial drafts and provided many constructive comments. Appreciation is also extended to Dr. Anne V. Thompson, Head of the Graduate School of Librarianship at the University of Puerto Rico, who made the resources of the school and its excellent library available for the research necessary in the development of the report. Last but not least, thanks are due to Dr. Toni Carbo Bearman, Dean of the Graduate School of Library Science at the University of Pittsburgh, and to Dr. Robert Korfhage, Head of the Department of Information Science at the University of Pittsburgh, for their support and the resources they made available in the production of the report.
AUTHOR INDEX Numbers in italics indicate the pages on which complete references are given. Bernstein, A,, 96 Berry, M. M., 340,372 Bertalanffy, L. V., 371 Bertcher, H. J., 96 Bice, K., 71, 96 Birdsall, R. C., 337,377 Birge, R. R.,294,296,309,319,321 Blake, J., 204-206,231 Bloom, B. S., 364,371 Boar, B., 9-10,28,96 Bobbio. A,, 209,218,231 Bobrow, D. G., 54,98 Boden, M. A., 280,319 Boehm, 9 . W., 8,96 Bohm, D., 266-267,319 Bolt, R. A., 69, 96 Borko, H., 325,371 Boroush, M., 54,97 Bourlard, H., 112, 128,170 Boyce, B. R., 371 Boyd, M. A., 204,218,231 Bradford, S. C., 330,334,337,371 Bremermann, H. J., 246,319 Brillouin, L., 242,319 Rrittain, J. M., 335,346,371-372 Broedling, L. A., 12,97 Brookes, 9 . C., 331,333,337,361,372 Brooks, F. P., 96 Brown, R. V., 42, 96 Buckland, L. F., 340,372 Buckland, M., 360,372 Bugliarello, G., 372 Bush, M., 100,171 Bush, V., 340,372 Butcher, R. W., 274,322
A Achleiter, H. K., 371 Adelman, L., 32,47,95 Aiken, P. H., 69-70,95-96 Akingbehin, K., 313,319 Aldefeld, B., 112, I70 Alexander, C., 371 Allen, W. C., 346-347.371 Ambron, R., 69,95 Andriole, S. J., 7-8, 14, 35,46-47, 53-54. 57. 61,65,14,95-96 Arai, T., 277,298,322 Arlat, J., 213, 231 Artandi, S., 371 Ashby, W. R., 259,289,319-320 ASIS Bulletin, 348,371 Atwood, M. E., 13, 28, 71,97 Austin, J. L., 330,371 Averbuch, A., 121,170 Aviram, A., 292,319 Avram, H. D., 340,371
B Bahl,L. R., 100,112,121,123,170-171 Baird, B., 303,322 Baker, J. K., 112, 119, I70 Barclay, S., 42, 96 Bard, Y.,201,205,231 Bar-Hillel, Y., 328,371-372 Baroudi, J. J., 21, 96 Barraud, A., 296,319 Baum, L. E., 163, I70 Bavuso, S. J., 205,21 I, 231 Bearman, T. C., 338,362,371 Becker, J., 352-354,356,371 Belkin, N. J., 333,371 Belzer, J., 345,371,374 Bengio, Y.,167,170 Beniger, J. R., 329, 332, 338,371 Benioff, P., 242,319 Bennett, C. H., 242, 251,319 Berkowitz, M., 371
C
Cameron, S.,241,324 Cameron, W. J., 329,338,372 Carlisle, J. H., 14, 96 Carnap, R.,328,372 Carroll, 9 . D., 210,232 Carter, F. L., 292,295,319 379
380
AUTHOR INDEX
Carter, W. C., 204-205,207-208,229,231 Casey, R. S., 340,372 Cashwell, L. F., 8,97 Chang, D. B., 296,309,321 Chartrand, R. L., 372 Chen, F. R., 112,170 Chernavskii, D. S., 296,319 Cherry, C. E., 328,372 Chimento, P. F., 209,231 Churchman, C. W., 364,372 Ciardo, G., 204-205,207-209,211,216,231 Cinlar, E., 183,205,211,231 Clark, L. F., 171 Clarke, T. C., 243,323 Cleveland, D. B., 326,338-339,364,376 Cohen, P. S., 111-112,170 Cole, A. G., 112, 170 Comer, D. E., 318,320 Conrad, M., 237,242,246-247,253-254, 260-261,264-265,268-269,272,275, 277-278,283-284,286-288,294,298, 300,306,313,319-323 Conway, A. W., 204-205,231 Copper, M., 375 Cosell, L., 100, 171 Cowan, J. D., 324 Cox, D. R.,183,211,231 Cronenweth, S., 353-354,356,373 Crouzet, Y., 213,231 Cuadra, A. C., 358,372 D Dal Cin, M., 261,308,320 DARTA/MIT, 65,96 Davis, C. H., 353-354,356,372 Davis, J. S., 330,372 Davydov, A. S., 243,296,320 Debons, A., 327,329-332,338,341-343, 353-354,356,362,365,372-373,375,378 Dee, D., 8,96 Deer, R. L., 330,333,373 Deetz, S., 373 Delgutte, B., 130, 170 DeMori,R., 112-113,120-123, 126,139,149, 151,153, 165,167,170-171 Denning, P. J., 318,320 Dereniak, E., 293,323 DeSanctis, G., 54, 96 de Solla Price, D. J., 373
de Souza e Silva, E., 204-205,207-208,218, 229,231-232 Dickoff, J., 373 Dirac, P. A. M., 267,320 Donnell, M. L., 95 Donohue, J. C., 343-344,373 Draper, S. W., 47,97 Drummond, G. I., 277,320 Dugan, J. B., 204-205,211,213,218,229,231
E Ebeling, W., 290,295-296,320 Edelman, G. M., 284,320 Ehrhart, L. S., 53,96-97 Eisner, H., 3, 97 Eldridge, N., 256,321 Elias, A. W., 373 Elliot, H. C., 285,320 Enk, G., 54,97 Erman, L. D., 120, 124-125,171-172 Esoda, R. M., 50,97
F Fairley, R., 32, 97 Fairthorne, R. A,, 373 Fant,C. G. M., 102, 103,171 Farradan, J., 373 Feistal, R., 290,320 Fennel, D. R., 120,171 Fennel, R. D., 120,124-125,172 Feynman, R. P., 242,320 Finger, J. M., 216-217,232 Fleishman, E. A,, 12,97 Flynn, R. R., 353-354,356,373 Fong, P. J., 113,173 Foster, G., 54,98 Foulk, C. R., 333-334,361,378 FOX,C. J., 330-331,335,361,373 Friedman, L., 365,373 Friend, R. H., 243,320 Frohlich, H., 299,320 Furth, H. G.,331,373 G Galitz, W. O., 97 Gallupe, R. B., 54, 96 Gardner, M. R., 259,320 Garfield, E., 340,373
AUTHOR INDEX
Gehl, J., 329, 338,378
Geisler, C. D., 130, 173
Geisler, N. Y. S., 171
Geist, R., 203, 205, 229,231
Gelb, I. J., 363,373
Gibson, J. J., 266,320
Gilloux, M., 171
Gilmanshin, R. I., 243,292,321
Gladstein, D. L., 54,97
Goel, N. S., 288,321
Goffman, W., 356,373
Goldstein, G. D., 241,324,374
Goldstein, H., 267,321
Golubtsov, K. V., 243,275,277,322
Gomaa, H., 97
Gorn, S., 361,373
Gould, S. J., 256,321
Goyal, A., 204-209,218,229,231-233
Gray, J., 211,231
Gray, R. M., 121,171
Greengard, P. C., 275,321
Greif, I., 97
Gries, D., 318, 320
Griffith, B. C., 338,358,373
Griffiths, J.-M., 373
Gulick, R. M., 50,97
Gulyaev, Y. V., 241,321
H
Halle, M., 102, 171
Haltsonen, E., 112, 171
Hameroff, S., 293,323
Hameroff, S. R., 277,298,321
Hanazawa, T., 173
Harbo, O., 373
Hare, A. P., 97
Harmon, G., 329,332,337,373-374
Harmon, W. W., 346,374
Hart, S., 54,97
Hartley, R. V. L., 333,374
Hastings, H. M., 242,259,321
Haton, J.-P., 112-113, 125-126,171
Havelock, R. G., 341,374
Hayashi, K., 299,322
Hayes, R. M., 348,352-354,356,371,374
He, J., 293,323
Hebb, D. O., 282,321
Heeger, A. J., 295,323
Heffner, R.-M. S., 102, 171
Heilprin, L. B., 327,333, 338,374
Heimann, D., 219,223,231
Heise, G. A., 111, 172
Henderson, L. J., 374
Hendrick, C., 97
Herner, S., 327,339-340,374
Hice, G. F., 8,97
Hinton, G. E., 128,140-141,167,171-173
Hoffman, E., 329,374
Hoffman, R. C., 243,322
Hofstadter, D., 248,266,281, 316,321
Holewka, D., 303,322
Holland, J. H., 246,261,321
Hong, F. T., 292-294,320-321
Hooper, C., 69,95
Hopfield, J. J., 281, 290,321
Hopper, G. M., 328,374
Hopple, G. W., 47,60-61,65,96-97
Horne, F., 353-354,356,373
Hornick, W., 54, 97
Horowitz, E., 8, 97
Horton, Jr., F. W., 374
Houser, L., 328,343,348,374
Howard, R. A., 206,232
Howe, R., 204-205,229,232-233
Hsueh, M. C., 178,205,211-212,229,232
I
Iannino, A., 230,232
Ibe, O., 204-205,229,232-233
International Federation for Information Processing/International Computation Center, 374
Ivanitskii, G. R., 293,324
Ives, B., 21,96
Iyer, R., 211, 229,232
Iyer, R. K., 178,205,212,232
J
Jackson, E. B., 345-347,374
Jacobi, G. T., 241,324
Jakobson, R., 102,171
Jalanko, M., 112,171
James, P., 373
Janas, T., 323
Jelinek, F., 100, 112,121, 123,170-171
Jenssen, M., 295-296,320
Joint Chiefs of Staff (JCS), 33,97
Josephson, B. D., 316,321
K
Kahn, D., 112,173
Kahn, K., 54,98
Kaiser, K. M., 21, 98
Kajberg, K., 373
Kampfner, R. R., 286,320
Karioth, N. E., 343-344,373
Kelly, III, C. W., 42, 96
Kent, A., 335-336,340,352,359,372-375
Kholmanskii, A. S., 241,296,322
Kiang, N. Y. S., 130,170
Kim, C., 361,378
Kim, S. H., 243,322
Kimball, O., 100,171
King, D., 343,373
King, D. W., 338,358-359,362,374
Kirby, K. G., 286-287,320-321
Kirkpatrick, F. H., 298,321
Kohda, M., 112,172
Kohonen, T., 112,171
Kopec, G., 100,152,171
Koruga, D., 277,321
Kotowski, J., 323
Kraft, D. H., 371
Krasner, M., 100,171
Krinsky, V. I., 301,321
Krysinski, P., 323
Kuhn, H., 296,321
Kuhn, T. S., 327,374
Kuhnert, L., 301,321
Kulkarni, V. G., 209,232
Kutnik, J., 323
L
Laface, P., 112, 139, 149, 153, 165, 171
Lafferty, E. L., 50,98
Lakin, F., 97
Lam, L., 171
Lamel, L. F., 151,174
Lancaster, F. W., 346-347,371
Lancour, H., 336,374
Landauer, R., 242,321
Lanning, S., 54,98
Laprie, J. C., 177,213,229-230,231-232
Larson, A. G., 329,338,372
Lavenberg, S. S., 204,206,209,228,231-232
Lawrence, A. F., 294,296,309,319,321
Lazarev, P. I., 292,321
Lea, W. A., 100,112, 172
Ledermann, D., 323
Ledgard, H., 71,97
Lehner, P. E., 97
Leslie, R. E., 8,97
Lesser, V. R., 120, 124-125,172
Leung, H. C., 145, 172
Levinson, L. E., 172
Levinson, S. E., 100,111-112,163,170, 172-173
Levitan, I. B., 275,323
Lewis, B. L., 112,170
Lewis, C., 71, 96
Li, V. O., 232
Liberman, E. A., 242-243,275,277,298, 321-322
Lichten, W., 111,172
Licklider, J. C. R., 374
Likharev, K. K., 292,322
Lindsay, P. H., 364,374
Littlewood, B., 230,232
Lotka, A. J., 330,334,337,375
Lukasiewicz, J., 337,375
Lunin, L. F., 375
M
Machlup, F., 335,338-339,357,361-362,375
Mandelbrot, B., 334,375
Mansfield, U., 338-339,343,357,361,373,375
Marie, R., 211,231
Martin, A. W., 50,97
Martin, J., 365,375
Mason, D., 335,375
Matsumoto, G., 277,298,322
Matsuno, K., 315,322
Matyskiela, W. W., 96
May, R. M., 259-260,322
Mayer, A., 375
Maynard-Smith, J., 254,322
Mayr, E., 256,322
McClelland, J. L., 241,323
McCulloch, W. S., 242,322
McCuskey, R., 293,323
McDaniel, J. C., 296,309,321
McIrvine, E. C., 329,377
Medress, M. F., 112, 172
Medvinskii, A. B., 301
Meister, D., 10,97
Menen, A., 303,322
Mercer, R. L., 100, 111-112,121,123,170-171
Mercier, G., 112, 172
Merlo, E., 126, 151, 171
Meyer, J. F., 216,232
Miller, G. A., 111, 172, 332, 363,375
Miller, M. I., 172
Minina, S. V., 243, 275, 277, 298,322
Minsky, M. L., 242,255,322
Mjakotina, O. L., 298
Monasterio, X., 331,375
Mong, Y., 112, 139, 149,153, 165,171
Montgomery, K. L., 365,372
Mooers, C. N., 340,375
Moore, R. K., 113, 115-116, 123,172
Moriizumi, T., 301,322
Morris, C. W., 328,375
Mosier, D., 71, 73, 98
Mulder, M. C., 318,320
Mumby, D., 373
Muntz, R. R., 218,232
Muppala, J., 204-205,207-209,216,231
Musa, J., 230,232
Myers, C. S., 112, 117, 172
N
Naccache, N. J., 152-153,172
Nagle, J. F., 300,322
Nagle, T., 300,322
Nakatsu, R., 112,172
Naylor, T. H., 216-217,232
Neely, R. B., 120, 171
Nelson, V. P., 210, 232
Nesbit, J., 362, 375
Neveling, U., 335,377
Ney, H., 112, 117,170,172
Nicholson, G. L., 297,323
Nicola, V. F., 209, 232
Nicolis, G., 243, 290,322
Norman, D. A., 47,97,364,374
North, R. L., 65, 97
Nouhen, A., 112, 172
Nyquist, H., 333,375
O
Okamoto, M., 299,322 Okumoto, K., 230,232 Olsen, H. A., 359,374 Olson, M. H., 21,96
Oppenheim, A. V., 102,172
Otten, K., 329, 332,375
P
Palakal, M., 126, 151, 171
Panfilov, A. V., 301
Papert, S., 242,322
Pattee, H. H., 289,316,322
Perennou, G., 112,172
Perry, J. W., 335,340,372,375
Person, C., 330,375
Peterson, C. R., 42,96
Phillips, L. D., 42, 96
Piaget, J., 331,375
Pierce, R., 375
Pierrel, J. M., 112, 171
Pinker, S., 71, 97
Pitts, W., 242,322
Plaut, D. C., 128,172
Poincaré, H., 375
Polanyi, M., 331,375
Popp, F. A., 300,322
Popper, K. R., 331,375
Porat, M. U., 362,375
Potember, R. S., 243,322
Potter, A., 56, 97
Prager, J., 124, 172
Pratt, A. D., 330,376
Pressman, R. S., 8, 97
Prigogine, I., 243,290,322
Probst, D., 113, 120-123,170
Q Quaintance, M. K., 12,97 Quine, W. V., 330,376 Quinton, P., 112, 172
R Rabiner, L. R., 103, 106, 108, 110, 112, 117, 123, 163,172-173 Ragan, J. W., 54,98 Ragland, C., 69,97 Rambidi, N. G., 241,296,322 Ramesh, A. V., 206,208,233 Ramsey, H. R., 13,28,71,97 Rathswohl, E. J., 330,376 Rayward, B. W., 339,376
Reddy, D. R., 120,124-125,171-172
Reibman, A., 206-208,216,229,231-232
Reilly, N. P., 54, 97
Rescigno, A., 259,322
Reuhkala, E., 112,171
Richards, P. S., 348,376
Richardson, I. W., 259,322
Riittinen, H., 112, 171
Rizki, M. M., 306,322
Robinson, G. A., 274,322
Roder, N., 359,374
Roe, M. F. H., 376
Rose, J., 376
Rose, L. L., 333-334,371,378
Rosen, R., 259,323
Rosenberg, A. E., 112,173
Rosenberg, V., 376
Rosenblatt, F., 242,323
Rosenthal, A., 265,320
Rosetti, D. J., 178, 205, 212,232
Rössler, O. E., 282,287, 290, 299,323
Rothmann, E. M., 205,211,231
Rouat, J., 126, 151, 171
Rouse, S. H., 376
Rouse, W. B., 61-63,98,376
Royce, W. W., 8,98
Ruben, B. D., 376
Rumelhart, D. E., 128, 140-141, 173, 241,323
Rush, J. E., 353-354,356,372
Ruske, G., 112,173
S
Sachs, M. B., 137,172-174
Sage, A. P., 3, 54,61-63,98
Sagiv, J., 296, 323
Sahner, R., 204-205,207-208,211,216,232
Sakai, T., 299,322
Sakoe, H., 117, 173
Salamon, Z., 323
Salton, G., 376
Sandberg, W. R., 54,98
Sandler, Y. M., 241,296,322
Sandomirskii, V. B., 241,321
Saracevic, T., 335,352, 356,376
Sarin, S., 97
Sathaye, A., 205,229,232-233
Scagliola, C., 112,173
Schafer, R. W., 102-103, 106, 108, 110, 123, 172-173
Schatzoff, M., 201, 205,231
Schneiderman, B., 73,98
Schneiker, C., 293,323
Schotola, T., 112,173
Schrader, A. M., 329,343,376
Schreider, Y. A., 330,335,376
Schrieffer, J. R., 295,323
Schwartz, R., 100,171
Schwartz, R. M., 102-103, 111,173-174
Schweiger, D. M., 54,98
Scott, D., 97
Sejnowski, T. J., 128,171
Selvidge, J., 42, 96
Seneff, S., 130, 135
Sericola, B., 211,231
Shannon, C. E., 261,323,328,333,376
Shastri, L., 128, 173
Shaw, D., 372
Shaw, R., 340,376
Shera, J. H., 326,338-339, 364,376
Sherif, C. W., 376
Sherif, M., 376
Shikano, K., 173
Shinghal, R., 152-153,172
Shipman, D. W., 112, 173
Shirey, D., 343,373
Shklovsky-Kordy, N. E., 275,277,298,322
Shooman, M. L., 205,232
Siatkowski, R. E., 319
Siewiorek, D. P., 205,233
Silvester, J. A., 228, 232
Simon, H. A., 365,367,376
Sinex, D. G., 130, 173
Singer, A., 71,97
Singer, S. J., 297,323
Siroux, J., 112, 172
Skinner, B. F., 334,376
Skinner, T., 103, 173
Skinner, T. E., 112,172
Skovira, R. J., 376
Slamecka, V., 330,333-334, 360,375,377
Smalz, R., 286, 323
Smith, R., 216,229,232
Smith, R. M., 206,208,233
Smith, S. L., 71,73, 98
Smith, W. E., 205, 211,231
Smotherman, M., 205,229,231
Sneff, S., 173
Sondhi, M. M., 112, 163,172-173
Soskin, M. S., 293,324
Speck, K. R., 243,322
Srinivasan, A., 21,98
Stapp, H., 316,323
Stebbins, G. L., 256,323
Steen, L. A., 377
Stefik, M., 54,98
Stetyick, K. A., 243,322
Stevens, K. N., 173
Stevens, S., 337,377
Stewart, W. J., 206, 209,233
Street, G. B., 243,323
Stuart, R. D., 327,377
Su, W. P., 295,323
Suchman, L., 54,98
Sugi, M., 296,323
Sukhanov, A. A., 241,321
Summit, R. K., 340,377
Suppe, F., 361,377
Sutherland, E. W., 274,322
Swarz, R. S., 205,233
Swets, J. A., 337, 377
Szymanski, T. G., 112,170
T
Tachmindji, A. J., 50, 98
Tanner, Jr., W. P., 337,377
Taranenko, V. B., 293,324
Tate, V. D., 339, 377
Tater, D., 98
Taube, M., 377
Taulbee, O. E., 374
Taylor, R. S., 332, 335, 348,362,377
Theiss, H., 328,377
Thierauf, R. J., 44-45, 98
Thom, R., 259,323
Thomas, E. C., 171
Thompson, D. W., 377
Thompson, R. L., 288,321
Tien, H. T., 243,301,323
Tkach, Y. Y., 241,321
Todua, K. S., 241,296,322
Toffler, A. W., 362,377
Trenary, R., 286,323
Tribus, M., 329,377
Triestman, S. N., 275,323
Trivedi, K. S., 183-185,201,203-209,211, 213,216,218,229,231-233
Tsukita, S., 277, 298,322
Tucker, A., 318,320
Turing, A. M., 363,377
Turner, A. J., 318,320
Turner, W. S., 8,97
U
U. S. Department of Defense, 212,233
UNESCO, 377
V
Vagianos, L., 342, 348,377
Van Duyn, J., 25,98
Veeraraghavan, M., 204-205,216,218,231, 233
Voelker, M., 293,323
Volkenstein, M. V., 292, 296, 323
von Foerster, H., 242,324
von Neumann, J., 242,324
Vsevolodov, N. N., 293,324
W
Waibel, A., 173
Walker, D. E., 112,173,342,377
Waltrous, R. L., 128, 173
Waner, S., 242,321
Watanabe, T., 171
Weaver, W., 328,333,376-377
Weinberg, A. M., 377
Weiss, A. H., 50, 98
Weiss, E. C., 338,377
Wellekens, C. J., 128,170
Wellekens, J., 112, 170
Wellisch, H., 328, 343, 348,377
Wersig, G., 335,377
White, G. M., 113, 173
White, H. S., 346,377
Whitemore, B., 378
Whiteside, J., 71, 97
Whyle, L., 378
Wiedenbach, E., 373
Williams, J. G., 361,378
Williams, M. E., 338,378
Williams, R. J., 128, 140-141, 167, 173
Wilpon, J. G., 112, 173
Winfree, A. T., 301,324
Winograd, S., 261,308,324
Winograd, T., 328,378
Wise, J. A., 342,378
Wittgenstein, L., 315,324,330,378
Wohltjen, H., 319
Wolfram, S., 291,324
Wolpert, L., 280,324
Woods, W. A., 112,174
Wooster, H., 338,378
Wright, C. H., 329,331,361,378
Wright, S., 253,324
Wyllys, R. E., 345-347,374
Y Yates, F. E., 231,324 Young, E. D., 137,173-174
Young, P. R., 318,320 Yovits, M. C., 241,324, 329,332, 333-334, 361,378 Z
Zamalin, V. M., 241,296,322
Zhang, A. F., 294,319
Zimmerman, P. J., 365,378
Zipf, G. K., 330,334,337,378
Zopf, G. W., 242,324
Zue, V. W., 103, 112,145,151,172-174
Zunde, P., 329-330,333,338,361,372,378
SUBJECT INDEX
A
Ability requirements taxonomy, 12, 18
Adelman’s multi-attribute utility evaluation model, 47,49-50
Affricates, acoustic characteristics, 109, 111
American Documentation Institute, 339
American Society for Information Science, special interest groups, 343-345
Applications prototyping, 9
Artificial intelligence, command and control information and decision systems, 46-47,65-67
Automatic speech recognition systems, 85-169, see also Ear model; Speech; Vocal tract model
advantages of voice input, 100
approach to problems, 168
combination of ear model and multi-layer networks, 168
Error Back Propagation Algorithm, 169
Harpy system, 118-121
hidden Markov Models, 169
knowledge-based systems, 123-127
blackboard model, 124-125
hierarchical model, 124-125
knowledge, forms of, 124
rule-based expert systems, 124-126
training material, quality of, 123
link weights, 127-128, 141
Markov modeling system, 118-119, 121-123
mathematical methods, 111
multi-layer network model, 127-129
network-based systems, 118-119
parametric and nonparametric methods, 112-113
research goal, 100
spectra and prosodics, 111-112
speech as composite signal, 111
template matching, 113-118
absolute pattern match, 113
best absolute time alignment, 113
cumulative distance matrix, 115-117
distance matrix, 115
dynamic time warping, 114-115, 118
linear time-normalization, 113-114
local decision function, 115-116
local differences, 114
nonlinear time-normalization, 114-118
sequence comparison methods, 114
tests, 129
Availability, 228, see also Dependability analysis; System availability
B
Bacteriorhodopsin, molecular computing, 292-294
Balls-and-springs model, enzyme organization, 261-263
Basilar membrane, simulation, 130, 132
Batch learning, 141
Behavior description approaches, 12
Behavior requirements approaches, 12
Belousov-Zhabotinsky reaction, 301,312
Bibliometrics, 337
Bilayer membrane, 297-298
Biochip, 244
Biochrome film, 293-294
Biocomputing, 244
Bioelectronics, 245
Biological cells, as M-m architectures, 274-280
Biology-driven goal, 240-241
Biomolecular information processing, 245
Biosensor design, 301-303
Blackboard model, automatic speech recognition systems, 124-125
Block structure, molecular computing, 260
Brain
neurocomputer architecture, 283-286
as neuromolecular computer, 280-282
BROKER system, 66-67
Brownian computing, 312
C
cAMP, 274-275,277
Capacity-oriented availability, 188-189
Cellular processors, 271-272
Classification, 336-337
Coarticulation, 102
Cognitive science, user-computer interface technology, 69
Command and control information and decision systems engineering, 39-48
computer science and AI methods, 45-47
decision analytic methods, 42,44
human factors engineering methods, 47
modeling and prototyping, 40,42
operations research methods, 44-45
requirements analysis, 39-41
systems evaluation, 47,49-50
tasks/methods matching process, 47-48
technologies, 43
group planning prototype, 52-56
ISE results, 56
requirement analyses, 53-54
storyboard prototypes, 54-56,89-94
next generation, 57-76
AI, 65-67
emerging issues and challenges, 57
hardware, 72-74
integrated information and decision support, 74-76
models and methods, 60-68
neural-network-based models, 65-66
range of support, 57-59
user-computer interface technologies, 68-72
operational systems, 52
requirements, 35, 38-39
substantive and user-computer interface requirements, 77-89
working prototypes, 50-52
Command and control process, 32-37
decision-making, 34-37
force effectiveness, 32-33
Component redundancy, molecular computing, 260-261
Computer science, command and control information and decision systems, 45-46
Conformation-driven computing, 290
Connectionist models, 281
Connectionist neural nets, 310
Connectionist neural network models, 287
Consciousness, information and, 331
CONSCREEN, 50
Consonants
acoustic characteristics, 103, 105
glides, 107
liquids, 107
plosives, 107-109
Continuous Density Hidden Markov Model, 161,163-165, 167
Counting, information science, 363
Coverage probabilities, 212-213
Critical activity profiling methods, 20-21
Cumulative distance matrix, 115-117
Cyclic nucleotide system, biochemistry, 275-276
Cytoskeletal information processing, 277
Cytoskeletal structures, 297-298
D
Data, definition, 328
Data-information-knowledge systems, 364-369
componential integration, 365,367-369
components, 364-365
EATPUT cross matrix, 365-366
relationships, 367-368
Data processing technology, information science, 341-342
Decision analytic methods, command and control information and decision systems, 42,44
Decision-making, 334-335
Dependability, definition, 176-177
Dependability analysis, 176, see also System reliability
comparison of sensitivities, 223, 225
coverage, 221-222
design stage, 179
evaluation, 200,219
focus for product-improvement efforts, 177-178
maintenance stage, 179
manufacturing stage, 179
measures
basic availability, 186-188
capacity-oriented availability, 188-189
classes, 180
degraded-capacity time, 190
example, 184-186
exponential distribution, 182-183
guidelines for choice, 180-182
mean time to failure, 184
mean time to repair, 184
memoryless property, 183
reliability function, 183
steady-state balance equations, 185
tolerance availability, 188
tolerance capacity-oriented availability, 189
user tolerance patterns, 181-192
operations stage, 179
reboot time, 220-222
repair time, 220-221
requirements and planning stage, 179
results summary, 224-225
safety and risk issues, 179
sales and deployment stage, 179
sensitivity analysis, 200,219-221
specification determination, 200,221-222
specification stage, 179
summary of measures, 199
system availability, 226
system description, 218-219
system reliability, 226
task completion, 196-198,226
tradeoff analysis, 200-201,222-223
types of, 200-203
usage, 179
Dependability modeling, 201, 203-218, see also Markov model
analytic models, 204
combinatorial models, 204
fault/error-injection experiments, 213
field-measurement data, 213
future work, 230
Monte Carlo simulation models, 203-204
parameter determination
coverage probabilities, 212-213
failure rates, 210-212
repair rates, 213
reward rates, 213-216
reconfiguration time, 220-222
solution techniques, 205-209
structural modeling, 213
validation and verification, 216-218
Descriptive cognitive taxonomies, 18
Direct manipulation interfaces, command and control information and decision systems, 71
Distance matrix, automatic speech recognition systems, 115
Documentation, 32, 339-340
Dynamic time warping, 114-118
E
Ear model, 129-150
adaptation module, 136-137
basilar membrane simulation, 130, 132
batch learning, 141
block diagram, 130-131
centers of gravity, 143-144, 147-148
confusion matrix, 141-143
eighth nerve fiber responses, 130
Error Back Propagation Algorithm, 140-141
errors, 148-149
error tree, 145-146
filter bank structure, 130, 133, 135
hair cell synapse model, 134-135
hidden units, 140
membrane model, 136
MLNs, 139-141
neural network used for vowel recognition, 139
NSET, 145, 147
performances of vowel recognition system, 141-142
phonetic feature recognition, 142-145
recognition of new vowels and diphthongs, 145,147-149
speaker-independent recognition of vowels, 137, 139-142
subjective probability for a feature, 144
synchrony detector, 137-139
transduction module, 135
updating link, 141
vowel representation using phonetic features, 142-143
VSET, 137
word models, 149-150
WSET, 137
EATPUT cross matrix, 365-366
Efficiency, molecular computing, 247-249
programmability versus, 249-252
Electronic-conformational interactions, 296-297
Encyclopedias, information science, 359
Enzyme organization, balls-and-springs model, 261-263
Error Back Propagation Algorithm, 140-141, 169
Errors, 210
Essays, information science, 357-358
Evolutionary architectures, molecular computing, 313-314
Evolutionary selection circuits systems, 283-284
Excitons, 295
Expert systems, rule-based, automatic speech recognition systems, 124-126
Exponential distribution, 182-183
Extradimensional bypass
M-m scheme, 287-288
molecular computing, 256-261
F
Face validation, 217
Failures, 210, see also Mean time to failure
age dependency of, 211
component failure rates, 211-212
inter-failure time distribution, 211
rates, 210-212
source of, 210
types of, 210-211
Faults, 210
Feature of intrinsic parallelism, 261
Fiber optics, molecular computing, 311
Field model, 335-338
Finite automata, dynamics, 289
First messenger, 273
Fitness, molecular computing, 253
Flowcharting methods, 25
Formants, 101-102
Forward-Backward algorithm, 163
Free-energy minimization process, 289-290
Fricatives, acoustic characteristics, 109-110
Friction, 101
G
Genotype, 252-253 Glides, acoustic characteristics, 107 Gradualism condition, 255-256 Group planning prototype, command and control information and decision systems, 52-56 Group problem-solving requirements, command and control information and decision systems, 54
H
Hair cell synapse model, 134-135
Hardware
next-generation command and control information and decision systems, 72-74
requirement, prototyping, 31-32
Hardware-software configuration, prototyping, 31-32
Harpy system, 118-121
recognition process, 118
subnetworks, 118-119
vector quantizer encoder, 120-121
word juncture rules, 119
zapdash parameters, 119
Hearsay II, 120, 124,126-127
Hidden Markov models, 169
Hierarchical model, automatic speech recognition systems, 124-125
Human factors engineering methods, command and control information and decision systems, 47
Hybrid systems, molecular computing, 313
I
Immune system, M-m scheme, 280
Indexing, 336
Information
definition, 328-329
logistics, 361-362
measurement
field model, 335-338
physical science model, 332-335
minimal amount, 337
nature of, 327-332,361
consciousness, 331
matter-energy postulates, 329
semantic ascent, 330
semiotics, 330
sociology, 362
Information science
automation, 340
congresses, 342
counting, 363
curriculum development, core courses, 350-351
data-information-knowledge systems, see Data-information-knowledge systems
data processing technology, 341-342
development, 345-352
foundations
course outline, 346-347
program outline, 346
topics on concept of information, 349
documentation movement, 339
effects of computers, 325-326
encyclopedias, 359
essays, 357-358
foundations, 326-327
historical perspective, 338-342
key papers, 358-359
nature of, 325
overview, 369-370
research activities, 359-360
special interest groups, 343-345
surveys, 342-345
technology effects, 370
terminology, 328-329
texts, 352-357
themes, 360-362
Information systems, history, 2-3
Information systems engineering, 2-5
process, 4-32
blueprint, 4-5
conventional design methods and models, 8-9
organizational/doctrinal profiling methods, 20-22
prototyping, see Prototyping
requirements analysis methods, 10-12
task requirements analysis methods, 12-16
task/user/organizational-doctrinal matrix, 21, 23-24
user profiling methods, 15, 17-19
Information transfer, 362
Informon, 334
Input-output validation, 217
Integrated circuit technology, paradox of power, 241-242
Intermittent failures, 210, 219-220
K Key papers, information science, 358-359 Kink-type solitons, 295-296
KNOBS, 50,52
Knowledge-based systems, automatic speech recognition systems, 123-127
Knowledge representation techniques, command and control information and decision systems, 46-47
L
Langmuir-Blodgett film method, 311
Largeness avoidance, 204
Largeness tolerance, 204-205
Library, institutionalization, 364
Line Tracing Algorithm, 156-157
Liquids, acoustic characteristics, 107
Literature, obsolescence, 337
Local decision function, 115-116
Luhn, Hans Peter, 340
M
M-m architecture, 316-317
M-m scheme, 269-289
biological cells as, 274-280
brain, as neuromolecular computer, 280-282
cellular processors, 271-272
connectionist models, 281
connectionist neural network models, 287
connections among processors, 271
cyclic nucleotide system biochemistry, 275-276
cytoskeleton
dynamics, 286
as medium for fast signals, 277
double dynamics principle, 278-279
endogenous production of cAMP, 277
enhance pattern recognition capability, 272
evolutionary learning algorithm, 287-288
evolutionary learning component simulation, 286
evolutionary selection circuits systems, 283-284
extradimensional bypass, 287-288
first messenger, 273
functional approach to biology, 282
immune system, 280
kinase enzymes, 278
macroscopic and mesoscopic constraints, 273-274
mesoscopic, 271
models and simulations, 282-289
nerve impulse activity, 275
neurocomputer architecture, 283-286
positional information, 280
reference neuron system, 284-286
regulatory proteins, 279
second messenger, 273-274
shape-based processes, 288
symbolism, 271
tactilization model, 272, 278
terminology, 272-273
Markov chain, differential equations, 184-185
Markov model, 118-119,121-123,152
control strategies, 121
instantaneous availability, 207-208
interval availability, 208
language model, 122
largeness avoidance, 204
largeness tolerance, 204-205
recognition as communication problem, 122
reward models, 206-207
reward rates, 214-216
state transition rate diagram, 205
steady-state availability, 206-207
successive overrelaxation, 206
system reliability, 208-209
task completion, 209
two-processor system, 184
uniformization, 207
Markov property, 183
Master menu structures, command and control information and decision systems, 55
Mean time to failure, 192
Markov model, 209
processor, 220-221
processor intermittent, 219-221
Membrane model, 136
Memory-based designs, molecular computing, 311-312
Memoryless property, 183
Mirror-based designs, molecular computing, 311-312
Mitchell hypothesis, 293
MLNs, 139-141,167-168
Modeling, see also Dependability modeling
command and control information and decision systems, 40,42
techniques, 25-28
value of, 282
Molecular computer factory, 303-309
as ecosystem, 306
criteria for selection, 306
elements, 304-305
stage of initial conception, 305
tools for creating components, 305-306
Molecular computing, 236-237
analogy between DNA and a computer program, 238-239
applications, 318
architectures, 307-317
Brownian computing, 312
conformation- and dynamics-driven designs, 312-313
connectionist neural nets, 310
evolutionary, 313-314
fiber optics, 311
hybrid systems, 313
interpretation, 315
intrinsic ambiguity, 315-316
Langmuir-Blodgett film method, 311
measurement, 316-317
mirror-based designs, 311-312
optical, 311-312
optical linkages, 309
parallel designs, 309-311
structurally programmable and nonprogrammable, 314
towards cognitive computation, 314-317
types, 307-308
von Neumann, 308-309
biology-driven goal, 240-241
biophysics influence, 243-244
efficiency, 247-249
electronics-driven goal, 240-241
electronics influence, 241
energy dissipation, 242
enzymatic pattern recognition, 238-239
evolution to target protein, 254-255
evolutionary adaptability, 247-248,250
evolutionary processes, 246
evolvability versus programmability, 252-256
extradimensional bypass, 256-261
asymptotic orbital stability, 259
block structure, 260
component redundancy, 260-261
dynamical description, 257-258
higher-dimensional space, 256-257
mapping peak structure, 257
May’s theorem, 259
multiplicity of weak interactions, 261
predator-prey models, 259
self-simplification, 260
stability, 259
fitness, 253
gradualism condition, 255-256
growth rate of polynomial-type problems, 250-251
modes, 289-303
active media, 299
bacteriorhodopsin, 293-294
Belousov-Zhabotinsky reaction, 301,312
biochrome film, 293-294
biosensor design, 301-303
coherence, 299
collective dynamics, 299-301
electron mobility, 292
electronic-conformational interactions, 296-297
energy-driven, 289-290
entropy-driven, 290
hydrogen bond network, 286
membrane and cytoskeleton dynamics, 297-298
membrane excitability, 299
Mitchell hypothesis, 293
optical coupling, 293
polymers with switching function, 292-293
quasi-particles, 295-296
self-assembly model, 302-303
shape properties, 292
signal integration, 291,294, 302-303
specificity-driven, 290
state-determined, 289
water, 286
molecular adaptive surface, 253-254
molecular biotechnology influence, 244
natural selection, 252
pattern recognition, 237
physics influence, 242
polymer chemistry influence, 243
principle of universal simulatability, 266
programmability, 246-247
versus efficiency, 249-252
protein engineering, 261-265
proteins versus transistors, 238-240
quantum, 265-269
rationale for, 266
self-organization, 248
side effects, 251
stage of technological development, 237
strong principle of inheritance, 252-253
structural programmability, 247-248
structurally programmable parallelism, 250
structurally programmable systems, 254
technological prospects, 318
terminology, 244-245
tradeoff principle, 246-249
Turing-Church thesis, 248
Molecular electronics, 244
Molecular functional systems, 245
Multi-criteria methods/models assessment, command and control information and decision systems, 62-63
Multi-layer network model, automatic speech recognition systems, 127-129
Multi-Layered Networks, 139-141,167-168
Multimedia technology, command and control information and decision systems, 69-71
N
Nanoelectronics, 244-245 Narrative methods, 23,25 Nasals, acoustic characteristics, 105-106 Network-based systems, automatic speech recognition systems, 118-119 Neural network-based models, command and control information and decision systems, 65-66 Neurocomputer architecture, 283-286 NSET, 145, 147
O
OBIKV, 50
Operational systems, command and control information and decision systems, 52
Operations research methods, command and control information and decision systems, 44-45
Optical architectures, molecular computing, 311-312
Optically active polymers, 243
Organic polymers
switching function, 292-293
with metallic properties, 243
Organization requirements analysis process, 20-22
Organizational/doctrinal profiling methods, 20-22
P
Parallel designs, molecular computing, 309-311
Pattern match, absolute, automatic speech recognition systems, 113
Pattern recognition, 237
enzymatic, 238-239
self-assembly model, 302
Peak-type soliton, 296
Permanent failures, 210
Phenotype, 252-253
Phonemes, acoustic characteristics, 103-111
affricates, 109,111
consonants, 103,105
fricatives, 109-110
glides, 107
liquids, 107
nasals, 105-106
plosives, 107-109
vowels, 103-105
Photons, 295
Physical limits of computing, 242-243
Physical science model, 332-335
Pitch, 101
Plosives, acoustic characteristics, 107-109
Positional information, 280
Precision, 336
Predator-prey models, 259
Principle of universal simulatability, 266
Processor
reliability, versus reboot times, 222-223
repairs, frequency, 194, 196
Product engineer, 3
Product selection criteria, 177-178
Programmability, molecular computing, 246-247
Project 2061, 348-349
Protein engineering, 261-265
balls-and-springs model, 261-263
conformations, 263
evolvability, 264
joint entropy, 263
mutation buffering, 263
Protein enzyme, switching activity, 239
Proteins, molecular computing, 238-240
Prototyping, 6-7,9-10
command and control information and decision systems, 40,42,50-52
documentation and maintenance, 32
flowcharting methods, 25
generic model-based methods, 25-28
hardware requirements/hardware-software configuration, 31-32
narrative methods, 23,25
screen display and storyboarding methods, 28-31
software specification and engineering, 31
testing and evaluation, 32
Punctuated equilibrium model, 256
Q
Quantum molecular computing, 265-266
barrier penetration, 268
classical and nonclassical description, 266-267
Hamilton-Jacobi version, 267
parallelism, 268
potential computing power, 268
Schrödinger representation, 267
simulating machine, 265-266
superposition principle, 267-268
Turing machines, 266
Quasi-particles, molecular computing, 295-296
R Reboot time mean, 220-222 versus processor reliability, 222-223 Recall, 335-336 Reconfiguration time, mean, 220-222 Recording, information science, 363-364 Redundancy analysis, 228 Reference machine, 265 Reference neuron system, 284-286 Reliability, see also System reliability improvement, 178 processor, versus reboot times, 222-223 Reliability function, 183, 190-191
Repair
frequency, processor, 194, 196
rates, 213
time, mean, 184,220-221
Requirements analysis methods, 10-12
Research activities, information science, 359-360
Reward rates, 213-216
S
Safe-Point Thinning Algorithm, 153
Schrodinger representation, molecular computing, 267
Screen display methods, 28-29
Second messenger, 273-274
Second-messenger reaction scheme, 275-276
Self-assembly model, pattern recognition, 302
Self-organization, molecular computing, 248
Semantic ascent, 330
Semi-vowels, acoustic characteristics, 107
Sensitivity analysis, 200, 219-221
Shannon and Weaver's theory, 333-334
Shannon's information theory, 333-334
Signal integration, 291, 294, 302-303
Skeletonization, 152-153
Skeletonization algorithm, 153-158
  Line Tracing Algorithm, 156-157
  restrictions, 153
  smoothing, 156, 158
  spectral peaks, 154
  thinning, 155
Software, specification and engineering, 31
Solitons, 295-296
Special interest groups, information science, 343-345
Specification determination, 200
Specificity-driven computing, 290
Speech, 101-111, see also Phonemes
  coarticulation, 102
  formants, 101-102
  friction, 101
  fundamental frequency, 101
  organs of production, 101-102
  sound sources, types of, 101
Speech input and output, 71
Speech recognition, as communication problem, 122
Sputnik scientific renaissance, 341
Storyboards
  group planning prototype, 89-94
  methods, 29-31
  prototypes, command and control information and decision systems, 54-56
Strong principle of inheritance, 252-253
Structural programmability, molecular computing, 247-248
Structurally programmable parallelism, molecular computing, 250
Structurally programmable systems, molecular computing, 254
Superposition principle, 267-268
Switching, energy dissipation, 242
Synchrony detector, 137-138
System availability, 180, 226
  basic availability, 186-188
  capacity-oriented availability, 188-189
  degraded-capacity time, 190
  forms, 186-187
  instantaneous, Markov model, 207-208
  interval, Markov model, 208
  measure summary, 199
  steady-state, Markov model, 206
  tolerance availability, 188
  tolerance capacity-oriented availability, 189
System outage, see System reliability
System reliability, 180, 190-196, 226
  degraded-capacity incident frequency, 194
  failure criteria, 190-191
  incident frequency, 192
  Markov model, 208-209
  mean time to failure, 192
  measure summary, 199
  outages due to lack of processors, 194
  over-tolerance outages, 192, 194
  processor repair frequency, 194, 196
  reliability function, 190-191
  system outages, frequency and duration, 194-195
  unreliabilities of sample system, 192-193
System sizing, command and control information and decision systems, 55-56
Systems design, in perspective, 7-16
Systems engineer, 3
Systems engineering, 3
T

Tactilization model, 272, 278
Task characteristics approaches, 12
Task completion, 180, 196-198, 226
  interruption probability
    due to any interruption, 196, 198
    due to over-tolerance interruption, 198
  Markov model, 209
  measure summary, 199
  odds against interruption, 197
Task-completion, reward structure, 215-216
Task-requirements analysis methods, 12-16
Task taxonomies, 13
Tasks/methods matching process, command and control information and decision systems, 47-48
Task/user/organizational-doctrinal matrix, 21, 23-24
Template matching, 113-118
Texts, information science, 352-357
Time alignment, best absolute, automatic speech recognition systems, 113
Time-normalization, automatic speech recognition systems
  linear, 113-114
  nonlinear, 114-118
Tolerance availability, 188
Tradeoff analysis, 200-201, 222-223
Tradeoff principle, molecular computing, 246-249
Transfer function, transduction module, 135
Transient failures, 210-211
Transistors, versus proteins, molecular computing, 238-240
Turing-Church thesis, 248
U

User-computer interface, command and control information and decision systems, 53-54
  requirements, 83-89
  technology, 68-72
Users
  classes, 15, 17
  profiling methods, 15, 17-19
  requirements analysis methodology, 17-19
User/task/organizational requirements matrix, 10-11
V

Validation, dependability modeling, 216-218
Vector quantizer encoder, 120-121
Verification, dependability modeling, 216-218
Vocal tract model, 150-168
  acoustic property descriptors, 151
  frequency relations among spectral lines, 159-168
    continuous density hidden Markov model, 161, 163-165, 167
    Forward-Backward algorithm, 163
    learning and recognition method, 163-164
    Markov chains, 165
    Multi-Layered Networks, 167-168
    parameter characterization, 160-163
    Places of Articulation, 159-160
    quasi-stationary interval, 161-162
    recognition of stationary segments, 164-168
    recognition results for vowels, 167
    relation with phoneme classes, 160-163
    transition probabilities, 166
    vocabulary for vowel recognition, 164
  line description, 156, 159
  Line Tracing Algorithm, 156-157
  morphological properties, 150
  Safe-Point Thinning Algorithm, 153
  skeletonization, 152-153
  skeletonization algorithm, 153-158
  spectrograms, 151-152
  Speech Units, 151
Voice input, advantages, 100
von Neumann architectures, molecular computing, 308-309
Vowels, acoustic characteristics, 103-105
VSET, 137
W

Water, molecular computing, 286
Weak interactions, multiplicity, molecular computing, 261
Word models, 149-150
Worldwide Military Command and Control System, 75
WSET, 137
Y

Yovits's model, 334-335
Z

Zapdash parameters, 120
Contents of Previous Volumes

Volume 1

General-Purpose Programming for Business Applications
  CALVIN C. GOTLIEB
Numerical Weather Prediction
  NORMAN A. PHILLIPS
The Present Status of Automatic Translation of Languages
  YEHOSHUA BAR-HILLEL
Programming Computers to Play Games
  ARTHUR L. SAMUEL
Machine Recognition of Spoken Words
  RICHARD FATEHCHAND
Binary Arithmetic
  GEORGE W. REITWIESNER

Volume 2

A Survey of Numerical Methods for Parabolic Differential Equations
  JIM DOUGLAS, JR.
Advances in Orthonormalizing Computation
  PHILIP J. DAVIS AND PHILIP RABINOWITZ
Microelectronics Using Electron-Beam-Activated Machining Techniques
  KENNETH R. SHOULDERS
Recent Developments in Linear Programming
  SAUL I. GASS
The Theory of Automata: A Survey
  ROBERT MCNAUGHTON
Volume 3

The Computation of Satellite Orbit Trajectories
  SAMUEL D. CONTE
Multiprogramming
  E. F. CODD
Recent Developments of Nonlinear Programming
  PHILIP WOLFE
Alternating Direction Implicit Methods
  GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
Combined Analog-Digital Techniques in Simulation
  HAROLD F. SKRAMSTAD
Information Technology and the Law
  REED C. LAWLOR

Volume 4

The Formulation of Data Processing Problems for Computers
  WILLIAM C. MCGEE
All-Magnetic Circuit Techniques
  DAVID R. BENNION AND HEWITT D. CRANE
CONTENTS OF PREVIOUS VOLUMES
Computer Education
  HOWARD E. TOMPKINS
Digital Fluid Logic Elements
  H. H. GLAETTLI
Multiple Computer Systems
  WILLIAM A. CURTIN
Volume 5

The Role of Computers in Election Night Broadcasting
  JACK MOSHMAN
Some Results of Research on Automatic Programming in Eastern Europe
  WLADYSLAW TURSKI
A Discussion of Artificial Intelligence and Self-Organization
  GORDON PASK
Automatic Optical Design
  ORESTES N. STAVROUDIS
Computing Problems and Methods in X-Ray Crystallography
  CHARLES L. COULTER
Digital Computers in Nuclear Reactor Design
  ELIZABETH CUTHILL
An Introduction to Procedure-Oriented Languages
  HARRY D. HUSKEY
Volume 6

Information Retrieval
  CLAUDE E. WALSTON
Speculations Concerning the First Ultraintelligent Machine
  IRVING JOHN GOOD
Digital Training Devices
  CHARLES R. WICKMAN
Number Systems and Arithmetic
  HARVEY L. GARNER
Considerations on Man versus Machine for Space Probing
  P. L. BARGELLINI
Data Collection and Reduction for Nuclear Particle Trace Detectors
  HERBERT GELERNTER
Volume 7

Highly Parallel Information Processing Systems
  JOHN C. MURTHA
Programming Language Processors
  RUTH M. DAVIS
The Man-Machine Combination for Computer-Assisted Copy Editing
  WAYNE A. DANIELSON
Computer-Aided Typesetting
  WILLIAM R. BOZMAN
Programming Languages for Computational Linguistics
  ARNOLD C. SATTERTHWAIT
Computer Driven Displays and Their Use in Man-Machine Interaction
  ANDRIES VAN DAM
Volume 8

Time-Shared Computer Systems
  THOMAS N. PIKE, JR.
Formula Manipulation by Computer
  JEAN E. SAMMET
Standards for Computers and Information Processing
  T. B. STEEL, JR.
Syntactic Analysis of Natural Language
  NAOMI SAGER
Programming Languages and Computers: A Unified Metatheory
  R. NARASIMHAN
Incremental Computation
  LIONELLO A. LOMBARDI
Volume 9

What Next in Computer Technology
  W. J. POPPELBAUM
Advances in Simulation
  JOHN MCLEOD
Symbol Manipulation Languages
  PAUL W. ABRAHAMS
Legal Information Retrieval
  AVIEZRI S. FRAENKEL
Large-Scale Integration - An Appraisal
  L. M. SPANDORFER
Aerospace Computers
  A. S. BUCHMAN
The Distributed Processor Organization
  L. J. KOCZELA
Volume 10

Humanism, Technology, and Language
  CHARLES DECARLO
Three Computer Cultures: Computer Technology, Computer Mathematics, and Computer Science
  PETER WEGNER
Mathematics in 1984 - The Impact of Computers
  BRYAN THWAITES
Computing from the Communication Point of View
  E. E. DAVID, JR.
Computer-Man Communication: Using Graphics in the Instructional Process
  FREDERICK P. BROOKS, JR.
Computers and Publishing: Writing, Editing, and Printing
  ANDRIES VAN DAM AND DAVID E. RICE
A Unified Approach to Pattern Analysis
  ULF GRENANDER
Use of Computers in Biomedical Pattern Recognition
  ROBERT S. LEDLEY
Numerical Methods of Stress Analysis
  WILLIAM PRAGER
Spline Approximation and Computer-Aided Design
  J. H. AHLBERG
Logic per Track Devices
  D. L. SLOTNICK
Volume 11

Automatic Translation of Languages Since 1960: A Linguist's View
  HARRY H. JOSSELSON
Classification, Relevance, and Information Retrieval
  D. M. JACKSON
Approaches to the Machine Recognition of Conversational Speech
  KLAUS W. OTTEN
Man-Machine Interaction Using Speech
  DAVID R. HILL
Balanced Magnetic Circuits for Logic and Memory Devices
  R. B. KIEBURTZ AND E. E. NEWHALL
Command and Control: Technology and Social Impact
  ANTHONY DEBONS
Volume 12

Information Security in a Multi-User Computer Environment
  JAMES P. ANDERSON
Managers, Deterministic Models, and Computers
  G. M. FERRERO DI ROCCAFERRERA
Uses of the Computer in Music Composition and Research
  HARRY B. LINCOLN
File Organization Techniques
  DAVID C. ROBERTS
Systems Programming Languages
  R. D. BERGERON, J. D. GANNON, D. P. SHECHTER, F. W. TOMPA, AND A. VAN DAM
Parametric and Nonparametric Recognition by Computer: An Application to Leukocyte Image Processing
  JUDITH M. S. PREWITT
Volume 13

Programmed Control of Asynchronous Program Interrupts
  RICHARD L. WEXELBLAT
Poetry Generation and Analysis
  JAMES JOYCE
Mapping and Computers
  PATRICIA FULTON
Practical Natural Language Processing: The REL System as Prototype
  FREDERICK B. THOMPSON AND BOZENA HENISZ THOMPSON
Artificial Intelligence - The Past Decade
  B. CHANDRASEKARAN
Volume 14

On the Structure of Feasible Computations
  J. HARTMANIS AND J. SIMON
A Look at Programming and Programming Systems
  T. E. CHEATHAM, JR. AND JUDY A. TOWNLEY
Parsing of General Context-Free Languages
  SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Statistical Processors
  W. J. POPPELBAUM
Information Secure Systems
  DAVID K. HSIAO AND RICHARD I. BAUM
Volume 15

Approaches to Automatic Programming
  ALAN W. BIERMANN
The Algorithm Selection Problem
  JOHN R. RICE
Parallel Processing of Ordinary Programs
  DAVID J. KUCK
The Computational Study of Language Acquisition
  LARRY H. REEKER
The Wide World of Computer-Based Education
  DONALD BITZER
Volume 16

3-D Computer Animation
  CHARLES A. CSURI
Automatic Generation of Computer Programs
  NOAH S. PRYWES
Perspectives in Clinical Computing
  KEVIN C. O'KANE AND EDWARD A. HALUSKA
The Design and Development of Resource-Sharing Services in Computer Communication Networks: A Survey
  SANDRA A. MAMRAK
Privacy Protection in Information Systems
  REIN TURN
Volume 17

Semantics and Quantification in Natural Language Question Answering
  W. A. WOODS
Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base
  NAOMI SAGER
Distributed Loop Computer Networks
  MING T. LIU
Magnetic Bubble Memory and Logic
  TIEN CHI CHEN AND HSU CHANG
Computers and the Public's Right of Access to Government Information
  ALAN F. WESTIN
Volume 18

Image Processing and Recognition
  AZRIEL ROSENFELD
Recent Progress in Computer Chess
  MONROE M. NEWBORN
Advances in Software Science
  M. H. HALSTEAD
Current Trends in Computer-Assisted Instruction
  PATRICK SUPPES
Software in the Soviet Union: Progress and Problems
  S. E. GOODMAN
Volume 19

Data Base Computers
  DAVID K. HSIAO
The Structure of Parallel Algorithms
  H. T. KUNG
Clustering Methodologies in Exploratory Data Analysis
  RICHARD DUBES AND A. K. JAIN
Numerical Software: Science or Alchemy?
  C. W. GEAR
Computing as Social Action: The Social Dynamics of Computing in Complex Organizations
  ROB KLING AND WALT SCACCHI
Volume 20

Management Information Systems: Evolution and Status
  GARY W. DICKSON
Real-Time Distributed Computer Systems
  W. R. FRANTA, E. DOUGLAS JENSEN, R. Y. KAIN, AND GEORGE D. MARSHALL
Architecture and Strategies for Local Networks: Examples and Important Systems
  K. J. THURBER
Vector Computer Architecture and Processing Techniques
  KAI HWANG, SHUN-PIAO SU, AND LIONEL M. NI
An Overview of High-Level Languages
  JEAN E. SAMMET
Volume 21

The Web of Computing: Computer Technology as Social Organization
  ROB KLING AND WALT SCACCHI
Computer Design and Description Languages
  SUBRATA DASGUPTA
Microcomputers: Applications, Problems, and Promise
  ROBERT C. GAMMILL
Query Optimization in Distributed Data Base Systems
  GIOVANNI MARIA SACCO AND S. BING YAO
Computers in the World of Chemistry
  PETER LYKOS
Library Automation Systems and Networks
  JAMES E. RUSH
Volume 22

Legal Protection of Software: A Survey
  MICHAEL C. GEMIGNANI
Algorithms for Public Key Cryptosystems: Theory and Applications
  S. LAKSHMIVARAHAN
Software Engineering Environments
  ANTHONY I. WASSERMAN
Principles of Rule-Based Expert Systems
  BRUCE G. BUCHANAN AND RICHARD O. DUDA
Conceptual Representation of Medical Knowledge for Diagnosis by Computer: MDX and Related Systems
  B. CHANDRASEKARAN AND SANJAY MITTAL
Specification and Implementation of Abstract Data Types
  ALFS T. BERZTISS AND SATISH THATTE
Volume 23

Supercomputers and VLSI: The Effect of Large-Scale Integration on Computer Architecture
  LAWRENCE SNYDER
Information and Computation
  J. F. TRAUB AND H. WOZNIAKOWSKI
The Mass Impact of Videogame Technology
  THOMAS A. DEFANTI
Developments in Decision Support Systems
  ROBERT H. BONCZEK, CLYDE W. HOLSAPPLE, AND ANDREW B. WHINSTON
Digital Control Systems
  PETER DORATO AND DANIEL PETERSEN
International Developments in Information Privacy
  G. K. GUPTA
Parallel Sorting Algorithms
  S. LAKSHMIVARAHAN, SUDARSHAN K. DHALL, AND LESLIE L. MILLER
Volume 24

Software Effort Estimation and Productivity
  S. D. CONTE, H. E. DUNSMORE, AND V. Y. SHEN
Theoretical Issues Concerning Protection in Operating Systems
  MICHAEL A. HARRISON
Developments in Firmware Engineering
  SUBRATA DASGUPTA AND BRUCE D. SHRIVER
The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance
  RANAN B. BANERJI
The Current State of Language Data Processing
  PAUL L. GARVIN
Advances in Information Retrieval: Where Is That /#*&((I$ Record?
  DONALD H. KRAFT
The Development of Computer Science Education
  WILLIAM F. ATCHISON
Volume 25

Accessing Knowledge through Natural Language
  NICK CERCONE AND GORDON MCCALLA
Design Analysis and Performance Evaluation Methodologies for Database Computers
  STEVEN A. DEMURJIAN, DAVID K. HSIAO, AND PAULA R. STRAWSER
Partitioning of Massive/Real-Time Programs for Parallel Processing
  I. LEE, N. PRYWES, AND B. SZYMANSKI
Computers in High-Energy Physics
  MICHAEL METCALF
Social Dimensions of Office Automation
  ABBE MOWSHOWITZ

Volume 26

The Explicit Support of Human Reasoning in Decision Support Systems
  AMITAVA DUTTA
Unary Processing
  W. J. POPPELBAUM, A. DOLLAS, J. B. GLICKMAN, AND C. O'TOOLE
Parallel Algorithms for Some Computational Problems
  ABHA MOITRA AND S. SITHARAMA IYENGAR
Multistage Interconnection Networks for Multiprocessor Systems
  S. C. KOTHARI
Fault-Tolerant Computing
  WING N. TOY
Techniques and Issues in Testing and Validation of VLSI Systems
  H. K. REGHBATI
Software Testing and Verification
  LEE J. WHITE
Issues in the Development of Large, Distributed, and Reliable Software
  C. V. RAMAMOORTHY, ATUL PRAKASH, VIJAY GARG, TSUNEO YAMAURA, AND ANUPAM BHIDE
Volume 27

Military Information Processing
  JAMES STARK DRAPER
Multidimensional Data Structures: Review and Outlook
  S. SITHARAMA IYENGAR, R. L. KASHYAP, V. K. VAISHNAVI, AND N. S. V. RAO
Distributed Data Allocation Strategies
  ALAN R. HEVNER AND ARUNA RAO
A Reference Model for Mass Storage Systems
  STEPHEN W. MILLER
Computers in the Health Sciences
  KEVIN C. O'KANE
Computer Vision
  AZRIEL ROSENFELD
Supercomputer Performance: The Theory, Practice, and Results
  OLAF M. LUBECK
Computer Science and Information Technology in the People's Republic of China: The Emergence of Connectivity
  JOHN H. MAIER
Volume 28

The Structure of Design Processes
  SUBRATA DASGUPTA
Fuzzy Sets and Their Applications to Artificial Intelligence
  ABRAHAM KANDEL AND MORDECHAY SCHNEIDER
Parallel Architectures for Database Systems
  A. R. HURSON, L. L. MILLER, S. H. PAKZAD, M. H. EICH, AND B. SHIRAZI
Optical and Optoelectronic Computing
  MIR MOJTABA MIRSALEHI, MUSTAFA A. G. ABUSHAGUR, AND H. JOHN CAULFIELD
Management Intelligence Systems
  MANFRED KOCHEN
Volume 29

Models of Multilevel Computer Security
  JONATHAN K. MILLEN
Evaluation, Description and Invention: Paradigms for Human-Computer Interaction
  JOHN M. CARROLL
Protocol Engineering
  MING T. LIU
Computer Chess: Ten Years of Significant Progress
  MONROE NEWBORN
Soviet Computing in the 1980s
  RICHARD W. JUDY AND ROBERT W. CLOUGH
Volume 30

Specialized Parallel Architectures for Textual Databases
  A. R. HURSON, L. L. MILLER, S. H. PAKZAD, AND JIA-BING CHENG
Database Design and Performance
  MARK L. GILLENSON
Software Reliability
  ANTHONY IANNINO AND JOHN D. MUSA
Cryptography Based Data Security
  GEORGE I. DAVIDA AND YVO DESMEDT
Soviet Computing in the 1980s: A Survey of the Software and Its Applications
  RICHARD W. JUDY AND ROBERT W. CLOUGH