Advances in COMPUTERS VOLUME 6
Contributors to This Volume
Advances in
COMPUTERS
edited by
FRANZ L. ALT, National Bureau of Standards, Washington, D.C.
MORRIS RUBINOFF, University of Pennsylvania and Pennsylvania Research Associates, Philadelphia, Pennsylvania
associate editors: A. D. BOOTH, R. E. MEAGHER
VOLUME 6
Academic Press, New York and London, 1965
ACADEMIC PRESS INC., 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS INC. (LONDON) LTD., Berkeley Square House, London W.1
Contributors to Volume 6
P. L. BARGELLINI, The Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pennsylvania
HARVEY L. GARNER, Department of Electrical Engineering, The University of Michigan, Ann Arbor, Michigan
HERBERT GELERNTER, IBM, Watson Research Center, Yorktown Heights, New York
IRVING JOHN GOOD, Trinity College, Oxford, England and Atlas Computer Laboratory, Berkshire, England
CLAUDE E. WALSTON, IBM Corporation, Bethesda, Maryland
CHARLES R. WICKMAN, Ordnance Center, Honeywell Incorporated, West Covina, California
Preface

The present volume continues to reflect the editors' conviction, manifested in the earlier volumes of this serial publication, that application of digital computers to areas akin to human thinking (machine-aided cognition, to borrow a term from another environment) is one of the most active frontiers of development in our time. Articles in this volume deal with two such areas: information retrieval and what is called "ultraintelligent machines." The latter article represents a new departure for this serial publication in that it contains not information but opinions, not a survey of the past but a look at the future. One article in an earlier volume, on microelectronics, had some of these features; and indeed, the physical systems envisioned in that article give a degree of plausibility to the speculations on ultraintelligence presented here. Together with the discussion of self-organizing systems in the previous volume, and with those of game playing, speech recognition, and language translation by computer presented in earlier volumes, these articles give a panorama of some of the most challenging potentialities of computers.

With the two articles on digital training devices and on man versus computer in space missions, Advances in Computers enters the field of real-time control for the first time. It should be quickly pointed out that the term "digital training devices" refers not to classroom teaching machines but to simulators which assist in the training of pilots, ship crews, etc., by presenting a replica of the physical environment for which the trainee must be prepared, together with changes in the environment caused by the trainee's own actions. The discussion of the use of men and of machines in space missions points out, in general terms only, for obvious reasons, the factors pro and con in the question whether human observers traveling in space vehicles could not be dispensed with and replaced by sophisticated instruments.

An article on number systems and arithmetic in digital computers continues, broadens, and updates the survey of the same topic in Volume 1. Finally, computer applications to scientific problems, always a subject of interest to "Advances," are represented here by the paper on particle trace detectors, notably bubble chambers and spark chambers, which have recently acquired a leading position as a tool for research in particle physics, and also as a large and sophisticated application of computers.

September 1965

FRANZ L. ALT
MORRIS RUBINOFF
Contents

CONTRIBUTORS TO VOLUME 6
PREFACE
CONTENTS OF PREVIOUS VOLUMES

Information Retrieval
CLAUDE E. WALSTON
1. Introduction
2. The Information-Storage and -Retrieval Cycle
3. Types of Retrieval
4. Automatic Document Indexing and Classification
5. Automatic Aids to Retrieval and Dissemination
6. Automatic Fact Retrieval
7. Conclusion
References

Speculations Concerning the First Ultraintelligent Machine
IRVING JOHN GOOD
1. Introduction
2. Ultraintelligent Machines and Their Value
3. Communication as Regeneration
4. Some Representations of "Meaning" and Their Relevance to Intelligent Machines
5. Recall and Information Retrieval
6. Cell Assemblies and Subassemblies
7. An Assembly Theory of Meaning
8. The Economy of Meaning
9. Conclusions
10. Appendix: Informational and Causal Interactions
References

Digital Training Devices
CHARLES R. WICKMAN
1. Introduction
2. Training Requirements
3. Training Simulators Using General Purpose Digital Computers
4. Programming Considerations
5. Non-Training Uses of a Training Simulator
6. Future Training Device Requirements

Number Systems and Arithmetic
HARVEY L. GARNER
1. Introduction
2. Classification and Characterization of Number Systems
3. Addition
4. Redundant Number Systems
5. Multiplication
6. Division
7. Residue Number Systems
8. Digit by Digit Computation
References

Considerations on Man versus Machines for Space Probing
P. L. BARGELLINI
1. Introduction
2. Human and Machine Intelligence
3. Problem Definition in Engineering Terms
4. Summary of Information Handling by Man and Machines
5. Information Capacity of the Human Channel; Acoustic and Visual Stimuli
6. Somesthetic Communication
7. Data Processing by Machines
8. Comparison of the Bit Rate in Manned and Mechanized Systems
9. Considerations on the Communication Links
10. Possible Solutions and Recommendations
11. Conclusion
Bibliography

Data Collection and Reduction for Nuclear Particle Trace Detectors
HERBERT GELERNTER
1. Introduction
2. Bubble Chambers
3. The Data Reduction Problem for Bubble Chambers
4. Advances in Automatic Data Analysis for Bubble Chambers
5. Spark Chambers
6. The Data Problem for Spark Chambers
7. Filmless Operation of Spark Chambers
8. Some Other Particle Trace Detectors
9. On-Line Data Processing in Physics
Bibliography

Author Index
Subject Index
Contents of Volume 1

General-Purpose Programming for Business Applications
CALVIN C. GOTLIEB

Numerical Weather Prediction
NORMAN A. PHILLIPS

The Present Status of Automatic Translation of Languages
YEHOSHUA BAR-HILLEL

Programming Computers to Play Games
ARTHUR L. SAMUEL

Machine Recognition of Spoken Words
RICHARD FATEHCHAND

Binary Arithmetic
GEORGE W. REITWIESNER

Contents of Volume 2

A Survey of Numerical Methods for Parabolic Differential Equations
JIM DOUGLAS, JR.

Advances in Orthonormalizing Computation
PHILIP J. DAVIS AND PHILIP RABINOWITZ

Microelectronics Using Electron-Beam-Activated Machining Techniques
KENNETH R. SHOULDERS

Recent Developments in Linear Programming
SAUL I. GASS

The Theory of Automata, a Survey
ROBERT MCNAUGHTON

Contents of Volume 3

The Computation of Satellite Orbit Trajectories
SAMUEL D. CONTE

Multiprogramming
E. F. CODD

Recent Developments in Nonlinear Programming
PHILIP WOLFE

Alternating Direction Implicit Methods
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG

Combined Analog-Digital Techniques in Simulation
HAROLD K. SKRAMSTAD

Information Technology and the Law
REED C. LAWLOR

Contents of Volume 4

The Formulation of Data Processing Problems for Computers
WILLIAM C. MCGEE

All-Magnetic Circuit Techniques
DAVID R. BENNION AND HEWITT D. CRANE

Computer Education
HOWARD E. TOMPKINS

Digital Fluid Logic Elements
H. H. GLAETTLI

Multiple Computer Systems
WILLIAM A. CURTIN

Contents of Volume 5

The Role of Computers in Election Night Broadcasting
JACK MOSHMAN

Some Results of Research on Automatic Programming in Eastern Europe
WLADYSLAW TURSKI

A Discussion of Artificial Intelligence and Self-Organization
GORDON PASK

Automatic Optical Design
ORESTES N. STAVROUDIS

Computing Problems and Methods in X-Ray Crystallography
CHARLES L. COULTER

Digital Computers in Nuclear Reactor Design
ELIZABETH CUTHILL

An Introduction to Procedure-Oriented Languages
HARRY D. HUSKEY
Information Retrieval

CLAUDE E. WALSTON
IBM Corporation, Bethesda, Maryland
1. Introduction
   1.1 Scope of the Discussion
   1.2 The Information Problem
   1.3 The User and His Needs
2. The Information-Storage and -Retrieval Cycle
3. Types of Retrieval
4. Automatic Document Indexing and Classification
   4.1 Introductory Comments
   4.2 Automatic Indexing
   4.3 Automatic Classification
   4.4 Full Text Indexing
5. Automatic Aids to Retrieval and Dissemination
6. Automatic Fact Retrieval
7. Conclusion
References
1. Introduction

1.1 Scope of the Discussion
Information retrieval is a broad and, as yet, loosely defined subject. Information retrieval, as the term is generally used, implies the selective recall of stored knowledge. It would not be possible in this discussion to consider information retrieval in all its ramifications. Rather, we shall examine from a historical point of view the role of computers and automation in solving retrieval problems. At the same time we have tried to prepare a selected set of references, which will provide a satisfactory entree for the reader interested in pursuing the subject
further.

1.2 The Information Problem
A great deal of attention has been devoted in both the popular press and the technical press to the problems generated by the "information explosion," as it has been popularly identified by some. In addition, government committees have explored the cost and wasted effort
resulting from the duplication of effort engendered by failures in the information-dissemination and retrieval process. The impression created by this publicity is that the problem is a fairly recent one, whereas it proves, upon closer examination, to have existed for a surprisingly long time. De Solla Price [12] traces the development of this problem as far as the scientific community is concerned. The present information problem might be said to have had its genesis in the invention of the printing press by Gutenberg in the mid-fifteenth century. However true that may be, its growth to sizable proportions was assured through the device of the learned paper and the invention of the scientific journal in the seventeenth century. The oldest surviving scientific journal is the Philosophical Transactions of the Royal Society of London, first published in 1665. The initial growth of scientific journals was slow and irregular; by 1750 the number being published was only ten. During the next fifty years, however, the growth pattern began to change; by 1800, the number of published journals had reached a hundred. By 1850 a thousand were being published and by 1900 the number was ten thousand. This rapid growth is continuing and today we are not far from the hundred thousand mark. As early as 1830, when the number of journals being published had reached the level of three hundred a year, scientists were already in trouble; it was impossible even then to keep abreast of all the work being reported in scientific journals. The solution to this dilemma was the invention of the abstract journal. This in turn has followed a growth pattern similar to that of the scientific journal. Today there are approximately three hundred abstract journals being published, and it has been suggested that perhaps the next step should be the creation of an abstract journal that abstracts the abstract journals!

It is obvious then that the "information problem" is not a new one and that attempts to find solutions to it have been made over the past hundred years. That greater success has not been attained can be attributed to a number of factors. One factor of course is the complexity of the problem concerned with the communication of products of the human intellect between individuals and groups of individuals. Since it is so complex, the development of a theoretical foundation describing the process has been very slow. This is true even if we narrow our scope of interest and concentrate on the problems of storing and retrieving printed information. One of the difficulties in developing a theoretical foundation has been a lack of both experimental data and the tools with which to collect and process the data. The advent of the high-speed digital computer has provided the tool necessary to let experimentation proceed, and appears to have provided one of the critical elements in the solution of the information-retrieval problem.
A factor contributing to the difficulty in solving the scientific information problem has been the growth of scientific activity in the United States as a result of the large-scale support of research and development activities by various agencies of the federal government. Congress has taken a close look at various facets of these activities and has been particularly concerned about waste and duplication resulting from poor dissemination of technical information or from an inability to retrieve information about technical programs having explored the same areas or solved the same or similar problems. In particular, the Committee on Government Operations of the United States Senate and its Subcommittee on Reorganization and International Organizations, chaired by Senator Hubert Humphrey, has done a thorough job of examining both federal and non-federal programs for information processing and the problems of coordinating information among federal agencies. The reports of these groups [41, 42] contain much information on the information problem and the programs initiated in an attempt to solve this problem. The House of Representatives, concerned with another aspect of the information problem, has created an Ad Hoc Subcommittee on Research Data Processing and Information Retrieval, chaired by Roman Pucinski, that has been conducting a series of hearings [40] on the need to create a national information center to solve the scientific information problem. Finally, the President's Science Advisory Committee [30] has explored the responsibilities of the technical community and the government in the transfer of information. The findings and recommendations of this study provide additional information on the nature of the scientific information problem.

This discussion has briefly indicated some of the factors that have led to an increased interest in information retrieval. Some of these factors and the statistics quoted here have been suggested as justification for launching large-scale programs for automating information retrieval. Bar-Hillel [3] challenges this approach and questions whether information retrieval is in fact approaching a crisis, as some have maintained. He suggests that specialization has been the defensive mechanism that has evolved to combat the geometrical rate of increase in scientific and technological publications. Green [17] also questions whether the information explosion is real, and suggests that our current information-retrieval resources are growing and adapting themselves to solve the problem.

1.3 The User and His Needs
Any consideration of information retrieval must reflect the particular requirements of the user of that information. This seems quite obvious,
but too often in the past it has either been ignored or only cursorily examined by those responsible for implementing information-retrieval systems. An examination of the information needs of potential users of a retrieval system reveals a number of information problems. Tukey [39] has suggested the following classification of user needs:

(1) Information delivery. The automatic delivery of information in which the user has to take little or no initiative, e.g., monthly administrative reports automatically delivered to the appropriate users.

(2) Information retrieval. The delivery of information by a librarian or information specialist in response to a very specifically stated request.

(3) Information pursuit. The search by the prospective user for information to be utilized for hypothesis formulation or hypothesis testing. Quite often the user has only a vague idea of what he needs.

(4) Information browsing. This, as its name implies, is a general seeking of information, sometimes with an area of interest in mind, quite often with none.

These needs, which range from the specific to the very vague in terms of the definition of the information required by the user, may all exist as requirements to be satisfied by a given information-retrieval system design, or in some cases only one category may have to be satisfied. In addition, a given user's needs may change from category to category during the course of a year as his work progresses through various phases, or as the nature of his assignments changes. The variety of users' needs to be satisfied, coupled with the wide range of input data that may have to be processed, complicates the system design and has made necessary the present tailoring of each system to fit its particular application, although each system will consist of functions and will utilize techniques common to all the others.

2. The Information-Storage and -Retrieval Cycle
It is not our intent in this discussion to consider the fundamental principles of the analysis and design of information-storage and -retrieval systems. This is a subject worthy of separate discussion in its own right. However, in order that the significance of the techniques covered in the later sections of this article can be more clearly understood, we should take a brief look at those functions that must be performed in a retrieval system. Any information-retrieval system, whether it is entirely manual or contains some degree of automation, must execute the same functions. The operation of the system is
cyclic and these functions fall into one of two cycles: the input or storage cycle, and the output or retrieval cycle. In the ideal system these two cycles are completely independent; that is, the user should be able to retrieve the information needed to satisfy his query regardless of the manner in which the input information was identified, tagged, and organized for storage. In the less-than-ideal current systems design capability, this is not true, and the user is very much at the mercy of the indexer who prepared the input for storage. The input cycle is composed of the following functions:

(1) Information collection and screening
(2) Information conversion
(3) Indexing
(4) Storage

The output cycle consists of:

(1) Query formulation and transformation
(2) Search
(3) Retrieval
(4) Data processing
(5) Output and dissemination.
These functions, while existing in any system, will vary in importance or significance from application to application, depending upon the nature of the information to be handled and the nature of the equipment, if any, that may be used. Information collection and screening are the processes of identifying the information that should be stored in the system, determining where it is, physically acquiring it, and selectively evaluating it to determine whether it should be stored. In some cases the collection of information to be processed in a retrieval system may be a monumental task. In other instances so great a volume of information may be available that its evaluation and screening may be a difficult and time-consuming activity. After the information has been screened, it may not be in a form that can be handled in the system. It may have to be translated. It may need to be decoded. If the system contains a computer, the information may have to be converted to a machine readable form, i.e., punched cards, punched paper tape, magnetic tape, etc., and this conversion may be a major and costly problem. The most crucial part of the input cycle is the indexing function. The success and effectiveness of the total system depend upon how well this function is carried out. By indexing we mean the process of determining the nature of the input, and tagging it in such a fashion that the nature of the information it contains is succinctly identified for further processing in the
system. If the indexing is poorly done we run the risk of losing the information once it is stored, by virtue of the fact that we cannot identify it as being pertinent to a user's query. The indexing task is discussed in detail in Section 4. It should be apparent that, depending upon the particular application, the indexing function might precede the conversion function. Once the indexing function is complete, then the original input item, the bibliographic data that provide background and control data about it (source, date, accession number, etc.), and the index tags produced by indexing must all be properly organized and stored in such a manner that they can be utilized in the retrieval process.

The output cycle is to a certain extent an inverse of the input cycle. This cycle begins with the formulation of a query by a user of the system, who specifies the nature of the information he desires and any constraints or restrictions to be imposed on the system to assist in narrowing the scope of the search. The query, once it is formulated by the user, must usually undergo a certain amount of manipulation and transformation to put it in a form that contains the elements and a structure similar to those resulting from the indexing function. The search operation is an attempt to find a match between the transformed query and some subset of the index tags accumulated in the system in order to identify those items of information that answer the original query. In the ideal case, an exact match always occurs between the query and the index items; in real life this happens very infrequently, and the usual problem is what decision rules to follow when there is only a partial match. As a result of this process, the items to be retrieved are identified along with their location in the store, and they can be retrieved either in their entirety or in some reduced representation (e.g., lists of titles or abstracts). Again, depending on the nature of the information-retrieval system, the retrieval process may be as simple as using a locator number to manually retrieve a document from a filing cabinet, or as complex as using a computer to retrieve abstracts stored on magnetic tape and to print them out at a remote location for the user's perusal. The retrieved information may need to be processed before it is delivered to the user; e.g., he may be interested in counts or averages, or may want statistical analyses performed on the retrieved information. Finally, after any processing, the output function assures that the retrieved information is presented to the user in a form in which he can use it.
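To make the search function concrete, the following sketch ranks stored items by how many of the query's index tags they match, so that partial matches are still retrieved in order of closeness. The index contents and the simple threshold rule are hypothetical illustrations, not drawn from any system discussed here.

    # A minimal sketch of the search step: compare transformed query
    # tags against stored index tags and rank items by degree of match.
    index = {
        "doc-101": {"rent", "control", "housing"},
        "doc-102": {"rent", "lease"},
        "doc-103": {"zoning", "housing"},
    }

    def search(query_tags, threshold=1):
        """Return (item, matched-tag count) pairs, best matches first."""
        hits = [(item, len(tags & query_tags)) for item, tags in index.items()]
        hits = [h for h in hits if h[1] >= threshold]   # partial-match rule
        return sorted(hits, key=lambda h: h[1], reverse=True)

    print(search({"rent", "control"}))   # doc-101 (2 tags), then doc-102 (1)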
3. Types of Retrieval

Up to this point we have been discussing information retrieval in general terms and have made no attempt to be specific about the nature
of the output that may be delivered by the system. One method used to distinguish differences between retrieval systems has been to classify them with respect to their output. Bourne [8] identifies four types of system: reference, document, fact, and information retrieval. Reference-retrieval systems provide references to documents containing the information sought in response to a user's query. Document-retrieval systems go one step further and provide complete copies of the documents themselves in response to a query. Fact-retrieval systems yield specific information (e.g., physical properties of materials, number and capacity of tantalum capacitors that failed last month) in response to a query. Information-retrieval systems, which are the most complex since they must deal with concepts, are those able to provide direct answers (not references) to such questions as, "What is the most recent theory on the role of nucleation in cavitation?" Despite this useful distinction among retrieval systems, information retrieval has become firmly established through common usage as the generic term that includes reference, document, and fact retrieval, and the reader will discover that he must learn from the context which type of system is being discussed.

A great deal of work has been done on the development of fact-retrieval systems, largely as a result of both the pressure and the support coming from the Department of Defense to satisfy urgent requirements for command/control and intelligence applications. Systems designed for these applications must have the ability to handle a variety of data organized into many different files and to process a wide range of queries against these files. These systems have come to be known as formatted file systems, since the input data are arranged into various formats for ease of storage and retrieval. A typical formatted file system will consist of four general programs (a sketch of the query program's role follows the list):

(1) An executive control program (monitor) to allow job processing on a priority basis and to permit interrupts for answering queries or entering new data.
(2) A file modification program for the generation of new formatted data files or the modification or restructuring of existing files.
(3) A file maintenance program for file updating and error correction.
(4) A file query program for information selection, processing, and report generation.
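As a rough illustration of the query program, here is a sketch of record selection followed by summarization. The record layout, field names, and data are invented for this example and do not reproduce any particular formatted file system.

    # A toy "file query program" pass: select records from a formatted
    # file by a logical condition, then summarize a field.
    records = [
        {"part": "capacitor", "type": "tantalum", "failures": 4},
        {"part": "capacitor", "type": "ceramic",  "failures": 1},
        {"part": "resistor",  "type": "carbon",   "failures": 7},
    ]

    def select(records, predicate):
        """Selection: keep the records satisfying the query condition."""
        return [r for r in records if predicate(r)]

    chosen = select(records, lambda r: r["part"] == "capacitor")
    total = sum(r["failures"] for r in chosen)
    print(len(chosen), "records; total failures:", total,
          "; average:", total / len(chosen))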
The files handled by a formatted file system may be structured from many types of data ranging from small, fixed-length items to large variable-length items containing repetitive information. The system is designed to allow the user flexibility in handling his data. He is not limited in the number or variety of files that can be defined and handled
by the system, nor is he constrained to follow the format of an existing file in the event of a change in the application for which the file was intended, or of a change in the input data to that file. The system query programs provide the user with a flexible data selection and report generation capability. A logical query language is used to select the desired records from the files. Processing routines are provided to reorder (sort) and summarize the data (totals, subtotals, averages) after selection. A report generation capability allows the creation of a large variety of formal reports from any single file. Some formatted file systems now in operation provide for multiple file queries, i.e., the retrieval and correlation of data from several files without manual intervention. The system also allows for the incorporation of subroutines to perform any special purpose functions that may be necessary. A detailed discussion of formatted file systems is worthy of separate treatment and is beyond the scope of this article, but we have mentioned them here because they represent an important segment of the work on information retrieval. Postley and Buetell [29] describe one such formatted file system that they have developed, possessing some of the characteristics described above.

In general, the facts entered into a fact-retrieval system have to be selected and organized manually before they can be entered into the system. The problems of so organizing the data that it can be readily stored and retrieved have presented many challenges to system designers, and have given impetus to the development of new techniques, such as those covered by Cheydleur in his article "Memory Allocation Methods for Associative Storage and Retrieval" (this volume).

A large part of the research and development activity in the information-retrieval field has centered on problems associated with reference- and document-retrieval systems. The emphasis has been on the development of automatic indexing and classification techniques, and in Section 4 we review the efforts in that particular area. Computers have been applied in other facets of the retrieval problem as well, which are examined in Section 5. Automatic fact and information retrieval represents a much more difficult problem and, although not as much effort has been directed toward the development of these systems, we briefly review some of the current work in Section 6.
4. Automatic Document Indexing and Classification

4.1 Introductory Comments

The most critical part of the storage and retrieval operation is the indexing function. The indexing of an item of information prior to its
inclusion in the information-retrieval system establishes the attributes to be used in identifying the particular item for the remainder of its life in the system. A great deal of thought and effort has been devoted to the process of indexing and to the development of techniques and procedures for uniquely identifying, selecting, and describing those attributes of a given item to enable unambiguous retrieval at a later date in response to a request for the information it contains. Space does not permit a detailed description of the various indexing methods that have been implemented or proposed. For a discussion of indexing problems in general and a review of the major indexing systems, Vickery [43] is a good source of information, although the beginner may find his presentations fairly difficult reading. Bourne [8] is also a good source of information, easier to read than Vickery but not as detailed, although covering a broader scope and providing an excellent set of references for pursuit of the subject in more depth. Becker and Hayes [5] give a good discussion of indexing from the librarian's or information specialist's point of view, and a preliminary discussion of theoretical considerations underlying information-storage and -retrieval system design.

Jonker [18] envisions all indexing systems as constituting a continuum, the descriptive continuum, as he has entitled it. At one end of the spectrum are hierarchical classification systems, such as the Dewey Decimal Classification. In the middle of the spectrum is subject heading indexing, while at the other end of the continuum is keyword indexing, or coordinate indexing, as it is also called. The hierarchical classification system assumes that the information contained in the items being indexed can be organized into a tree structure, the so-called "tree of knowledge." Figure 1 is a hypothetical example of a hierarchical classification system.

[FIG. 1. A sample hierarchical classification system: a tree in which a class such as military aircraft branches into bombers of the jet, prop-jet, and piston types.]

The tree structure exhibits two logical types of relationship; first, the generic relationship of each element to the classes
above it (e.g., the relationship of jet bombers to the class of military aircraft) and, second, the coordinate relationship between subordinated classes (e.g., relationships among jet bombers, prop-jet bombers, and piston bombers). In a subject heading index, the information is organized into a series of categories, all of equivalent rank and labeled with a descriptive heading. The familiar yellow pages of the telephone book are an example of a subject heading index. Keyword indexing is achieved by the selection of significant or meaningful words or combinations of words (including numbers or symbols) contained in the input item. Retrieval is made through the "coordination" of these keywords to identify documents containing the desired information. One of the pioneers in the development of coordinate indexing systems is Mortimer Taube, creator of the UNITERM (unit term) indexing system [37, 38], perhaps the best known of the coordinate indexing systems.

The indexing function has been in the past and is today essentially a manual operation. It has suffered from the usual clerical errors involved in manual operations. More importantly, it suffers from the problems of inconsistencies: those inconsistencies that arise from the differences in emphasis and interpretation given to the same document by two different people, as well as their different interpretations of the indexing instructions themselves. Furthermore, an indexer may very well index the same document differently if it is indexed at two widely separated points in time. In addition, with the continuing increase in the generation of documents, books, and other items of information each year, and with a lack of trained indexers as well as the limited output per indexer per day, there is great pressure to develop automatic methods of indexing to overcome these problems. Until the advent of the digital computer, there was no tool available that gave promise of being able to solve the indexing problem. Consequently, the initiation of research into techniques for automatic indexing has been a relatively recent event.

4.2 Automatic Indexing
One of the earlier approaches to mechanized indexing was that suggested by H. P. Luhn [21]. Luhn's approach was based upon the premise that the frequency of word occurrence in an article furnishes a useful measurement of word significance. A plot of the frequency of occurrence of various word types in a given article against their rank order (that is, in the order of their frequency of occurrence) yields a curve similar to that shown in Fig. 2. This curve is the familiar Zipf's law curve developed by Zipf [45], who showed that the product of the frequency of use, f, of words in American newspaper English, with rank order r is approximately a constant. Zipf's law has been of great
use in information-retrieval work, but has the limitation that it is true only in the center of the frequency range. Luhn further reasoned that those words in the region of the highest frequencies, the common words such as "the" and "and," for example, would constitute noise in the system and should be eliminated by establishing an upper cutoff point. Words of higher frequency would be eliminated either by comparison to a common word list stored in the computer or by establishment of a high frequency cutoff through statistical methods. He also reasoned that those words with a low frequency of occurrence should be eliminated as not having occurred enough in the document to be significant. These words would be eliminated by establishing a lower cutoff frequency. The words that remain, derived automatically on the basis of their frequency of occurrence, are the significant words in the article. These significant words are the keywords that constitute the index to the article from which they were extracted. This approach with its simple rules for implementation lends itself quite readily to implementation on a digital computer and is usually the approach selected for automatic indexing.

[FIG. 2. A word frequency diagram: frequency of occurrence plotted against words by rank order (in order of frequency), with upper and lower frequency cutoffs bounding the region in which the resolving power of significant words is greatest.]
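A minimal sketch of Luhn's keyword selection follows; the cutoff values and the common-word list are arbitrary stand-ins for the statistically or manually derived ones Luhn describes.

    from collections import Counter

    # Luhn-style keyword selection: count word occurrences, drop words
    # above an upper cutoff (common words) or below a lower cutoff
    # (too rare to be significant); the survivors are the keywords.
    COMMON = {"the", "and", "of", "a", "to", "in", "is"}

    def significant_words(text, lower_cutoff=2, upper_cutoff=50):
        counts = Counter(w.strip(".,;:!?").lower() for w in text.split())
        return {word: n for word, n in counts.items()
                if word not in COMMON and lower_cutoff <= n <= upper_cutoff}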
Luhn carried this concept one step further to show how it could be utilized to do automatic abstracting (or extracting, to be more precise). The auto-abstract is formed by ranking each sentence in the article and
then selecting those several sentences with the highest rank. The criterion used by Luhn for ranking sentences is the relationship of significant words within the sentence, rather than their distribution over the sentence. Each sentence is scanned to establish whether a portion of it is bracketed by significant words. Those portions of sentences bracketed by significant words are considered, if there are no more than five nonsignificant words between the significant ones. The significance factor is calculated by first establishing a cluster of words by bracketing the significant words, then counting the significant words in the cluster, and dividing the square of this number by the total number of words in the cluster. If two or more clusters occur, the one with the highest significance factor is taken as the measure for the sentence. Auto-abstracts developed by this procedure, while not having the literary quality of abstracts developed by human abstractors, nevertheless convey a general feeling for the subject matter of the documents from which they were extracted.
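The cluster computation lends itself to a few lines of code. In this sketch the tokenized sentence and the set of significant words are assumed to come from the frequency step above; the gap limit of five nonsignificant words is the one stated in the text.

    def sentence_significance(words, significant, max_gap=5):
        """Luhn's significance factor for one sentence: find clusters of
        significant words separated by at most max_gap nonsignificant
        words, score each cluster as (significant count)**2 divided by
        cluster length, and return the highest cluster score."""
        positions = [i for i, w in enumerate(words) if w in significant]
        if not positions:
            return 0.0
        best, start, count = 0.0, positions[0], 1
        for prev, cur in zip(positions, positions[1:]):
            if cur - prev - 1 <= max_gap:    # cur extends the cluster
                count += 1
            else:                            # close the cluster at prev
                best = max(best, count ** 2 / (prev - start + 1))
                start, count = cur, 1
        return max(best, count ** 2 / (positions[-1] - start + 1))

Ranking all sentences by this factor and keeping the few highest yields the auto-extract.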
Baxendale [4] in an early investigation of scientific literature compared three methods for automatic indexing. One method followed the approach developed by Luhn, described above. High frequency words were deleted by reference to a table containing 160 words that included all pronouns, articles, conjunctions, conjunctive adverbs, copulas, and auxiliary words. In the articles processed, deletion of these 160 terms reduced the volume by approximately 60%. The second method tested was the extraction of index terms on a frequency basis from the topic sentences of every paragraph in each article. Analysis showed that in the articles used for experiment the topic sentence was the initial sentence in the paragraph in 86% of the cases and the final sentence in 7% of the cases. Accordingly, adopted as a rule for indexing each article was the simple process of: (1) select the first and last sentence of each paragraph, (2) delete the common words in these sentences, (3) extract the high frequency words remaining and use these as index terms. The third method examined was the utilization of the phrase as an index unit, on the premise that the phrase is likely to reflect the content of an article more clearly than other simple grammatical constructions. To simplify machine identification of phrases, prepositional phrases were used and the prepositions identified by table look-up. The four words following the preposition were automatically selected unless a punctuation mark or a second preposition was encountered. The words selected in this manner and ranked on a frequency basis after deletion of common words served as keywords for indexing the article. A comparison of the three approaches indicates that they are equally effective in selecting index terms automatically, but the phrase approach has the additional advantage that it provides as a by-product an automatic coordination of index terms, linking terms that have been used together in the original article.

Edmundson and Wyllys [14] in 1961 made an excellent survey of those techniques that had been considered for use in automatic indexing and automatic abstracting up to that time. At the time of their survey, the proposed methods for automatic measurement of a word's significance depended upon the frequency of occurrence of the word within the document being analyzed. Edmundson and Wyllys advanced the argument that general considerations from information theory suggest that a word's information should vary inversely with its frequency rather than directly. They argue that the rare, special, or technical word in an article will indicate most strongly the subject of the author's discussion. By "rare," however, they mean rare in general usage, not rare in the article itself. Thus, they suggest a new approach to the determination of a word's significance, namely, comparing the relative frequency f of a word within the article with its relative frequency r in general use (where 0 < f, r < 1), using as a significance measure either

    s = f − r    (4.3)

or

    s = log (f/r)    (4.4)

They conclude that either f − r or f/r seems good. They make the further suggestion of using weighted frequencies; that is, a weighted significance S calculated for a word according to the formula

    S = b_t b_1 b_s s(f, r)    (4.5)

where, for a given word w, the factor b_t is a constant greater than 1 if w occurs in the title and 1 otherwise; b_1 is a constant greater than 1 if w occurs in the first paragraph and 1 otherwise; and b_s is a constant greater than 1 if w occurs in the summary or conclusion paragraphs and 1 otherwise.
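A sketch of the weighted measure follows; the particular weight values are arbitrary illustrations, since Edmundson and Wyllys leave b_t, b_1, and b_s as parameters greater than 1.

    import math

    # Edmundson-Wyllys-style weighted significance: compare a word's
    # relative frequency f in the article with its relative frequency r
    # in general use (formula (4.4)), then boost the score when the word
    # appears in the title, first paragraph, or summary (formula (4.5)).
    def weighted_significance(f, r, in_title=False, in_first_paragraph=False,
                              in_summary=False, b_t=3.0, b_1=2.0, b_s=2.0):
        s = math.log(f / r)              # or simply f - r, per (4.3)
        if in_title:
            s *= b_t
        if in_first_paragraph:
            s *= b_1
        if in_summary:
            s *= b_s
        return s

    # A word ten times more frequent in the article than in general
    # use, and appearing in the title:
    print(weighted_significance(0.01, 0.001, in_title=True))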
4.3 Automatic Classification
The work described up to this point has been aimed at automatically selecting index terms for a given item of input. At the same time, efforts have been underway to develop techniques for automatically classifying documents. This work has followed two general paths. The first path assumes the existence of a given classification system and is aimed at providing techniques for automatically assigning a given item to its appropriate place in the classification structure. The second approach makes no a priori assumptions about the existence of a classification system, and attempts to derive one automatically, along with techniques for automatically classifying items into their proper categories.

Maron [24] undertook one of the early studies on automatic classification. His approach was a statistical one based on the assumption that the individual words of a document can provide clues that can be utilized to predict the subject category to which the document probably belongs. He tested his approach on 405 abstracts published in the March, June, and September issues of the 1959 IRE Transactions on Electronic Computers. These abstracts were drawn from the current computer literature. Selecting 260 documents, he developed the statistics necessary to conduct the actual experimentation of automatic classification. The documents were classified into 32 subject categories. (Actually the abstracts themselves were used, but we shall call them documents from here on.) The words used in the documents were analyzed, common words being deleted as well as words of low frequency. Those significant words remaining were examined to determine their distribution across the 32 categories. The words that "peaked" in at least one category were selected as good clue words; 90 different clue words were selected by this process. Using a Bayesian approach, the probability that a document containing clue words W_1, ..., W_n belongs to category j is given by

    P(C_j | W_1, ..., W_n) = P(C_j) P(W_1 | C_j) ... P(W_n | C_j) / [P(W_1) ... P(W_n)]    (4.6)

where P(C_j) is the a priori probability that the document will be indexed under category j and P(W_i | C_j) is the probability that if a document is indexed under category j it will contain the word W_i. The classification of a document is then made by noting the clue words it contains, calculating the probabilities of its belonging to each category, and assigning it to that category with the highest estimated probability. Using the above procedures and the statistics obtained from the first group of 260 documents, Maron attempted to classify them automatically into the 32 categories and correctly classified 84.6%. Using the statistics from the first group, he then attempted to classify the 146 documents in the second group automatically. His results this time were 51.8% correctly classified. These results, while not spectacular, were better than chance and indicated that automatic classification was worth pursuing.
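In modern terms, formula (4.6) is naive Bayes classification. The sketch below applies it with invented categories and probabilities; since the denominator is the same for every category, only the numerators need be compared.

    import math

    # Maron-style category assignment: score each category by
    # P(C_j) times the product of P(W_i | C_j) over the document's clue
    # words, and assign the document to the highest-scoring category.
    priors = {"circuits": 0.4, "programming": 0.6}
    p_word_given_cat = {
        "circuits":    {"diode": 0.30, "compiler": 0.01},
        "programming": {"diode": 0.02, "compiler": 0.25},
    }

    def classify(clue_words):
        def log_score(category):
            return math.log(priors[category]) + sum(
                math.log(p_word_given_cat[category].get(w, 1e-6))
                for w in clue_words)
        return max(priors, key=log_score)

    print(classify(["diode"]))      # -> "circuits"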
Borko had been exploring the possibility of using factor analysis not only as a tool for automatically classifying documents into the appropriate categories, but also to derive the categories themselves. Consequently Borko and Bernick [6] decided to test their approach by using the same set of documents as had Maron in his earlier work. They also used the same set of 90 clue words, even though they recognized that Maron had selected these specific clue words because they were good predictors for his 32 categories and the factor analysis would in all likelihood yield different categories. However, it was decided to use these clue words since one of the assumptions underlying the factor analysis approach was that it should be relatively independent of the words chosen if the words are representative. Frequency counts were made of the number of times each of the 90 keywords was used in each document, and these were developed into a matrix with the index terms along one axis and the documents along the other. Based upon the frequencies in this matrix, correlation coefficients were computed for each of the 90 keywords correlated with each of the other terms. The resulting 90 × 90 correlation matrix was then factor analyzed to reduce it to a smaller number of factors. A number of sets of factors were extracted and examined by the investigators; finally 21 orthogonal factors were selected to be utilized as classification categories. The original 405 documents were then manually classified into the 21 selected categories. Procedures were prepared and the factor loadings, both regular and normalized, were calculated for each of the 90 keywords for each category. Each document to be classified was then read into the computer, where it was determined which of the keywords were present with what frequency of occurrence. A measure, P_c, was calculated for each category; namely,

    P_c = L_1 × T_1 + L_2 × T_2 + ... + L_n × T_n    (4.7)

where L_n is the normalized factor loading for keyword n in category c, and T_n is the number of occurrences of the keyword in the document being classified. The category for which P_c has the highest value is the one to which the document is assigned. Using these procedures, Borko proceeded to classify automatically the 260 documents in the first group and correctly classified 63.4%, an accuracy 21.2% less than Maron's.
On the second group of 146 documents, Borko achieved a score of 48.9% correctly classified, only 2.9% less than Maron's. Subsequent and more extensive tests [7] have been made on different subject matter to test the stability and reasonableness of the factor analysis approach to automatic classification. These tests showed an accuracy of approximately 60% for the automatic classification of documents when compared to the classifications assigned by human classifiers. However, as Borko points out, human classifiers can agree among themselves only 60% of the time, so these performance figures may in no way indict the automatic classification system for poor performance.

Williams [44] has developed a method which, starting with a classification system given a priori, proceeds to classify documents automatically into the given classification structure, using multiple discriminant analysis techniques. His approach has developed from an earlier consideration of the suggestion by Edmundson and Wyllys [14] of comparing word frequencies within a document to their frequencies in general use. Williams uses statistical data about word frequencies and their distributions within each category of the classification system, as well as data about their distributions across all categories, to establish the parameters used in his classification process. He selects for each category of the given classification system a representative set of documents, which have been previously (manually) classified. His program then analyzes each word type in the total set of reference documents. For each word type, four parameters are measured: X_W, its mean frequency within each category; W, its variance within each category; X_A, its mean frequency across all categories; and A, its variance across all categories. Williams argues that the ideal discriminating word should occur regularly in all documents within a category (that is, its variance W is low), but should not occur with the same frequency in documents in other categories (that is, its variance A is high). Those words with a high mean frequency X_W, and having a variance ratio

    F = A/W    (4.8)

whose value is high, are therefore good discriminators and should be chosen as the keywords for classifying new documents. These keywords become the variates in the multiple discriminant analysis, and their mean frequencies and variances are used to calculate the coefficients of the discriminant functions, established to transform the original measure space to a new space, providing greater separation between the centroids of each category. The classification of a new document is accomplished by determining the mean frequency of each keyword that it contains, and then utilizing this information to compute the distance
(in a geometric sense) between the new document and the categories of the classification system. This distance measure can then be used to determine the degree of relevance of the document with respect to each category. The document is then assigned to that category for which it has the highest relevance value. This approach permits the assignment of the document to more than one category, based upon this relevance value, and is also useful for retrieval purposes since it permits the user to specify a threshold below which retrieval should not occur. Williams has been testing his classification technique on a corpus of documents that are abstracts of articles on solid state physics, prepared and classified by the Cambridge Communications Corporation using their classification system. He used 320 documents, 80 from each of four categories (Solid State Devices, Application of Solid State Devices, Solid State Physics, and Metallurgy and Chemistry of Solids), as the control set to develop the statistics. The test set, consisting of 474 documents, was then classified by Williams with an overall accuracy of 62%.

Baker [1] has suggested another approach to document classification and retrieval based upon the method of latent class analysis, originally developed for the analysis of sociological studies. Baker suggests a way in which this technique might be utilized in a document-retrieval system, but unfortunately was not able to test his method.
4.4 Full Text Indexing

Up to this point we have been examining automatic indexing and classification techniques that operate by compressing or reducing the input through the selection of keywords representing or portraying the main concepts of the document. Another tested and implemented technique, called "full text indexing" by some of the workers in the field, is an adaptation of techniques that grew out of early work on literary data processing, that is, data processing techniques applied to the requirements of literary analysis. Tasman [36] reported on one such study in which an electronic data processor was used in the automatic analysis and indexing of the Summa Theologica of St. Thomas Aquinas. One analytical tool produced automatically in this study was a concordance, which is an alphabetical collection of the individual words used by an author in a given work, and which cites every passage in which each word appears. A concordance is an extremely useful tool in literary analysis, and the idea of applying it as a tool in the retrieval of legal information was exploited by Kehl and Horty [19] at the University of Pittsburgh in the development of an operating legal retrieval system. In their system, the full text of the legal information, including punctuation conventions and line indication, is prepared on punched
paper tape. The paper tape is read into the computer, which then creates an index locator for each word in every document, with the exception of common words. Thus, each word is accompanied by a descriptor that indicates the document number in which it occurred, as well as its line number and its sentence number within the document, and its word number within the sentence. This list, when stored alphabetically by word, then becomes a concordance for the entire input and can be used for retrieval purposes. As a by-product of the generation of the concordance, statistics are also generated to provide a count of the number of documents in the system, the number of words entered, the number of unique words, and the number of significant and nonsignificant (common) words. The system is a complete system, providing a legal retrieval language that permits the user to formulate a wide variety of queries; the system then utilizes the concordance to locate those documents that satisfy the query. The user has a wide latitude in specifying his retrieval criteria; e.g., he may have specified that two relevant words that must appear in any document he desires are "rent" and "control." He can specify that they must occur in the same sentence before they satisfy his requirements, or that they occur in order, i.e., "rent control," before the document is satisfactory. This system, although maintaining the entire text of the input on-line to permit a full and flexible retrieval search to be made, places the burden of the search on the user; this leads us to the next section of the discussion, in which we examine some of the tools developed to aid the user. The University of Pittsburgh system is currently being used in the retrieval of legal information from the statutes of the states of Pennsylvania and New Jersey, which have been put on magnetic tape. The State of New York is currently converting its statutes to machine readable form preparatory to installing its own legal retrieval system based on the system developed by the University of Pittsburgh. For a full discussion of information retrieval applied to legal problems, see Lawlor [20a].
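A sketch of the concordance-building step follows; the document set and common-word list are hypothetical, and the line-number component of the locator is omitted for brevity (the Pittsburgh system recorded it as well).

    from collections import defaultdict

    # Build a concordance in the Kehl-Horty style: every noncommon
    # word is stored with a (document, sentence, word-position) locator,
    # so a query can demand that two words share a sentence or occur in
    # adjacent positions.
    COMMON = {"the", "of", "and", "a", "to", "shall"}

    def build_concordance(documents):
        """documents: {doc_id: [sentence, ...]} -> {word: [locators]}."""
        concordance = defaultdict(list)
        for doc_id, sentences in documents.items():
            for s_no, sentence in enumerate(sentences, start=1):
                for w_no, raw in enumerate(sentence.split(), start=1):
                    word = raw.strip(".,;:").lower()
                    if word not in COMMON:
                        concordance[word].append((doc_id, s_no, w_no))
        return concordance

    docs = {"statute-12": ["The rent control board shall set the rent."]}
    c = build_concordance(docs)
    print(c["rent"], c["control"])   # same sentence, adjacent positions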
5. Automatic Aids to Retrieval and Dissemination

Our discussion has centered on methods of automatically indexing or classifying information for storage. These techniques, while well adapted for implementation on a computer, put a burden on the user at the time of retrieval, since he must contend with the problems of how to organize his query and what terms to use in order to retrieve the information he desires. A large number of documents may be retrieved in response to a legitimate query, and it would be a great asset to the user to have them ranked in some way to indicate their responsiveness to his query.
Maron and Kuhns [25] examined several facets of the problems encountered in indexing and retrieval and focused in particular on the relevancy problem. They suggested a method for computing a relevance number for each document so that those documents retrieved in response to a query could be ranked in the probable order in which they satisfied the query. Their approach was based on an indexing method, which they called probabilistic indexing, in which the indexer assigns a numerical value to each index term indicating the degree to which each term applies to the document being indexed. They also introduced the idea of "statistical closeness" between index terms in addition to the "semantic closeness" that had been known to exist. Semantic closeness includes such factors as synonymity among index terms and generic relationships among terms. They used the term "statistical closeness" to cover situations in which terms are associated in documents on the basis of the factors described in the documents and not on semantic relationships among the terms. For example, "logic" implies "switching theory," since truth-functional logic is used for the analysis and synthesis of switching circuits, although there is nothing about the meaning of the two terms that would imply a relationship. They further illustrated how statistical measures of closeness between index terms can be computed. Although their work was not implemented operationally, because of the high degree of dependence on human indexers and the problems associated with weighting the index terms, the concepts they generated were unique and have been utilized by a number of other researchers in the field.

Stiles [35] was faced with the operational problems, in a large technical system, of choosing search terms that would retrieve all documents pertinent to a request, and also of ranking the documents retrieved so the user would know which to examine first. He developed an approach that could be completely mechanized and that utilizes an association factor to identify index terms having statistical closeness. After considering several formulas, including the ones suggested by Maron and Kuhns, Stiles decided to use a form of the chi-square formula, using marginal values of the 2 × 2 contingency table and the Yates correction for small samples. The formula he chose is:

    Association factor = log [ N(|fN − AB| − N/2)² / AB(N − A)(N − B) ]    (5.1)

where
A is the number of documents indexed by one term,
B is the number of documents indexed by a second term,
f is the number of documents indexed by a combination of the two terms, and
N is the total number of documents in the collection.
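A direct transcription of formula (5.1) follows, assuming the logarithm is taken to base 10; the sample counts are invented.

    import math

    # Stiles's association factor: a chi-square-like measure of the
    # statistical closeness of two index terms, with the Yates
    # correction (the N/2 term) for small samples.
    def association_factor(f, A, B, N):
        """f: documents indexed by both terms; A, B: documents indexed
        by each term separately; N: documents in the collection."""
        numerator = N * (abs(f * N - A * B) - N / 2) ** 2
        denominator = A * B * (N - A) * (N - B)
        return math.log10(numerator / denominator)

    # Terms that co-occur far more often than chance score above 1:
    print(association_factor(f=20, A=40, B=50, N=1000))   # about 2.2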
Using the association factor defined above, Stiles developed a retrieval method which he has summarized as follows:

(1) Calculate a profile for each term in the query by calculating an association factor for each request term and every indexing term in the document collection and rejecting those whose factors are less than 1.
(2) Compare the profiles of the request terms and select those terms appearing in all, or a given number, of the profiles.
(3) Using the terms resulting from steps (1) and (2), repeat the process to develop an expanded list, except that now a term need not appear in all profiles.
(4) Using the expanded list, calculate the association factor for each term with respect to the others. The sum of these factors for each term, divided by the total number of terms in the list, gives a weight for each term.
(5) Compare the expanded list with the index terms of each document in the collection and add the weights of the terms that match. This sum is the document relevance number. Documents can then be ranked by their relevance number.
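The five steps above might be compressed as in the following sketch, which uses the association_factor function given earlier. The data structure (a map from document id to its set of index terms) is assumed, step (3) is simplified to a single pass, and the sketch further assumes that every term indexes at least one document and fewer than all of them.

def stiles_search(query_terms, doc_index, N, threshold=1.0):
    """Illustrative compression of Stiles's steps (1)-(5).
    doc_index: map from document id to the set of its index terms."""
    vocab = set().union(*doc_index.values())

    def n_docs(*terms):
        return sum(1 for doc_terms in doc_index.values()
                   if all(t in doc_terms for t in terms))

    def assoc(t, u):
        return association_factor(n_docs(t, u), n_docs(t), n_docs(u), N)

    def profile(term):
        # Step (1): indexing terms whose factor with `term` is at least 1.
        return {u for u in vocab if u != term and assoc(term, u) >= threshold}

    # Step (2): terms appearing in the profiles of all query terms.
    shared = set.intersection(*(profile(t) for t in query_terms))
    # Step (3), simplified: expand the query by the shared terms.
    expanded = set(query_terms) | shared
    # Step (4): weight each term by the sum of its association factors
    # with the other terms, divided by the size of the list.
    weights = {t: sum(assoc(t, u) for u in expanded if u != t) / len(expanded)
               for t in expanded}
    # Step (5): a document's relevance number is the sum of the weights
    # of its matching index terms; rank documents by that number.
    relevance = {d: sum(weights.get(t, 0.0) for t in terms)
                 for d, terms in doc_index.items()}
    return sorted(relevance, key=relevance.get, reverse=True)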
Using this approach, Stiles has been able to improve the operation of the system by retrieving documents pertinent to a query that would not have been retrieved by using the original terms of the query. The measure suggested by Stiles may not always give pertinent results and should be examined carefully before it is applied to a particular document collection. Other measures of association have been suggested, for example, by Tanimoto [31]. Salton [32] reviews current work in association techniques and makes comparisons with his own work, which utilizes bibliographic information. The problems of semantic closeness were explored by Doyle [13], who examined the interrelationships among words within documents and described several types of word correlation that can occur: words that occur in pairs due to the nature of our language and its use, and words that occur in the same sentence but in nonadjacent positions. The statistical effects of these correlations he called language redundancy and reality redundancy. A third type of redundancy, document redundancy, occurs when more than one document is indexed by the same terms. He envisions the development of what he calls semantic road maps as a solution to these difficulties. These semantic road maps would allow the computer to present to the user diagrammatic representations of the associations occurring between the terms in the system, to aid him in locating the document he needs.
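As to the alternative measures of association mentioned above: one simple form often associated with Tanimoto, the ratio of shared to total distinct attributes, can be stated in a line. Whether this matches the exact formulation used in [31] is not verified here, so the form below is an assumption.

def tanimoto(terms_a, terms_b):
    """Shared attributes over total distinct attributes: 0 for disjoint
    sets of index terms, 1 for identical ones."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0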
We should mention here several general types of retrieval aid that have had widespread consideration and use, although they are not generated automatically at this point in time. One is the thesaurus, which can be used to provide a controlled set of descriptors, to indicate synonymity and near-synonymity between index terms, and to denote generic relationships among index terms. While a thesaurus would have to be prepared manually, it could be kept on-line in a computer and could be automatically searched during the query-formulation function to determine other terms to be added to the query to improve retrieval. The use of roles and links has also been chosen in some systems to improve retrieval by reducing false drops. Role indicators are used to show semantic relationships among terms, for example, the action of one term on another, and links are used to indicate those index terms that have been semantically linked in the original document. Again, roles and links would normally have to be generated manually, although in certain restricted situations, such as those relationships between terms indicated by the use of prepositions, they could be generated automatically. As in the case of the thesaurus, during the search function the roles and links affixed to the index terms could be automatically examined by the computer to determine whether the conditions specified in the query were being met. One retrieval aid, used in the legal profession for many years, is the citation index, developed by Shepard as a research tool for searching cases. Garfield [15] has been very active in extending the concept of citation indexing to the scientific information-retrieval problem, and his paper, a review of this approach, includes a good bibliography. Very briefly, a citation index is an ordered list of cited articles, each of which is accompanied by a list of citing articles. The citation index permits searches to be made backward and forward in time to identify a reference trail through those documents that cite and are cited. The technique lends itself very conveniently to mechanization. Kessler [19a, 19b] has explored one facet of citation indexing, which he calls "bibliographic coupling." He postulates that scientific papers are coupled (i.e., have a significant relationship with one another) when they have one or more references in common. Two papers that share one reference contain one unit of coupling. He defines two criteria of coupling by which related groups of papers may be identified: criterion A defines a number of papers as constituting a related group if each member has at least one coupling unit in common with a given test paper; criterion B defines a number of papers as constituting a related group if each member has at least one coupling unit with every other member of the group. The technique readily lends itself to mechanization, and tests have shown that it does assemble papers into valid groups.
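Kessler's two criteria are directly mechanizable. The sketch below is illustrative; the data structure (a map from paper id to its list of references) is an assumption.

def coupling_units(refs_a, refs_b):
    """Kessler's coupling strength: the number of references shared."""
    return len(set(refs_a) & set(refs_b))

def related_by_criterion_a(test_paper, references):
    """Criterion A: papers with at least one coupling unit in common
    with the given test paper."""
    return [p for p in references
            if p != test_paper
            and coupling_units(references[p], references[test_paper]) >= 1]

def satisfies_criterion_b(group, references):
    """Criterion B: every member couples with every other member."""
    return all(coupling_units(references[p], references[q]) >= 1
               for p in group for q in group if p != q)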
Another tool of great value in the retrieval of information has been the KWIC, or Key-Word-In-Context, system [22]. It has been applied in a variety of ways to give the user another tool for locating information pertinent to his needs. Basically, it is a permuted word list that lends itself very readily to implementation on a computer. It was initially used for developing permuted title lists, although its use is by no means restricted to that application. For example, one by-product of the Pittsburgh legal retrieval system is a KWIC listing of the significant words from the concordance showing the context in which they occur in the original documents. The noncommon words in the items to be listed in a KWIC index are listed alphabetically, along with a fixed number of the words that preceded and followed them in the source material in which they originally occurred. Figure 3 is an example of a KWIC index as actually used by Balz and Stanwood [2] in their detailed bibliography of literature on information retrieval and machine translation. One problem that faces any user of information is maintaining "current awareness," or keeping abreast of new developments. It is not strictly a problem in information retrieval but is closely related to it. Two of the information-retrieval techniques we have discussed, KWIC and bibliographic coupling, are also useful for maintaining current awareness. Another current-awareness technique, called Selective Dissemination of Information [23] or SDI, is not an information-retrieval tool but rather a routing tool to assure that information entering a system is automatically sent to the user. Basically, it consists of establishing a profile of each user in the system. Each profile, which is described by keywords, is automatically matched against each incoming document by comparing the index terms of the document against the terms in the user's profile. If a satisfactory match is achieved, an abstract of the document is automatically sent to the user. If the user is interested in the information, he can request that a copy of the document be sent to him. This request cycle also provides a feedback loop for the SDI system to determine how effectively it is satisfying its customers' needs. SDI can be implemented as a natural corollary to an information-retrieval system.
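Returning to the KWIC listing described above: the permuted list is straightforward to mechanize. The sketch below is illustrative only; the tokenization, the common-word list, and the fixed context window are assumptions, and a production program would also align the keywords in a fixed column, as in Fig. 3.

def kwic_index(titles, common_words=("the", "a", "an", "of", "and", "in"),
               window=3):
    """A permuted keyword-in-context listing: every noncommon word is an
    entry, sorted alphabetically, shown with a few words of the context
    that preceded and followed it in the source item."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() not in common_words:
                left = " ".join(words[max(0, i - window):i])
                right = " ".join(words[i + 1:i + 1 + window])
                entries.append((word.lower(), left, right))
    return sorted(entries)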
[Fig. 3. An example of a KWIC index (from Balz and Stanwood [2]).]

6. Automatic Fact Retrieval

The techniques discussed in the preceding section are primarily techniques for document-retrieval systems. However, research and development activities designed to provide the foundations for designing automatic fact- and information-retrieval systems are also under way. Because of the complexity of the problem, the number of projects
involved has been small and progress has been slow, as would be expected. One of the earlier projects in automatic fact retrieval was reported by Green [16] at Lincoln Laboratory. This group developed the "Baseball" computer program, which obtains answers from stored data to questions supplied to it in ordinary English. The initial application is baseball games, and the program can answer such questions as, "How many games did the Yankees win by one run in July?" and "Did every team beat the Yankees at least once in Yankee Stadium in 1959?" The program consists of a linguistic section and a processing section. The linguistic section analyzes each question with the aid of a stored dictionary and derives a specification list that indicates to the processor the relevant data contained in the query and the information requested. The processor extracts the information requested on the specification list from that part of the stored data matching the specifications, does any necessary further processing, such as counting, and then prints the answer. The SYNTHEX project [33] at the System Development Corporation is a research project aimed at the development of general-purpose computer systems for the synthesis of human-type cognitive functions. As an initial research vehicle, a prototype system named PROTOSYNTHEX 1 has been programmed. PROTOSYNTHEX 1 reads and indexes English text and selects answers in response to questions phrased in ordinary English. The program system consists of four subsystems: (a) the indexer, (b) the question analyzer and information-retrieval unit, (c) the grammatical analysis system, and (d) the answer evaluation system. The program is reported as working well with simple questions of fact. As a data base, sixteen volumes of the Golden Book Encyclopedia, in machine-readable form, have been used. The system is general enough to handle hundreds of thousands of words of ordinary English text, although in a simple-minded fashion. Another approach to a fact-retrieval system has recently been reported by Cooper [11]. He contends that the central theoretical problem of fact retrieval is to develop a system of logical inference among natural-language sentences. Recognizing that it is doubtful whether a sound logical analysis of an entire natural language can be attained in the near future, he acknowledges that fact-retrieval systems must, for the present, use only those parts of the language for which an analysis has been made. He selects a sublanguage of English and develops a translational algorithm for transforming it into a logical language. Both the data base to be queried and the queries themselves are translated into this logical language, where logical inferences can be made to determine whether an answer to the query exists in the stored data.
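Cooper's translational algorithm is not reproduced here; the following toy sketch only illustrates the general idea of answering a query by logical inference over translated facts. The representation as triples, the chemistry-flavored sample facts, and the single transitivity rule are assumptions for illustration.

# Assumed translations of simple sentences into a logical language.
FACTS = {("is_a", "sodium", "metal"),
         ("is_a", "metal", "element")}

def transitivity(facts):
    """If X is_a Y and Y is_a Z, infer X is_a Z."""
    return {("is_a", x, z)
            for (_, x, y) in facts
            for (_, y2, z) in facts if y == y2}

def entails(facts, query, max_rounds=10):
    """Saturate the fact base by forward chaining, then test the query."""
    for _ in range(max_rounds):
        derived = transitivity(facts) - facts
        if not derived:
            break
        facts = facts | derived
    return query in facts

# e.g., entails(FACTS, ("is_a", "sodium", "element")) returns True.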
He has tested this approach, using basic chemical information gleaned from the first few pages of an elementary chemistry text. Initial tests of the system have indicated that simple questions could be answered fairly quickly by the computer, but for the most difficult questions the computer had to be stopped manually after 16 min of computing without providing an answer. At the moment it is uncertain whether the difficulties arise from inefficiencies in the program or in the decision algorithm for finding answers to the queries. These projects, while still too new to permit drawing conclusions, have exhibited sufficient signs of success to indicate that they probably portend the future direction of research on automatic fact retrieval. They also indicate two approaches that should be explored: the natural language versus the metalanguage. Cooper questions whether a natural language, with all its vagaries and uncertainties, can be used to develop the inferences necessary to answer queries, and in fact suggests it cannot. Only further research and experimentation will tell.

7. Conclusion
We have described those efforts in the automation of information retrieval that have had a major influence on the direction and nature of the work that has been performed. The references included here provide information on those developments and problem areas that could only be alluded to in this article because of space limitations; these references contain, collectively, a good bibliography of the literature of information retrieval. The best starting point for surveying the field is the excellent series, Current Research and Development in Scientific Documentation [27], prepared by the National Science Foundation. These reports contain a very extensive set of references to work carried out both in the United States and abroad in the field of information retrieval, and to work in other fields that may have strong interaction with information-retrieval research and development. In any exciting new field of technology, there are those who jump on the bandwagon early and publicize pet panaceas. The information-retrieval field has in the past unfortunately had more than its share of those who carried out experimental work on an intuitive rather than a scientific basis and who made claims of breakthroughs on the basis of extremely small test samples. The result was to give information retrieval a somewhat shady reputation, which was not deserved. As it became apparent that in the field of information retrieval there were no shortcuts to success, the usual winnowing process took place, and the work today is being done on a competent and scientific basis. Before it sounds as if we are engaged in a vendetta against all the early
workers in information retrieval, however, we should hasten to acknowledge the difficulties of testing and evaluating information-retrieval techniques. In particular, the selection of evaluation criteria is an extremely knotty problem. Cleverdon's work [9, 10] clearly pointed up the difficulties in evaluating indexing systems. It also indicated the amount of planning and preparation necessary before adequate testing could begin. Two problems that quickly arise in testing retrieval systems are the number of variables entering into the system, and the resultant difficulty in analyzing the data after the tests are completed. Another factor that makes testing so difficult is that even when the tests are so arranged that there is a "right" answer, there may be other answers that are, in practice, useful to the person who framed the query, and allowance must be made for this in the testing. The main difficulty here, of course, is the interpretation of what is a relevant answer, and this has been one of the stumbling blocks in evaluating and comparing systems. Cleverdon's answer to the problem was to place emphasis on the "right" answer by having people not associated with the project compile questions based on documents in the collection used in the test. That the problems of testing and evaluation have not been adequately solved is apparent from the evaluation of the American Society for Metals-Western Reserve University Metallurgical Searching Service, which was made by an ad hoc committee of the National Academy of Sciences-National Research Council [26]. This report is of further value since it also contains a good bibliography relating to the testing and evaluation of information-retrieval systems. Information retrieval has had to draw heavily upon a number of scientific disciplines for support as it struggles to develop a theoretical structure of its own. In return, it has encouraged and challenged research in those disciplines from which it draws by feeding back experimental data, and by pointing the way to exciting new research problems needing solution. For example, we have seen that some of the early work in automatic indexing and classification made the assumption that the significant words in a document were independent entities. It was recognized that this was only an expedient, but it simplified the statistical procedures involved and in certain instances has provided satisfactory solutions to existing problems. The problems encountered in the retrieval process confirmed that the words were not independent entities but were linked together by the various ways in which they had been used in the source document. It became apparent not only that more complex statistical considerations were necessary, but that more complex linguistic factors than roles and links were necessary as well. The impact of these deliberations has been a closer coupling of linguistics to the problems of information retrieval. It has
also tended to point up dramatically our lack of syntactical and semantic theory in some areas and to spur research on these problems. The Current Research and Development in Scientific Documentation series [27] is an excellent starting point for determining those research efforts in linguistics aimed either directly at problems impacting on information retrieval, or indirectly through such efforts as mechanical language translation. Information retrieval has had to draw heavily upon mathematics in a number of areas. (An excellent state-of-the-art review will be contained in the Proceedings of the Symposium on Statistical Association Methods for Mechanized Documentation [34]. A number of the researchers whose works have been cited here will have papers in the Proceedings describing the results of their latest research.) One interesting problem to be faced each time a retrieval system is designed is the search problem. The organization of the storage and the search strategies to be used for the most efficient operation of the system is no trivial problem. In the case of a large store of information, finding just one strategy that is not prohibitive in time, without attempting to find the most efficient one, is a large enough problem in its own right. The results of the indexing operation might be depicted as developing a multidimensional space in which are located the original documents of the collection. For purposes of classification, or to assist in identifying those documents "closest" to a given query, it is desirable to "cluster" documents that are more nearly aligned. The problems here are similar to those encountered in pattern recognition work [27], and can draw upon the mathematical developments there. Parker-Rhodes [28] and Needham [26b], as a result of work on information retrieval, have developed the theory of clumps as a tool to aid in this process. The problem is far from solved, and there is room for more creative work. One field of research having impact on information retrieval, as we noted in Section 6 where we described Cooper's work [11], is automata theory. Kirsch [20], in his excellent paper, suggests certain topics in automata theory that should be explored to achieve a better understanding of the information-retrieval problem. He discusses several areas, including the processing of natural language, the description of natural text, and the problems of inference and relevance, where automata theory may have a strong impact. Automata theory may offer the possibility of developing a unified theory of retrieval. (Two excellent articles reviewing automata theory and artificial intelligence are those by McNaughton [26a] and Pask [28a].) We can see that, although many problems are yet to be solved, progress has been made in the automation of information retrieval. Any technology, and information retrieval is no exception, depends upon a
good experimental base if it is to grow and support the development of a theoretical foundation. In information retrieval in the past, too few empirical data have been available, due in part to the cost of collecting a sufficiently large sample of data in machine readable form to support the types of tests necessary, and in part to the lack of appropriate computing systems upon which to experiment. Conditions have improved. At the moment, equipment is not a problem; its capabilities have outstripped our abilities to use them fully. Programming systems either available or under design are giving us the ability to conduct more extensive and more varied tests. The quantity of data in machine readable form is increasing, but its availability may be a problem to those requiring it for large-scale testing. It was implicit in the discussion of the techniques in Section 4 that information must be available in machine readable form before information-retrieval systems that incorporate these techniques can be implemented. This factor, reflecting the current costs of converting printed material to machine readable form, has been perhaps the single greatest reason why more mechanized information-retrieval systems have not been installed. The tendency to develop isolated techniques that attack some facet of the information-retrieval problem taken out of context is diminishing, and this is good. What is needed is more support to encourage the necessary systems research and development and to permit the testing so sorely needed at the systems level. This is essential, for information retrieval is a systems problem; its solution will be achieved only by people, equipment, and programs all working as an integrated system.

REFERENCES

1. Baker, F. B., Information retrieval based on latent class analysis. J. Assoc. Computing Machinery 9, 512-521 (1962).
2. Balz, C. F., and Stanwood, R. H., Literature on information retrieval and machine translation. Doc. No. 320-1710, IBM, Data Processing Division, White Plains, New York (November 1962).
3. Bar-Hillel, Y., Is information retrieval approaching a crisis? Am. Documentation 14, 95-98 (1963).
4. Baxendale, P. B., Machine-made index for technical literature - an experiment. IBM J. Res. Develop. 2, 354-361 (1958).
5. Becker, J., and Hayes, R. M., Information Storage and Retrieval. Wiley, New York, 1962.
6. Borko, H., and Bernick, M., Automatic document classification. J. Assoc. Computing Machinery 10, 151-162 (1963).
7. Borko, H., and Bernick, M., Toward the establishment of a computer-based classification system for scientific documentation. Rept. No. TM-1763, System Development Corp., Santa Monica, California (February 1964).
8. Bourne, C. P., Methods of Information Handling. Wiley, New York, 1963.
9. Cleverdon, C. W., The evaluation of systems used in information retrieval.
Proc. Intern. Conf. Sci. Inform., Vol. 1, pp. 687-698. Natl. Acad. Sci.-Natl. Res. Council, Washington, D.C., 1959.
10. Cleverdon, C. W., Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. College of Aeronautics, Cranfield, England (October 1962).
11. Cooper, W. S., Fact retrieval and deductive question-answering information retrieval systems. J. Assoc. Computing Machinery 11, 117-137 (1964).
12. de Solla Price, D. J., Science Since Babylon. Yale Univ. Press, New Haven, Connecticut, 1961.
13. Doyle, L. B., Semantic road maps for literature searchers. J. Assoc. Computing Machinery 8, 553-573 (1961).
14. Edmundson, H. P., and Wyllys, R. E., Automatic abstracting and indexing - survey and recommendations. Commun. Assoc. Computing Machinery 4, 226-234 (1961).
15. Garfield, E., Science citation index - a new dimension in indexing. Science 144, 649-654 (1964).
16. Green, B. F., Wolf, A. K., Chomsky, C., and Laughery, K., Baseball: an automatic question-answerer. Proc. Western Joint Computer Conf., pp. 219-224 (May 1961).
17. Green, J. C., The information explosion - real or imaginary? Science 144, 646-648 (1964).
18. Jonker, F., The descriptive continuum: a generalized theory of indexing. Proc. Intern. Conf. Sci. Inform., Vol. 2, pp. 1291-1311. Natl. Acad. Sci.-Natl. Res. Council, Washington, D.C., 1959.
19. Kehl, W. B., Horty, J. F., Bacon, C. R. T., and Mitchell, D. S., An information retrieval language for legal studies. Commun. Assoc. Computing Machinery 4, 380-389 (1961).
19a. Kessler, M. M., An experimental study of bibliographic coupling between technical papers. IEEE Trans. Information Theory 9, 49-50 (1963).
19b. Kessler, M. M., Bibliographic coupling between scientific papers. Am. Documentation 14, 10-25 (1963).
20. Kirsch, R. A., The application of automata theory to problems in information theory (with selected bibliography). Natl. Bur. Std. (U.S.) Rept. No. 7882 (March 1963).
20a. Lawlor, R. C., Information technology and the law. Advan. Computers 3, 299-352 (1962).
21. Luhn, H. P., The automatic creation of literature abstracts. IBM J. Res. Develop. 2, 159-165 (1958).
22. Luhn, H. P., Key-word-in-context index for technical literature (KWIC index). Am. Documentation 11, 288-295 (1960).
23. Luhn, H. P., Selective dissemination of new scientific information with aid of electronic processing equipment. Am. Documentation 12, 131-138 (1961).
24. Maron, M. E., Automatic indexing: an experimental inquiry. J. Assoc. Computing Machinery 8, 404-417 (1961).
25. Maron, M. E., and Kuhns, J. L., On relevance, probabilistic indexing and information retrieval. J. Assoc. Computing Machinery 7, 216-243 (1960).
26. Marzke, O. T. (Chairman, an ad hoc committee), The Metallurgical Searching Service of the American Society for Metals-Western Reserve University: An Evaluation. Publ. No. 1148, Natl. Acad. Sci.-Natl. Res. Council, Washington, D.C., 1964.
26a. McNaughton, R., The theory of automata, a survey. Advan. Computers 2, 379-421 (1962).
26b. Needham, R. M., and Sparck Jones, K., Keywords and clumps. J. Documentation 20, 5-16 (1964).
27. National Science Foundation, Current Res. Develop. Sci. Documentation No. 11. Office of Technical Services, Dept. of Commerce, Washington, D.C. (November 1962). (Note: No. 12 and No. 13 now in press.)
28. Parker-Rhodes, A. F., The theory of clumps. Rept. No. ML-126, Cambridge Language Research Unit, Cambridge, England, 1960.
28a. Pask, G., A discussion of artificial intelligence and self-organization. Advan. Computers 5, 109-226 (1964).
29. Postley, J. A., and Buetell, T. D., Generalized information retrieval and listing system. Datamation 8, 22-25 (1962).
30. The President's Science Advisory Committee, Science, Government, and Information. U.S. Govt. Printing Office, Washington, D.C., 1963.
31. Rogers, D. J., and Tanimoto, T. T., A computer program for classifying plants. Science 132, 1115-1118 (1960).
32. Salton, G., Associative document retrieval techniques using bibliographic information. J. Assoc. Computing Machinery 10, 440-457 (1963).
33. Simmons, R. F., Synthex: computer synthesis of human language behavior. In Computer Applications in the Behavioral Sciences (H. Borko, ed.), pp. 360-393. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
34. Stevens, M. E., ed., Proc. Symp. Statist. Association Methods for Mechanized Documentation, Washington, D.C., 1964 (sponsored by Natl. Bur. Std. and Am. Documentation Inst.), to be published by U.S. Govt. Printing Office.
35. Stiles, H. E., The association factor in information retrieval. J. Assoc. Computing Machinery 8, 271-279 (1961).
36. Tasman, P., Literary data processing. IBM J. Res. Develop. 1, 249-256 (1957).
37. Taube, M., et al., Studies in Coordinate Indexing, Vol. 1. Documentation, Inc., Washington, D.C., 1953.
38. Taube, M., Gull, C. D., and Wachtel, I. S., Unit terms in coordinate indexing. Am. Documentation 3, 213-218 (1952).
39. Tukey, J. W., The citation index and the information problem - opportunities and research in progress. Ann. Rept. for 1962 under Natl. Sci. Foundation Grant NSF-G-22108, Statist. Techniques Res. Group, Princeton University.
40. U.S. House of Representatives, Committee on Education and Labor, Ad Hoc Subcommittee on Research Data Processing and Information Retrieval, National Information Center (Hearings on H.R. 1946). U.S. Govt. Printing Office, Washington, D.C.
41. U.S. Senate, Committee on Government Operations, Documentation, indexing, and retrieval of scientific information. Doc. No. 113, U.S. Govt. Printing Office, Washington, D.C., 1960.
42. U.S. Senate, Subcommittee on Reorganization and International Organizations, Hearings, interagency coordination of information. U.S. Govt. Printing Office, Washington, D.C., 1963.
43. Vickery, B. C., On Retrieval System Theory. Butterworth, London and Washington, D.C., 1961.
44. Williams, J. H., Jr., A discrimination method for automatically classifying documents. Proc. Fall Joint Computer Conf., pp. 161-167 (1963).
45. Zipf, G. K., Human Behavior and the Principle of Least Effort. Addison-Wesley, Reading, Massachusetts, 1949.
Speculations Concerning the First Ultraintelligent Machine*

IRVING JOHN GOOD
Trinity College, Oxford, England, and Atlas Computer Laboratory, Chilton, Berkshire, England

* Based on talks given at a Conference on the Conceptual Aspects of Biocommunications, Neuropsychiatric Institute, University of California, Los Angeles, October 1962; and in the Artificial Intelligence Sessions of the Winter General Meetings of the IEEE, January 1963 [1, 46]. The first draft of this monograph was completed in April 1963, and the present slightly emended version in May 1964. I am much indebted to Mrs. Euthie Anthony of IDA for the arduous task of typing.
1. Introduction
2. Ultraintelligent Machines and Their Value
3. Communication as Regeneration
4. Some Representations of "Meaning" and Their Relevance to Intelligent Machines
5. Recall and Information Retrieval
6. Cell Assemblies and Subassemblies
7. An Assembly Theory of Meaning
8. The Economy of Meaning
9. Conclusions
10. Appendix: Informational and Causal Interactions
References
1. Introduction
The survival of man depends on the early construction of an ultraintelligent machine. In order to design an ultraintelligent machine we need to understand more about the human brain or human thought or both. In the following pages an attempt is made to take more of the magic out of the brain by means of a "subassembly" theory, which is a modification of Hebb's famous speculative cell-assembly theory. My belief is that the first ultraintelligent machine is most likely to incorporate vast artificial neural circuitry, and that its behavior will be partly explicable in terms of the subassembly theory. Later machines will all be designed by
ultraintelligent machines, and who am I to guess what principles they will devise? But probably Man will construct the deus ex machina in his own image. The subassembly theory sheds light on the physical embodiment of memory and meaning, and there can be little doubt that both will need embodiment in an ultraintelligent machine. Even for the brain, we shall argue that physical embodiment of meaning must have originated for reasons of economy, at least if the metaphysical reasons can be ignored. Economy is important in any engineering venture, but especially so when the price is exceedingly high, as it most likely will be for the first ultraintelligent machine. Hence semantics is relevant to the design of such a machine. Yet a detailed knowledge of semantics might not be required, since the artificial neural network will largely take care of it, provided that the parameters are correctly chosen, and provided that the network is adequately integrated with its sensorium and motorium (input and output). For, if these conditions are met, the machine will be able to learn from experience, by means of positive and negative reinforcement, and the instruction of the machine will resemble that of a child. Hence it will be useful if the instructor knows something about semantics, but not necessarily more useful than for the instructor of a child. The correct choice of the parameters, and even of the design philosophy, will depend on the usual scientific method of successive approximation, using speculation, theory, and experiment. The percentage of speculation needs to be highest in the early stages of any endeavor. Therefore no apology is offered for the speculative nature of the present work. For we are certainly still in the early stages of the design of an ultraintelligent machine. In order that the arguments should be reasonably self-contained, it is necessary to discuss a variety of topics. We shall define an ultraintelligent machine, and, since its cost will be very large, briefly consider its potential value. We say something about the physical embodiment of a word or statement, and defend the idea that the function of meaning is economy by describing it as a process of "regeneration." In order to explain what this means, we devote a few pages to the nature of communication. (The brain is of course a complex communication and control system.) We shall need to discuss the process of recall, partly because its understanding is very closely related to the understanding of understanding. The process of recall in its turn is a special case of statistical information retrieval. This subject will be discussed in Section 5. One of the main difficulties in this subject is how to estimate the probabilities of events that have never occurred. That such probabilities are relevant to intelligence is to be expected, since intelligence is sometimes defined as the ability to adapt to new circumstances.
The difficulty of estimating probabilities is sometimes overlooked in the literature of artificial intelligence, but this article would be too long if the subject were surveyed here. A separate monograph has been written on this subject [48]. Some of the ideas of Section 5 are adapted, in Section 6, to the problem of recall, which is discussed and to some extent explained in terms of the subassembly theory. The paper concludes with some brief suggestions concerning the physical representation of "meaning." This paper will, as we said, be speculative: no blueprint will be suggested for the construction of an ultraintelligent machine, and there will be no reference to transistors, diodes, and cryogenics. (Note, however, that cryogenics have the important merit of low power consumption. This feature will be valuable in an ultraintelligent machine.) One of our aims is to pinpoint some of the difficulties. The machine will not be on the drawing board until many people have talked big, and others have built small, conceivably using deoxyribonucleic acid (DNA). Throughout the paper there are suggestions for new research. Some further summarizing remarks are to be found in the Conclusions.

2. Ultraintelligent Machines and Their Value
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an "intelligence explosion," and the intelligence of man would be left far behind (see for example refs. [22], [34], [44]). Thus the first ultraintelligent machine is the last invention that man need ever make, provided that the machine is docile enough to tell us how to keep it under control. It is curious that this point is made so seldom outside of science fiction. It is sometimes worthwhile to take science fiction seriously. In one science fiction story a machine refused to design a better one, since it did not wish to be put out of a job. This would not be an insuperable difficulty, even if machines can be egotistical, since the machine could gradually improve itself out of all recognition, by acquiring new equipment. B. V. Bowden stated on British television (August 1962) that there is no point in building a machine with the intelligence of a man, since it is easier to construct human brains by the usual method. A similar point was made by a speaker during the meetings reported in a recent IEEE
publication [1], but I do not know whether this point appeared in the published report. This shows that highly intelligent people can overlook the "intelligence explosion." It is true that it would be uneconomical to build a machine capable only of ordinary intellectual attainments, but it seems fairly probable that if this could be done then, at double the cost, the machine could exhibit ultraintelligence. Since we are concerned with the economical construction of an ultraintelligent machine, it is necessary to consider first what such a machine would be worth. Carter [11] estimated the value, to the world, of J. M. Keynes as at least 100,000 million pounds sterling. By definition, an ultraintelligent machine is worth far more, although the sign is uncertain, but since it will give the human race a good chance of surviving indefinitely, it might not be extravagant to put the value at a megakeynes. There is the opposite possibility, that the human race will become redundant, and there are other ethical problems, such as whether a machine could feel pain, especially if it contains chemical artificial neurons, and whether an ultraintelligent machine should be dismantled when it becomes obsolete [43, 84]. The machines will create social problems, but they might also be able to solve them, in addition to those that have been created by microbes and men. Such machines will be feared and respected, and perhaps even loved. These remarks might appear fanciful to some readers, but to the writer they seem very real and urgent, and worthy of emphasis outside of science fiction. If we could raise say a hundred billion dollars we might be able to simulate all the neurons of a brain, and of a whole man, at a cost of ten dollars per artificial neuron. But it seems unlikely that more than say a millikeynes would actually be forthcoming, and even this amount might be difficult to obtain without first building the machine! It would be justified if, with this expenditure, the chance of success were about 10⁻². Until an ultraintelligent machine is built perhaps the best intellectual feats will be performed by men and machines in very close, sometimes called "symbiotic," relationship, although the term "biomechanical" would be more appropriate. As M. H. A. Newman said in a private communication in 1946, an electronic computer might be used as "rough paper" by a mathematician. It could already be used in this manner by a chess player quite effectively, although the effectiveness would be much increased if the chess-playing programs were written with extremely close man-machine interaction in mind from the start. The reason for this effectiveness is that the machine has the advantage in speed and accuracy for routine calculation, and man has the advantage in imagination. Moreover, a large part of imagination in chess can be reduced to routine. Many of the ideas that require imagination in the amateur are routine for the master. Consequently the machine might
appear imaginative to many observers and even to the programmer. Similar comments apply to other thought processes. The justification for chess-playing programs is that they shed light on the problem of artificial intelligence without being too difficult to write. Their interest would be increased if chess were replaced by so-called "randomized chess," in which the positions of the white pieces on the first rank are permuted at random before the game begins (but with the two bishops on squares of opposite colors), and then the initial positions of the black pieces are determined by mirror symmetry. This gives rise to 1440 essentially distinct initial positions and effectively removes from the game the effect of mere parrot learning of the openings, while not changing any of the general principles of chess. In ordinary chess the machine would sometimes beat an international Grandmaster merely by means of a stored opening trap, and this would be a hollow victory. Furthermore, a program for randomized chess would have the advantage that it would not be necessary to store a great number of opening variations on magnetic tape. The feats performed by very close man-machine interaction by say 1980 are likely to encourage the donation of large grants for further development. By that time, there will have been great advances in microminiaturization, and pulse repetition frequencies of one billion pulses per second will surely have been attained in large computers (for example, see Shoulders [91]). On the other hand, the cerebral cortex of a man has about five billion neurons, each with between about twenty and eighty dendrites ([90], pp. 36 and 61), and thousands of synapses. (At the recent IEEE meetings, P. Mueller offered the estimate 300,000 orally. It would be very interesting to know the corresponding figure for the brain of a whale, which, according to Tower [99], has about three times as many neurons as a human brain. Perhaps some whales are ultraintelligent! [48]) Moreover, the brain is a parallel-working device to an extent out of all proportion to any existing computer. Although computers are likely to attain a pulse repetition speed advantage of say a million over the brain, it seems fairly probable, on the basis of this quantitative argument, that an ultraintelligent machine will need to be ultraparallel. In order to achieve the requisite degree of ultraparallel working it might be useful for many of the elements of the machine to contain a very short-range microminiature radio transmitter and receiver. The range should be small compared with the dimensions of the whole machine. A "connection" between two close artificial neurons could be made by having their transmitter and receiver on the same or close frequencies. The strength of the connection could be represented by the accuracy of the tuning. The receivers would need numerous filters so as to be
capable of receiving on many different frequencies. "Positive reinforcement" would correspond to improved tuning of these filters. It cannot be regarded as entirely certain that an ultraintelligent machine would need to be ultraparallel, since the number of binary operations per second performed by the brain might be far greater than is necessary for a computer made of reliable components. Neurons are not fully reliable; for example, they do not all last a lifetime; yet the brain is extremely efficient. This efficiency must depend partly on "redundancy" in the sense in which the term is used in information theory. A machine made of reliable components would have an advantage, and it seems just possible that ultraparallel working will not be essential. But there is a great waste in having only a small proportion of the components of a machine active at any one time. Whether a machine of classical or ultraparallel design is to be the first ultraintelligent machine, it will need to be able to handle or to learn to handle ordinary language with great facility. This will be important in order that its instructor should be able to teach it rapidly, and so that later the machine will be able to teach the instructor rapidly. It is very possible also that natural languages, or something analogous to them rather than to formal logic, are an essential ingredient of scientific imagination. Also the machine will be called upon to translate languages, and perhaps to generate fine prose and poetry at high speed, so that, all in all, linguistic facility is at a high premium. A man cannot learn more than ten million statements in a lifetime. A machine could already store this amount of information without much difficulty, even if it were not ultraparallel, but it seems likely that it would need to be ultraparallel in order to be able to retrieve the information with facility. It is in recall rather than in retention that the ordinary human memory reveals its near magic. The greatest mental achievements depend on more than memory, but it would be a big step toward ultraintelligence if human methods of recall could be simulated. For the above reasons, it will be assumed here that the first ultraintelligent machine will be ultraparallel, perhaps by making use of radio, as suggested. For definiteness, the machine will be assumed to incorporate an artificial neural net. This might be in exceedingly close relationship with an ordinary electronic computer, the latter being used for the more formalizable operations [33]. In any event the ultraintelligent machine might as well have a large electronic computer at its beck and call, and also a multimillion-dollar information-retrieval installation of large capacity but of comparatively slow speed, since these would add little to the total cost. It is unlikely that facility in the use of language will be possible if semantic questions are ignored in the design. When we have read or
listened to some exposition we sometimes remember for a long time what it meant, but seldom how it was worded. It will be argued below that, for men, meaning serves a function of economy in long-term retention and in information handling, and this is the basis for our contention that semantics are relevant to the design of an ultraintelligent machine. Since language is an example of communication, and since an ultraintelligent machine will be largely a complicated communication system, we shall briefly consider the nature of communication. It will be argued that in communication a process of "generalized regeneration" always occurs, and that it serves a function of economy. It will also be argued that the meanings of statements are examples of generalized regeneration.

3. Communication as Regeneration¹
In a communication system, a source, usually a time series denoted here by S(t), or S for short, undergoes a sequence of transformations. The first transformation is often a deterministic encoding, which transforms the source into a new time series, T_E S(t). This is noisily (indeterministically) transmitted, i.e., it undergoes a transformation T_N, which is a random member of some class of transformations. If the possible sources are, in some sense, far enough apart, and if the noise is not too great, then the waveforms T_N T_E S will, also in some sense, tend to form clumps, and it will be possible with high probability to reconstruct the encoded sources at the receiving end of the channel. This reconstruction is called here (generalized) regeneration, a term that is most familiar in connection with the reshaping of square pulses. When dealing with groups of consecutive pulses, the term error correction is more usual, especially when it is assumed that the individual pulses have themselves been first regenerated. Another way of saying that the source signals must be far enough apart is to say that they must have enough redundancy. In a complicated network, it is often convenient to regard signals as sources at numerous places in the network and not merely at the input to the network. The redundancy might then be represented, for example, by mere serial or parallel repetition. A compromise between pure regeneration and the use of the whole garbled waveform T_N T_E S(t) is probabilistic regeneration, in which the garbled waveform is replaced by the set of probabilities that it has arisen from various sources [42]. In probabilistic regeneration less information is thrown away than in pure regeneration, and the later data handling costs more, but less than it would cost if there were no regeneration at all. The hierarchical use of probabilistic regeneration would add much flexibility to complicated communication networks.
¹ For a short survey of the nature of communication, see for example Pierce [80a].
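The contrast between pure and probabilistic regeneration can be made concrete with a toy example. The repetition code, the independent-error model, and the equal prior probabilities below are illustrative assumptions; as argued later in this section, the initial probabilities should in general also be weighed.

CODEWORDS = ("00000", "11111")  # an assumed five-fold repetition code

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

def pure_regeneration(received):
    """Snap the garbled waveform to the nearest codeword: a definite
    decision, which throws away how close the call was."""
    return min(CODEWORDS, key=lambda c: hamming(c, received))

def probabilistic_regeneration(received, p_flip=0.1):
    """Keep instead the probability that each codeword was sent, assuming
    independent symbol errors and, for simplicity, equal priors."""
    def likelihood(c):
        d = hamming(c, received)
        return (p_flip ** d) * ((1 - p_flip) ** (len(c) - d))
    total = sum(likelihood(c) for c in CODEWORDS)
    return {c: likelihood(c) / total for c in CODEWORDS}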
An example of generalized and hierarchical regeneration is in the use of words in a language. A word in a spoken language could well be defined as a clump of short time series, that is, a class of time series having various properties in common. (The class depends on the speaker, the listener, and the context; and membership of the class is probabilistic, since there are marginal cases.) If any sound (acoustic time series) belonging to the clump is heard, then the listener mentally regenerates the sound and replaces it by some representation of the word. He will tend to remember, or to write down, the word and not its precise sound, although if any other significant property of the sound is noticed it might also be remembered. The advantage of remembering the word rather than the precise sound is that there is then less to remember and a smaller amount of information handling to do. This process of regeneration occurs to some extent at each of the levels of phonemes, words, sentences, and longer linguistic stretches, and even at the semantic level, and wherever it occurs it serves a function of economy. But the economy is not parsimonious: "redundancy" often remains in the coding in order that the encoded message should continue to have useful error-correcting features. The redundancy often decreases with the passage of time, perhaps leading eventually to the extinction of a memory. That communication theory has a bearing on the philosophy of meaning has been suggested before (see, for example, Weaver [89], pp. 114-117, and Lord Brain [8]). Note also the definition of the amount of subjective information in a proposition as -log₂ p, where p is the initial subjective probability that the proposition is true ([21], p. 76). This could also be described as subjective semantic information: when the probabilities are credibilities (logical probabilities) we obtain what might be called objective semantic information [5, 10], the existence of which is, in my opinion, slightly more controversial. That subjective probability is just as basic as communication theory to problems of meaning and recognition, if not more so, is a necessary tenet for any of us who define reasoning as logic plus probability ([21], pp. 3 and 88; see also Colin Cherry [12], pp. 200 and 274, Woodward [105], and Tompkins [98]). The implication is that both the initial (prior) probabilities and the likelihoods or "weights of evidence" [21] should be taken into account in every practical inference by a rational man, and in fact nearly always are taken into account to some extent, at least implicitly, even by actual men. (In case this thesis should appear as obvious to some readers as it does to the writer, it should be mentioned that in 1960 very few statisticians appeared to accept the thesis; and even now they are in a minority.) There is conclusive experimental evidence that the recognition of words depends on the initial
probabilities [94]: a well-known method of deception when trying to sell a speech-synthesis system is to tell the listeners in advance what will be said on it, and thereby to make it easily intelligible when it is repeated. There is a similar effect in the perception of color [9]. The rational procedure in perception would be to estimate the final (a posteriori) probabilities by means of Bayes' theorem, and then perhaps to select one or more hypotheses for future consideration or action, by allowing also for the utilities. (Compare refs. [24], [12], p. 206, and Middleton [68].) In fact the "principle of rationality" has been defined as the recommendation to maximize expected utility. But it is necessary to allow also for the expected cost of information handling, including theorizing [23, 40], and this is why regeneration and probabilistic regeneration are useful. We pointed out above that the organization of regeneration is often hierarchical, but it is not purely so. For example, we often delay the regeneration of a phoneme until the word to which the phoneme belongs has been regenerated with the help of the surrounding context. Likewise, if a machine is to be able to "understand" ordinary spoken language in any reasonable sense, it seems certain that its corresponding regeneration structure must not be purely hierarchical unless it is also probabilistic. For each process of nought-one or pure regeneration (each definite "decision") loses information, and the total loss would certainly be too great unless the speech were enunciated with priggish accuracy. The probabilistic regeneration structure that will be required will be much more complex than a "pure" regeneration structure. (Historical note: the hierarchical structure of mental processes was emphasized by Gall [20], McDougall [66], and many times since; see for example MacKay [63], Hayek [53], and others [30], [34], [87], [41].) It seems reasonable to the present writer that probabilistic regeneration will, for most purposes, lose only a small amount of information, and that, rather than use anything more elaborate, it is likely to be better to compromise between pure and probabilistic regeneration for most purposes. The applications of regeneration in the present paper will be to assemblies, subassemblies, and meaning. When a person recalls a proposition he could be said to regenerate its meaning; when he understands a statement made by another person the term "transgeneration" would be more appropriate; and when he thinks of a new proposition, the process would be better called "generation"; but we shall use the word "regeneration" to cover all three processes. For example, when listening to speech, the production of meaning can be regarded as the last regeneration stage in the hierarchy mentioned before, and it performs a function of economy just as all the other stages do. It is possible
that this has been frequently overlooked because meaning is associated with the metaphysical nature of consciousness, and one does not readily associate metaphysics with questions of economy. Perhaps there is nothing more important than metaphysics, but, for the construction of an artificial intelligence, it will be necessary to represent meaning in some physical form.

4. Some Representations of "Meaning" and Their Relevance to Intelligent Machines
Semantics is not relevant to all problems of mechanical language processing. Up to a point, mechanical translation can be performed by formal processes, such as dictionary look-up and some parsing. Many lexical ambiguities can be resolved statistically in terms of the context, and some as a consequence of the parsing. Sometimes one can go further by using an iterative process, in which the lexical ambiguities are resolved by the parsing, and the parsing in its turn requires the resolution of lexical ambiguities. But even with this iterative process it seems likely that perfect translation will depend on semantic questions [1a, 89]. Even if this is wrong, the design of an ultraintelligent machine will still be very likely to depend on semantics [31, 50]. What then is meant by semantics? When we ask for the meaning of a statement we are talking about language, and are using a metalanguage; and when we ask for the meaning of "meaning" we are using a metametalanguage, so it is not surprising that the question is difficult to answer. A recent survey chapter was entitled "The Unsolved Problem of Meaning" [3]. Here we shall touch on only a few aspects of the problem, some of which were not mentioned in that survey (see also Black [7]). It is interesting to recall the thought-word-thing triangle of Charles Peirce and of Ogden and Richards. (See, for example, Cherry [12], p. 110. Max Black ascribed a similar "triangle" to the chemist Lavoisier in a recent lecture.) It will help to emphasize the requirement for a physical embodiment of meaning if it is here pointed out that the triangle could be extended to a thought-word-thing-engram tetrahedron, where the fourth vertex represents the physical embodiment of the word in the brain, and will be assumed here usually to be a cell assembly. Given a class of linguistic transformations that transform statements into equivalent statements, it would be plausible to represent the meaning of a statement, or the proposition expressed by the statement, by the class of all equivalent statements. (This would be analogous to a modified form of the Frege-Russell definition of a cardinal
integer; for example, "3" can be defined as the class of all classes "similar" to the class consisting of the words "Newton," "Gauss," and "Bardot.") The point of this representation is that it makes reference to linguistic operations alone, and not to the "outside world." It might therefore be appropriate for a reasoning machine that had few robotic properties. Unfortunately, linguistic transformations having a strictly transitive property are rare in languages. There are also other logical difficulties analogous to those in the Frege-Russell definition of a cardinal integer. Moreover, this representation of meaning would be excessively unwieldy for mechanical use.

Another possible representation depending on linguistic transformations is a single representative of the class of all equivalent statements. This is analogous to another "definition" or, better, "representation," of a cardinal integer (see for example Halmos [51], p. 99). This representation is certainly an improvement on the previous one. If it were to be used in the construction of an ultraintelligent machine, it would be necessary to invent a language in which each statement could be reduced to a canonical form. Such an achievement would go most of the way to the production of perfect mechanical translation of technical literature, as has often been recognized, and it would also be of fundamental importance for the foundations of intuitive or logical probability ([21], pp. 4 and 48). The design of such a "canonical language" would be extremely difficult, perhaps even logically impossible, or perhaps it would require an ultraintelligent machine to do it!

For human beings, meaning is concerned with the outside world or with an imaginary world, so that representations of meaning that are not entirely linguistic in content might be more useful for our purpose. The behaviorist regards a statement as a stimulus, and interprets its meaning in terms of the class of its effects (responses) in overt behavior. The realism of this approach was shown when "Jacobson . . . made the significant discovery that action potentials arise in muscles simultaneously with the meaning processes with which the activity of the muscle, if overtly carried out, would correspond" ([3], p. 667). Thus the behavioral interpretation of meaning might be relevant for the understanding of the behavior and education of people and robots, especially androids. But, for the design of ultraintelligent machines, the internal representation of meaning (inside the machine) can hardly be ignored, so that the behavioral interpretation is hardly enough.

So far we have been discussing the interpretation and representation of the meaning of a statement, but even the meaning of a word is much less tangible and clear-cut than is sometimes supposed. This fact was emphasized, for example, by the philosopher G. E. Moore. Later John
Wisdom (not J. O. Wisdom) emphasized that we call an object a cow if it has enough of the properties of a cow, with perhaps no single property being essential. The need to make this interpretation of meaning more quantitative and probabilistic has been emphasized in various places by the present writer, who has insisted that this "probabilistic definition" is of basic importance for future elaborate information-retrieval systems [29, 35, 31, 43, 41]. "An object is said to belong to class C (such as the class of cows) if some function f(p1, p2, ..., pn) is positive, where the p's are the credibilities (logical probabilities) that the object has qualities Q1, Q2, ..., Qn. These probabilities depend on further functions related to other qualities, on the whole more elementary, and so on. A certain amount of circularity is typical. For example, a connected brown patch on the retina is more likely to be caused by the presence of a cow if it has four protuberances that look like biological legs than if it has six; but each protuberance is more likely to be a biological leg if it is connected to something that resembles a cow rather than a table. In view of the circularity in this interpretation of "definition," the stratification in the structure of the cerebral cortex can be only a first approximation to the truth" ([41], pp. 124-125; see also Hayek [53], p. 70). The slight confusion in this passage, between the definition of a cow and the recognition of one, was deliberate, and especially appropriate in an anthology of partly baked ideas. It can be resolved by drawing the distinction between a logical probability and a subjective probability (see for example [36]), and also the distinction between subjective and objective information that we made in the previous section.

If we abandon interpretations of meaning in terms of linguistic transformations, such as dictionary definitions, or, in the case of statements, the two interpretations mentioned before; and if also we do not regard the behavioral interpretations as sufficient, we shall be forced to consider interpretations in terms of internal workings. Since this article is written mainly on the assumption that an ultraintelligent machine will consist largely of an artificial neural net, we need in effect a neurophysiological representation of meaning. The behavioral interpretation will be relevant to the education of the machine, but not so much to its design. It does not require much imagination to appreciate that the probabilistic and iterative interpretation of the definition of a word, as described above, is liable to fit well into models of the central nervous system.
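The circular, iterative character of this "probabilistic definition" can be made concrete in a small sketch. The weights, threshold, and update rules below are all invented for illustration; the point is only that the credibility of each quality and the credibility of the whole classification can be revised in alternation until they settle down.

import math

# Invented weights for the qualities, and an invented bias (threshold).
weights = {"brown_patch": 0.5, "leg_like_protuberances": 2.0, "moos": 1.5}
bias = -1.8

# Initial credibilities (logical probabilities) that the object has each quality.
p = {"brown_patch": 0.9, "leg_like_protuberances": 0.6, "moos": 0.5}

for _ in range(20):
    # f(p1, ..., pn): positive scores favor membership of the class of cows.
    score = bias + sum(weights[q] * p[q] for q in weights)
    cow_belief = 1.0 / (1.0 + math.exp(-score))   # squashed to a probability
    # The circularity: each protuberance is more likely to be a biological
    # leg if the whole is believed to resemble a cow (update rule invented).
    p["leg_like_protuberances"] = 0.5 + 0.4 * cow_belief
    p["moos"] = 0.4 + 0.3 * cow_belief

print(round(cow_belief, 3), {q: round(v, 3) for q, v in p.items()})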
It has been difficult for the writer to decide how much neurophysiology should be discussed, and hopefully an appropriate balance is struck in what follows between actual neurophysiology and the logic of artificial neural networks. The discussion will be based on the speculative cell-assembly theory of Hebb [54] (see also [53] and [71]), or rather on a modification of it in which "subassemblies" are emphasized and a central control is assumed. If the present discussion contains inconsistencies, the present writer should be blamed. (For a very good survey of the relevant literature of neurophysiology and psychology, see Rosenblatt [82], pp. 9-78.)

5. Recall and Information Retrieval
Whatever might be the physical embodiment of meaning, it is certainly closely related to that of long-term recall. Immediate recall is not strongly related to semantics, at any rate for linguistic texts. In fact, experiments show that immediate and exact recall of sequences of up to fifty words is about as good for meaningless texts as it is for meaningful texts, provided that the meaningless ones are at least "fifth order" approximations to English, that is to say that the probability of each word, given the previous five, is high [70].

The process of recall is a special case of information retrieval, so that one would expect there to be a strong analogy between the recall of a memory and the retrieval of documents by means of index terms. An immediate recall is analogous to the trivial problem of the retrieval of a document that is already in our hands. The problem of the retrieval of documents that are not immediately to hand is logically a very different matter, and so it is not surprising that the processes of immediate and long-term recall should also differ greatly. The problem of what the physical representation is for immediate recall is of course not trivial, but for the moment we wish to discuss long-term recall, since it is more related to the subject of semantics.

The usual method for attacking the problem of document retrieval, when there are many documents (say several thousand), is to index each document by means of several index terms. We imagine a library customer, in need of some information, to list some index terms, without assuming that he uses any syntax, at least for the present. In a simple retrieval system, the customer's index terms can be used to extract documents by means of a sort, as of punched cards. The process can be made more useful, not allowing for the work in its implementation, if the terms of the documents, and also those of the customer, are given various weights, serving in some degree the function of probabilities. We then have a weighted or statistical system of information retrieval.
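A minimal sketch of such a weighted system follows; the documents, terms, and weights are all invented for the example. Each document and each request carries weighted index terms, and documents are ranked by the total weight of the shared terms.

# Invented weighted index terms for two documents and one request.
documents = {
    "doc1": {"retrieval": 0.9, "indexing": 0.7, "statistics": 0.2},
    "doc2": {"neural": 0.8, "nets": 0.8, "statistics": 0.5},
}
request = {"retrieval": 1.0, "statistics": 0.5}

def score(doc_terms, request_terms):
    # Sum, over the terms shared with the request, the product of the two
    # weights; the weights serve in some degree the function of probabilities.
    return sum(w * doc_terms[t] for t, w in request_terms.items() if t in doc_terms)

ranking = sorted(documents, key=lambda d: score(documents[d], request), reverse=True)
print([(d, score(documents[d], request)) for d in ranking])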
One could conceive of a more complicated information-retrieval system in which each document had associated with it a set of resonating filters forming a circuit C. All documents would be queried in parallel: the "is-there-a-doctor-in-the-house" principle [44]. The amount of energy generated in the circuit C would be fed back to a master control circuit. (In the brain, the corresponding control system might be the "centrencephalic system" [79].) Whichever circuit C fed back the maximum power, the corresponding document would be extracted first. If this document alone failed to satisfy the customer completely, then the circuit C would be provisionally disconnected, and the process repeated, and so on. Ideally, this search would be probabilistic, in the sense that the documents would be retrieved in order of descending a posteriori probability, and the latter would be registered also. If these probabilities were p1, p2, ..., then the process would stop at the nth document, where there would be a threshold on n and on p1 + p2 + ... + pn. For example, the process might stop when n = 10, or when p1 + p2 + ... + pn > 0.96, whichever occurred first. The thresholds would be parameters, depending on the importance of the search. (For the estimation of probabilities, see [48].)
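The stopping rule just described is easy to state in code. In this sketch the a posteriori probabilities are simply given (invented figures); in practice their estimation is, of course, the hard part [48].

def retrieve(posteriors, n_max=10, p_threshold=0.96):
    """Return documents in descending a posteriori probability, stopping when
    n_max documents have been retrieved or the cumulative probability exceeds
    p_threshold, whichever occurs first."""
    retrieved, cumulative = [], 0.0
    for doc, p in sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True):
        retrieved.append((doc, p))
        cumulative += p
        if len(retrieved) >= n_max or cumulative > p_threshold:
            break
    return retrieved

# Invented a posteriori probabilities for five documents.
print(retrieve({"d1": 0.50, "d2": 0.30, "d3": 0.10, "d4": 0.08, "d5": 0.02}))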
When we wish to recall a memory, such as a person's name, we consciously or unconsciously use clues, analogous to index terms. These clues are analogous to weighted index terms, and it seems virtually certain that they lead to the retrieval of the appropriate memory by means of a parallel search, just as in the above hypothetical document-retrieval system. The speed of neural conduction is much too slow for a primarily serial search to be made. The search might very well be partly serial: the less familiar memories take longer to recall and require more effort. This might be because the physical embodiment of the less familiar memory requires a greater weight of clues before it will "resonate" strongly enough. Further evidence that the search is, on the whole, more parallel than serial can be derived from Mandelbrot's explanation of the Zipf "law" of distribution of words [28]. The explanation requires that the effort of extracting the rth commonest word from memory is roughly proportional to log r. This is reasonable for a parallel search, whereas the effort would be roughly proportional to r for a serial search.

When the clues do spark off the sought memory, this memory in its turn reminds us of other clues that we might have used in advance if we had thought of doing so. These "retrieved clues" often provide an enormous factor in favor of the hypothesis that the memory retrieved is the one that was sought: consequently we are often morally certain that the memory is the right one once it is recalled, even though its recall might have been very difficult. There is again a strong resemblance to document retrieval.

When we extract a wrong memory, it causes incorrect clues to come to mind, and these are liable to block the correct memory for a number of
seconds, or for longer if we panic. This is another reason why the less familiar memories take longer to recall. When we wish to recall incidents from memory, pertaining to a particular subject, the method used is to bring to mind various relevant facts and images in the hope that they are relevant enough, numerous enough, vivid enough, independent enough, and specific enough to activate the appropriate memory. (If specificity is lacking, then the wrong memory is liable to be recalled.) There is a clear analogy with the probabilistic definition of a word and probabilistic recognition of an object quoted in Section 4. A corresponding method of information retrieval is to list index terms that are relevant enough, numerous enough, independent enough, and specific enough, and (if the process is not entirely mechanized) vivid enough. This attitude towards index terms leads to forms of probabilistic or statistical indexing, as suggested independently by the writer ([35], [31], p. 12) and by Maron and Kuhns [64], who treated the matter in more detail. The present writer regards subjective and logical probabilities as partially ordered only [21], but does not consider that the fact of only partial ordering is the main source of the difficulties in probabilistic indexing.

We have said enough to bring out the analogy between the process of recall and the techniques of document retrieval, and to indicate that, if it is possible to develop a comprehensive theory of either of these subjects, it should be a probabilistic theory. The need for a probabilistic theory is further brought out by means of a short discussion of what might be called "statistical semantics." A complete discussion of statistical semantics would lean heavily on (i) the very intricate subject of non-statistical semantics, and on (ii) some statistical theory concerning language without any deep discussion of semantic problems. But our purpose in this section is only to make clear that a complete treatment of statistical semantics would be somewhat more general than recall and document retrieval.

If we wish to teach a language to a baby who starts in a state of comparative ignorance, we simultaneously allow him to become familiar with some part of the world of nonlinguistic objects and also with linguistic sounds, especially phonemes. The primitive ability of the baby to achieve this familiarity, although not much more remarkable than the achievements of lower animals, is still very remarkable indeed, and more so, in the writer's opinion, than anything that comes later in his intellectual development. If this opinion is correct, then most of the struggle in constructing an ultraintelligent machine will be the construction of a machine with the intelligence of an ape. The child later associates words with objects and activities, by implicit statistical inference: in fact the first words learned are surely
regarded by the child as properties of an object in much the same sense as the visual, olfactory, and tactual properties of the object. For example, if the child succeeds in pronouncing a word to an adequate approximation, and perhaps points in approximately the right direction or otherwise makes approximately the right gesture, then, statistically speaking, events are more likely to occur involving the object or activity in question; and, if the environment is not hostile, the events are likely to be pleasurable. Thus the words and gestures act as statistical index terms for the retrieval of objects and the activation of processes. At a later stage of linguistic development, similar statistical associations are developed between linguistic elements themselves. The subject of statistical semantics would be concerned with all such statistical associations: between linguistic elements, between nonlinguistic and linguistic elements, and sometimes even between nonlinguistic elements alone.

A basic problem in statistical semantics would be the estimation of the probabilities P(Wi | Oj) and P(Oj | Wi), where Wi represents a word (a clump of acoustic time series defined in a suitable abstract space, or, in printed texts, a sequence of letters of the alphabet with a space at both ends: admittedly not an entirely satisfactory definition), and Oj represents an object or an activity. P(Wi | Oj) denotes the probability that a person, speaking a given language, will use the word Wi to designate the object Oj, and P(Oj | Wi) is the probability that the object Oj is intended when the word Wi is used. Strictly, the estimation of probabilities is nearly always interval estimation, but, for the sake of simplicity, we here talk as if point estimation is to be used. The ranges of values of both i and j are great; the vocabulary of an educated man, in his native tongue, is of the order of 30,000 words and their simple derivatives, whereas the range of values of j is far, far greater. The enormity of the class of objects is of course reducible by means of classification, which, in recognition, again involves a process of regeneration, just as does the recognition of a word.

An ideal statistical dictionary would, among other things, present the two probability matrices. (Compare Sparck Jones [95] and the discussion.) Such a dictionary would, apart from interdependences between three or more entities, give all the information that could be given, by a dictionary, for naming an object and for interpreting a word. Existing dictionaries sometimes indicate the values of the probabilities P(Wi | Oj) to the extent of writing "rare"; and also the variations between subcultures are indicated ("archaic," "dialect," "slang," "vulgar," and so on). But let us, somewhat unrealistically, imagine a statistical-dictionary maker who is concerned with a fixed subculture, so that the two probability transition matrices are fixed. One method he can use is to take linguistic texts, understand them, and thus build up a sample (fij), where fij is the frequency with which object Oj is designated by word Wi. Within the hypothetically infinite population from which the text is extracted, there would be definable probabilities P(Wi) and P(Oj) for the words and objects, and a joint probability P(Wi · Oj), crudely estimated by fij/Σij fij. If these joint probabilities could be estimated, then the two probability matrices could be readily deduced.
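The deduction of the two matrices from the joint probabilities is just a matter of normalizing rows and columns, as the following sketch shows; the sample of frequencies fij is invented.

# Invented sample: f[w][o] is the frequency with which word w designated object o.
f = {
    "cow":   {"cow_object": 40, "bull_object": 2},
    "beast": {"cow_object": 5,  "bull_object": 6},
}

total = sum(n for row in f.values() for n in row.values())
joint = {w: {o: n / total for o, n in row.items()} for w, row in f.items()}

objects = sorted({o for row in f.values() for o in row})

# P(W_i | O_j): for each object, normalize the joint probabilities over words.
p_w_given_o = {o: {w: joint[w][o] / sum(joint[x][o] for x in f) for w in f}
               for o in objects}

# P(O_j | W_i): for each word, normalize the joint probabilities over objects.
p_o_given_w = {w: {o: joint[w][o] / sum(joint[w].values()) for o in joint[w]}
               for w in f}

print(p_w_given_o)
print(p_o_given_w)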
We have now said enough to indicate the very close relationship that exists between statistical semantics, recall, and the retrieval of documents. In the remaining discussion in this section we shall restrict our attention to the question of retrieval of documents, including abstracts. This is a particular case of the retrieval of objects and the inauguration of processes, and the discussion brings out some of the theoretical difficulties of statistical semantics in a concrete manner.

A basic problem, bordering on semantics, is the estimation of the probability P(Dj | Wi), where Wi represents a word, or index term, and Dj represents a document, or other object, and P(Dj | Wi) denotes the probability that Dj represents a sought document when Wi is an index term, and when it is not known what the other index terms are. Strictly speaking, the probability depends on the customer, but, for the sake of simplicity, it will be assumed here that the indexer of the documents, and all the customers, speak the same indexing language. The problem of estimating P(Dj | Wi) is nontrivial to say the least [48], but let us imagine it solved for the present. Next suppose that index terms W1, W2, ..., Wm have been specified. Then we should like to be able to compute the probabilities P(Dj | W1 · W2 · ... · Wm), where the dots denote logical conjunction. One could imagine this probability to be estimated by means of a virtually infinite sample. Reasonable recall systems would be those for which (i) the probability that the document Dj will be recalled is equal to the above probability; (ii) the document Dj that maximizes the probability is selected; or (iii) the documents of highest (conditional) probability are listed in order, together with their probabilities. (Compare, for example, [35], [31], p. 12, [41].) In one of the notations of information theory [26, 67],
log P(Dj | W1 · W2 · ... · Wm) = log P(Dj) + I(Dj : W1 · W2 · ... · Wm)     (5.1)
where I(E : F) denotes the amount of information concerning E provided by F, and is defined (for example [26, 48]) as the logarithm of the "association factor":

I(E : F) = log [ P(E · F) / (P(E) P(F)) ]     (5.2)
(The "association factor" as defined in refs. [27], [31], and [41] is the factor by which the probability of one proposition is multiplied in the light of the other. It is used in a different sense, not as a population parameter, in Stiles [96].) The amount of information concerning E provided by F is a symmetrical function of E and F and is also called the "mutual information" between E and F, and is denoted by I(E, F) when we wish to emphasize the symmetry. Notice that our "mutual information" is not an expected value as is, for example, the "relatedness" of McGill and Quastler [67]. Shannon [89] always used expected values. If the index terms W1, W2, ..., Wm provide statistically independent information concerning Dj (i.e., if W1, ..., Wm are statistically independent, and are also statistically independent given Dj), then
log P(Dj | W1 · W2 · ... · Wm) = log P(Dj) + Σ_{r=1}^{m} I(Dj : Wr)     (5.3)
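Equation (5.3) — the "crude" use of Bayes' theorem discussed below — amounts to adding mutual informations to a log prior, as in the following sketch; all probabilities are invented for the example.

import math

# Invented prior probabilities P(D_j) and conditional probabilities P(D_j | W_i).
p_doc = {"d1": 0.01, "d2": 0.02}
p_doc_given_term = {
    "retrieval": {"d1": 0.100, "d2": 0.010},
    "neural":    {"d1": 0.005, "d2": 0.200},
}

def mutual_information(doc, term):
    # I(D_j : W_i) = log of the association factor = log P(D_j | W_i) - log P(D_j).
    return math.log(p_doc_given_term[term][doc]) - math.log(p_doc[doc])

def score(doc, terms):
    # Eq. (5.3): log P(D_j) plus the sum of the mutual informations; exact only
    # if the terms provide statistically independent information concerning D_j.
    return math.log(p_doc[doc]) + sum(mutual_information(doc, t) for t in terms)

for d in p_doc:
    print(d, round(score(d, ["retrieval", "neural"]), 3))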
The expected rate at which the individual index terms provide information concerning documents is

Σi Σj P(Dj · Wi) I(Dj : Wi),

conveniently denoted by I(D : W) (compare [89], p. 90), but this does not allow for the expectation of the mutual information when several index terms are used. A knowledge of the expectations, for various values of m, would be relevant to the design of information-retrieval systems, since its antilogarithm would give some idea of the "cut-down factor" of an m-term request. When one wishes to choose between only two documents, then the final log-odds are equal to the initial log-odds plus the sum of the "weights of evidence" or "log factors" (see [21] for the terminology here, and cf. Minsky [73]). It should be noted that Eqs. (5.1) and (5.3) are just ways of writing Bayes' theorem, but this is not a stricture, since Bayes' theorem is likewise just a way of writing the product axiom of probability theory. It is suggestive to think of Bayes' theorem in a form that is expressible in one of the notations of information theory, since the various terms in Eqs. (5.1) and (5.3) might correspond to physical mechanisms, associative bonds between memory traces (see Section 6). The use of Eq. (5.3) might be described as the "crude" use of Bayes' theorem, or as a first approximation to the ideal procedure. It was used, for example, in [73], and I think also in [64]. It is a special case of discrimination by means of linear discriminants of the form

aj + Σr ajr     (5.4)
which have been used, for example, in the simplest class of perceptrons [82], and in suggestions or experiments related to mechanical chess and checkers (draughts) (for example [33, 83, 18, 83a]). One can write (5.4) in the form
aj + Σp ajp εp     (5.5)
where now the summation is over all words in the language, not just those that occur in the context, and εp is defined as 1 if Wp does occur and as 0 otherwise. It is because of this mode of writing (5.4) that we call it a linear discriminant. It has been claimed [104] that the more general form, (5.4) or (5.5), is often much better than the crude use of Bayes' theorem, i.e., Eq. (5.3). In order to estimate the second term on the right of Eq. (5.1), a very large sample would usually be required, and this is why it is necessary to make approximations. Successively better approximations can presumably be obtained by truncating the following series:

log P(Dj | W1, W2, ..., Wm) = log P(Dj) + Σr I(Dj : Wr) + Σ_{r<s} I(Dj : Wr : Ws) + Σ_{r<s<t} I(Dj : Wr : Ws : Wt) + ...     (5.6)
(r, s, t, ... = 1, 2, ..., m; r < s < t < ...), where the I's are "interactions of the first kind" as defined in the Appendix. If, for example, we were to truncate the series after the interactions of the second order (the I2's), we would obtain a special case of the quadratic discriminant

aj + Σp ajp εp + Σ_{p<q} ajpq εp εq     (5.7)

which, with optimal coefficients, would of course give a better approximation. (An example of the relevance of the quadratic terms, in analogous problems in learning machines, is in the evaluation of material advantage in chess: the advantage of two bishops [33].)
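To make the contrast concrete, the following sketch evaluates a linear discriminant of the form (5.5) and a quadratic discriminant of the form (5.7) on the same request. The vocabulary and coefficients are invented, since their optimal estimation is precisely the open problem discussed below.

vocabulary = ["retrieval", "neural", "statistics"]
eps = {"retrieval": 1, "neural": 0, "statistics": 1}   # 1 if the word occurs

a0 = -2.0                                              # invented coefficients
a1 = {"retrieval": 1.5, "neural": 0.3, "statistics": 0.7}
a2 = {("retrieval", "neural"): -0.2,                   # second-order terms
      ("retrieval", "statistics"): 0.4,
      ("neural", "statistics"): 0.1}

# Linear discriminant, Eq. (5.5).
linear = a0 + sum(a1[p] * eps[p] for p in vocabulary)

# Quadratic discriminant, Eq. (5.7): add the pairwise interaction terms.
quadratic = linear + sum(a2[(p, q)] * eps[p] * eps[q]
                         for i, p in enumerate(vocabulary)
                         for q in vocabulary[i + 1:])

print(linear, quadratic)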
If we truncate Eq. (5.6) after the third-order interactions, we of course obtain a special case of a cubic discriminant, and so on. An interesting class of problems arises when we ask: What are the optimal linear, quadratic, cubic, etc., discriminants, and how do we set about finding them? There is some discussion of this problem in [19] and in [85]. Here we merely make the obvious comment that, if the number of words is large, the number of coefficients increases rapidly with the degree, and optimization problems might be exceedingly difficult even for the cubic discriminant. Even without optimization, the work of estimating the interactions I(Dj : Wr : Ws : Wt) would be enormous. It will be suggested, in Section 6, that the subassembly theory of the brain is capable of explaining, or at least of explaining away, how the brain can in effect embody these higher interactions as association bonds between sets of subassemblies. But in the present section we shall not consider biological processes.

Let us consider how, in principle, the various terms in Eq. (5.6) could be obtained. We should begin by taking a sample of successful library applications, each being of the form (W1, W2, ..., Wm; Dj), meaning that the index terms W1, W2, ..., Wm were used by the customer, and he was satisfied with the document Dj. If on a single occasion he was satisfied by more than one document, then, in this notation, that occasion would correspond to more than one successful library application. It should be remembered that we are assuming that all customers speak the same language. This assumption is in flagrant contradiction with the facts of life, but we assume it as an approximation in order to avoid complication. It should be noted that a sample of the kind mentioned here would form a useful part of any practical operational research on document retrieval.

We can now imagine the raw statistical data to be entered in a contingency table in w + 1 dimensions, where w is the number of index terms in use (the "size of the vocabulary"); w of the sides of the contingency table would be of length 2, whereas the remaining side would be of length d, the number of documents. It might be suggested that the way to enter the data in the table would be to regard each successful library application as a long vector

(ε1, ε2, ..., εw; j)     (5.8)
where εi is 1 or 0 depending on whether or not the ith index term in a dictionary of index terms is one of the index terms W1, ..., Wm that were used in the application, and so to put a tick in the cell (5.8) of the contingency table. This method of constructing the contingency table would be very
misleading, since there is a world of difference between missing out an index term and explicitly saying that the term is irrelevant to the sought document. This method of construction would be appropriate only if the entire vocabulary of index terms were presented to the customer to be used as a yes-no tick-off list. As R. A. Fairthorne has often pointed out, the idea of negation is not usually a natural one in indexing. Instead, the above "successful library application" is better regarded as contributing to the "marginal total," denoted [47] by

n_{έ1 ... 1 ... 1 ... έw j}
The meaning of this notation is this. Let n_{ε1 ε2 ... εw j}
be the hypothetical entry in the contingency table in the cell given by (5.8); "hypothetical" since tick-off lists are not in fact used. In the above notation, each of the 1's, of which there are m, corresponds to the specification of an index term, and the acute accents indicate summations of n_{ε1 ... εw j} over all the εi's that do not correspond to one of these m terms. After a large amount of sampling, one would have good estimates for the values of many of the marginal totals of the "population contingency table," that is, the (w + 1)-dimensional array of population probabilities. The "principle of maximum entropy" [47, 56, 48] could then be used in principle for estimating all the 2^w d probabilities. The amount of calculation would be liable to be prohibitive, even if it were ultraparallel, although it might be practicable in analogous small problems such as the recognition of printed characters or phonemes.
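One standard way of carrying out such a maximum-entropy estimation is iterative proportional fitting, sketched below on a deliberately tiny table (two binary index terms and two documents, with invented marginal totals). This is offered only as an illustration of the principle, not as the procedure of [47].

import numpy as np

# Start from the uniform table, which has maximum entropy.
p = np.full((2, 2, 2), 1 / 8)

# Invented one-dimensional marginal totals to be matched
# (axes 0 and 1: the two binary index terms; axis 2: the document).
marginals = {0: np.array([0.3, 0.7]),
             1: np.array([0.6, 0.4]),
             2: np.array([0.5, 0.5])}

for _ in range(50):                       # cycle through the constraints
    for axis, target in marginals.items():
        current = p.sum(axis=tuple(i for i in range(3) if i != axis))
        shape = [1, 1, 1]
        shape[axis] = 2
        p *= (target / current).reshape(shape)   # proportional adjustment

print(p.sum(axis=(1, 2)), p.sum(axis=(0, 2)), p.sum(axis=(0, 1)))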
It should be possible in principle to cut down the size of both the sample and the calculation by making use of the theory of clumps ("botryology") or of clusters. One of the benefits of such a theory would be that, by lumping together words into clumps, the dimensionality of the contingency table would be reduced. The theory of clumps is still in its infancy (see for example [30, 35, 31, 41, 78, 75, 76]), and is necessarily as much an experimental science as a theory: this is why we prefer to call it "botryology." Research workers who use the term "cluster" rather than "clump" (see for example [77, 97, 81, 93, 78a]; the Editor mentions [78a], which I have not seen) seem to be concerned with the grouping of points that lie in a Euclidean space, and their methods tend to be fairly orthodox from the point of view of statistical methodology. In botryology the methods tend to be less orthodox, and it is sometimes
actually desirable that the clumps should overlap, both in applications to information retrieval and in potential application to neural nets. Nevertheless the two theories might be expected eventually to merge together.

Let us consider how botryology might be applied for finding clumps of associated index terms and "conjugate" clumps of documents. (The method could also be applied to the categorization of diseases, by replacing the index terms by symptoms and the documents by people.) Let there be w index terms and d documents. Let fij be the frequency with which index term i occurs in document j, and consider the w by d matrix

F = (fij)
Various botryological computations with F have been suggested in the references: the one proposed here is closest to that of Needham [76], who, however, was concerned with a square symmetric matrix of frequencies of co-occurrence of index terms, and who did not use logarithms or "balancing," as described below.

First replace the matrix F by the matrix [log(fij + k)], where k is a small constant (less than unity). A reason for using the logarithm is that we are proposing to use additive methods, and a sum of log-frequencies is a log-likelihood. The addition of the small constant k to the frequencies is necessary to prevent zeros from going to minus infinity, and can be roughly justified for other reasons (see for example [58], [25], p. 241, or [48]). This modified matrix is now "balanced" in the following sense. By balancing an arbitrary matrix we mean adding ui + bj to cell (i, j) (i, j = 1, 2, ...) in such a manner that each row and each column adds up to zero. It is easy to show that the balanced matrix is unique, and that the balancing constants can be found by first selecting the ui's to make the rows add up to zero, and then selecting the bj's to make the columns add up to zero. The column balancing does not upset the row balancing. For a symmetric matrix the row-balancing constants are equal to the column-balancing constants. In what follows, instead of balancing the matrix it might be adequate to subtract the mean of all the entries from each of them.

Let B be the result of balancing the matrix [log(fij + k)]. Consider the bilinear form b = x′By, where x is a column vector consisting of +1's and −1's, and the prime denotes transposition. We now look for local maxima of b in the obvious manner of first fixing x, perhaps randomly, and finding y to maximize b (i.e., taking y = sgn B′x), and then fixing y and finding x to maximize b (i.e., taking x = sgn By), and so on iteratively. The process terminates when the bilinear form takes the same value twice running. The process would lead to the separation of
the words into two classes or large clumps, and two conjugate clumps of documents. Consider one of the two smaller matrices obtained by extracting the rows and columns from B corresponding to a clump and its conjugate. Balance this smaller matrix, and find a local maximum of its bilinear form. This procedure will split our clump into two smaller clumps, and will simultaneously split the conjugate clump. In this manner we can continue to dichotomize our clumps until they are of approximately any desired size. The whole collection of clumps would form a tree; a sketch of one splitting step follows.
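The sketch below performs a single split on an invented frequency matrix. The matrix log(F + k) is balanced by subtracting row means and then column means (the column balancing does not upset the row balancing), and the iteration alternates y = sgn(B′x) and x = sgn(By) until the bilinear form repeats.

import numpy as np

rng = np.random.default_rng(0)
F = rng.poisson(2.0, size=(8, 12)).astype(float)  # invented: w = 8 terms, d = 12 docs
k = 0.5                                           # small constant, less than unity

# Balance log(F + k): subtract row means, then column means.
M = np.log(F + k)
M = M - M.mean(axis=1, keepdims=True)
B = M - M.mean(axis=0, keepdims=True)

x = np.where(rng.random(8) < 0.5, 1.0, -1.0)      # random initial +1/-1 vector
previous = None
while True:
    y = np.sign(B.T @ x); y[y == 0] = 1.0         # best y for the fixed x
    x = np.sign(B @ y);   x[x == 0] = 1.0         # best x for the fixed y
    b = x @ B @ y                                 # the bilinear form x'By
    if b == previous:                             # same value twice running
        break
    previous = b

print("clump of terms:     ", np.where(x > 0)[0])
print("conjugate documents:", np.where(y > 0)[0])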
Actually, it is desirable that clumps should overlap in some applications to information retrieval, and this can be achieved by means of a slight modification of the above procedure, in which the "large" clumps are made larger still. That is, in place of taking all the +1's in x as a clump, one could take all the components in B′x algebraically greater than some negative threshold; and, in the conjugate clump, all the components in By above some negative threshold. The effect of this botryological procedure is to induce a partially ordered structure each of whose elements is a clump of index terms together with its conjugate clump of documents. Having obtained the partially ordered set of clumps, one could apply the methods described in [48], which, however, have not been completely worked out, in order to make estimates of I(i, j) when fij is too small for the estimate log fij − log fi. − log f.j to be usable (for example, when fij = 0 or 1). (We have written fi. and f.j for the total frequencies of Wi and Dj.) Hopefully, the higher-order mutual informations (interactions) I(W1 : W2 : ... : Wm | Dj) could be estimated in a similar manner.

Another conceivable method for associating documents with index terms would be in terms of the eigenvectors of BB′ and of B′B, where the primes still indicate transposition. By a theorem of Sylvester, the eigenvalues of B′B are the same as those of BB′, together with d − w zeros, if d ≥ w. We can use the nonzero eigenvalues in order to pair off the two sets of eigenvectors, and we could order each of the two sets of eigenvectors in the order of the magnitudes of the eigenvalues. Then we could associate with the ith index term the ith component of the normalized eigenvectors of BB′, and with the jth document the jth component of the corresponding w eigenvectors of B′B. This would associate a w-dimensional vector with each index term and with each document. The relevance of index term i to document j could now be defined as the correlation coefficient between the two associated vectors. An approximate relationship between relevance and mutual information could then be found experimentally, and we could then apply Eq. (5.1) for document retrieval. The amount of calculation required for the
application of this method would be exceedingly great, whereas the clumping algorithm mentioned before could perhaps be carried out on a computer of the next generation.

6. Cell Assemblies and Subassemblies
Suppose that one wishes to simulate psychological association and recall on a machine. We restrict our attention to the recall of one word when m other words are presented, but most of the discussion can be adapted, in an obvious manner, to the recall of a concept given various attributes, or to the retrieval of a document given various index terms. The discussion could be modified in order to cover the case when the words are presented serially and form a Markov chain, this being a well-known approximate model for the prediction of words in a language text (cf. [89]). For the sake of simplicity, we shall ignore problems of syntax, so that our discussion will be in this respect more pertinent to methods of information retrieval based only on index terms than to the full problem of recall. This limited problem is difficult enough for the present, and is I think a necessary preliminary to any more ambitious discussion of recall in general.

If there are w words in the vocabulary, there are potentially w(w − 1)/2 associations of various strengths between pairs of words. (Kinds of association are here being ignored.) The process of recall, in this example, is that of selecting the word, A, that is in some sense most associated with the m words A1, A2, ..., Am which have been recently inserted at the input of the computer. In the usual problem of information retrieval A would be a document and A1, A2, ..., Am would be index terms, and the discussion of the previous section is all relevant to the present problem. The difficulty of making the probability estimates [48] provides some of the explanation of why men are not entirely rational in their probability estimates and in their recall. It is possible, for example, that, for men, the probability of retrieval of a word is approximated by only a few terms of Eq. (5.6) of the previous section. An ultraintelligent machine might be able to use more terms of the equation, since it might be able to speed up the calculations by invoking the electronic computer with which it would be in close relationship (cf. [33]).

Russell and Uttley [102] suggested that a time delay might be the neural embodiment of the amount of information in a proposition, I(H) = −log P(H), and that this would make conditional probabilities easily embodiable, since the difference between two time delays is itself a time delay. As pointed out in [38], this idea extends at once to mutual information, log-odds, weights of evidence, and tendency to
cause. But of course time delay is only one of many possible physical representations of a real variable, and others could be suggested in terms of synaptic facilitation. In view of the complexity of the brain, it is quite probable that more than one representation is used, and this would give greater scope for adaptability. One must not be overready to apply Ockham's lobotomy. As in other complex systems, many theories can contain elements of the truth. Economists are familiar with this principle.

We return now to our problem of information retrieval. Suppose that w = 30,000 and that some acceptable method were found for estimating the mutual information between each pair of the 30,000 words. Then it will still be hardly practicable to list the 450 million answers in immediately accessible form in a machine that is not ultraparallel. Instead it would be necessary to put the words that have appreciable association with a given word, A, into a list of memory locations, called say the A list. Each word in each list must have the strength of the association (the logarithm of the association factor) tagged to it. Many of the lists would be very long. The process of recall involves the collation of the words in the lists corresponding to recent input words, together with some further arithmetic; a sketch is given below. Collation is a slow process, and it is tempting to ask whether it would be more economical to simulate the process of recall by means of an artificial neural network, or at any rate by means of ultraparallelism. The use of artificial associative memories is a step in this direction, but so far only a small one (for example [60, 65]). For purposes of information retrieval, which in effect is what we are discussing, it might be worth while to design computers that are not ultraparallel but have extremely rapid collation as a special feature. Such computers would be very useful for information retrieval by means of index terms, but when the words are strongly interdependent statistically, as in ordinary language, a machine using artificial neural nets seems intuitively to hold out more promise of flexibility. (See also the discussion of "higher-order interactions" later in this section.)
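The collation process admits a very short sketch: the lists of the recent input words are merged and the tagged strengths added, the word of greatest total strength being recalled. The lists and strengths below are invented for illustration.

# Invented association lists: each entry is tagged with the strength of the
# association (the logarithm of the association factor).
association_lists = {
    "grey":  {"elephant": 1.2, "mouse": 0.8, "sky": 0.5},
    "trunk": {"elephant": 2.0, "tree": 1.1, "car": 0.6},
    "large": {"elephant": 0.9, "whale": 1.4},
}

def recall(input_words):
    totals = {}
    for word in input_words:
        # Collate the list for each recent input word, adding the strengths.
        for candidate, strength in association_lists.get(word, {}).items():
            totals[candidate] = totals.get(candidate, 0.0) + strength
    # Select the word most associated with the inputs.
    return max(totals, key=totals.get)

print(recall(["grey", "trunk", "large"]))   # -> elephant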
If each word were represented by an artificial neuron, or otherwise highly localized, it would take too long to set up the associations, unless there were w(w − 1) association fibers built in, and this would be very expensive in equipment. Moreover, it is not easy to see how more than a small fraction of such a machine could be in operation at any one time, so that there would be a great wastage of potential computation power. For these reasons, a machine with "distributed memory" seems more promising. As Eccles says ([16], p. 266), "Lashley argues convincingly that millions of neurons are involved in any memory recall, that any memory trace or engram has multiple representation; that each neuron or even each synaptic joint is built into many engrams" [61]. A further relevant quotation, from [34], is: "An interesting analogy is with the method of superimposed coding, of which Zatocoding is an example. This is a method of coding of information for information-retrieval purposes. Suppose we wish to identify a document by means of m index terms. Each term is represented by means of v punched holes in a card containing N locations, each of which can be punched or not punched. [For each of the index terms] we may select v locations out of the N at random [to punch]. The representation of the joint occurrence of m index terms is then simply the Boolean sum of the m individual punchings of v locations each. In the application to information retrieval, if we extract all the cards punched in the v locations corresponding to any given term, we may get some cards that are irrelevant by chance. If N is large, and v is suitably selected, mistakes need seldom occur. In fact it is natural to arrange that
(1 − v/N)^m ≈ 1/2,  i.e.,  v ≈ (1 − 2^{−1/m})N
This must be the best value of v, since to have half the holes punched gives the largest variety of possible punchings. "By analogy, Nature's most economical usage of the brain would be for a reasonable proportion of it to be in operation at any one time, rather than having one concept, one neuron." Instead, each neuron would occur in a great many distinct circuits, and would not be indispensable for any of them. Such an analogy can at best give only a very rough idea of what goes on in the brain, which is an ultradynamic system as contrasted with a collection of punched cards. (The analogy would seem a little better if, instead of taking the Boolean sum, a threshold were used at each location.) But if we take m = 20, on the grounds that the game of "twenty questions" is a reasonably fair game, we find that the representation of a word occupies say a thirtieth of the neurons in the cortex. It must be emphasized that this is not much better than a guess, partly because it is based on a very crude optimality principle. But it is not contradicted by the experiments of Penfield and others (for example [80], p. 117), who found that the electrical stimulation of a small area on the surface of the cortex could inhibit the recall of a fraction of the subject's vocabulary. (For further references, see Zangwill [108].) For it is entirely possible that a large subnetwork of neurons could be inhibited, and perhaps even sparked off, by stimulation at special points.
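The superimposed-coding scheme quoted above can be sketched as follows. N, m, and the term names are invented, and each term's punching is simulated by a pseudo-random choice seeded on the term.

import random

N, m = 1000, 20                        # locations per card; index terms per document
v = round((1 - 2 ** (-1 / m)) * N)     # about 34 here, so half the holes get punched

def punching(term):
    # Each term punches a fixed set of v pseudo-random locations out of N.
    random.seed(term)
    return set(random.sample(range(N), v))

def document_code(terms):
    code = set()
    for t in terms:
        code |= punching(t)            # Boolean sum of the individual punchings
    return code

doc = document_code([f"term{i}" for i in range(m)])
print(len(doc) / N)                    # roughly one half of the N locations
print(punching("term3") <= doc)        # a term of the document always matches
print(punching("zebra") <= doc)        # a foreign term matches only by chance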
Among the theories of distributed memory, the "cell assembly" theory is prominent, and, as stated in the previous section, a modified
form of this theory will be adopted here. The meaning and point of the theory can be explained in terms of its applications to the linguistic activities of the brain, although the theory is usually discussed in a more general context. There are some advantages in discussing a special case, and some generalizations will be obvious enough.

A cell assembly is assumed to consist of a great number of neurons, which can all be active at least once within the same interval of about a quarter to half a second. For simplicity we shall generally take the half-second estimate for granted. An assembly reverberates approximately as a unit, and, while reverberating, it tends to inhibit the remainder of the cortex, not neuron by neuron, but enough so that no other assembly can be very active during the same time interval. A word, or a familiar phrase, is often represented by an assembly, and, more generally, an assembly usually corresponds, in Hebb's words, to a "single element of consciousness." But the consciousness might habituate to assemblies that occur very frequently. It will be assumed in this paper that there are also subassemblies that can be active without dominating the whole cortex, and also that when an assembly becomes fatigued and breaks up it leaves several of its own subassemblies active for various lengths of time, from a second to several minutes, and typically about ten seconds. Each subassembly would consist of a smaller group of neurons than an assembly, but with greater relative interconnectivity. The subassemblies might in their turn break up into still smaller groups of still greater relative interconnectivity and of greater "half-lives." These could be called subsubassemblies, etc., but we shall usually use the term "subassembly" generically to include subsubassemblies, etc.

When an assembly gains dominance for a moment it is approximately completely active, when the subject is wide awake. The process is assumed to be one of approximate regeneration. It is not exact regeneration, for if it were there would be no learning. Probabilistic regeneration might often be represented by the degree of activity of an assembly. This degree of activity will be carried forward by the subassemblies, so that the benefits of probabilistic regeneration, as described in a previous section, will be available. Also the activity is less, and the assembly is somewhat smaller, when the subject is sleepy or dreaming, but the activity is assumed to be nearly always enough for the assembly to have a definite identity, except perhaps in dreamless sleep. When the subject is nearly asleep, there might be frequent intervals of time when there is no active assembly. The association between two assemblies could be largely embodied in the subassemblies that they have in common.

When a man is in a sleepy condition, an assembly need not be followed
by another consciousness-provoking assembly for a short time. In that case, the assembly A might recover from fatigue and be reactivated by the subassemblies that it itself had left in its wake when it last fired. This would account for the occasional repetitivity of thought when one is sleepy. The hypothesis is not that the assembly reverberates for longer than usual, but that it is liable to reactivate because there has not been enough activity to disperse its subassemblies. The subassemblies themselves, both in sleepiness and in dreams, have lower activity than in wakefulness, so that, when one wakes up, the memory and atmosphere of dreams would be easily erased. When dreaming there is perhaps not enough energy in the cortex to sustain many full assemblies, so that the subassemblies would be less inhibited than in wakefulness. It might well be that there are far more subassemblies active during sleep, and they would form arrangements having less logical cohesion and higher entropy. This would explain the remarkable rate at which visual information can be internally generated during dreams; and the incomplete regeneration of full assemblies would explain the non sequitur and imaginative nature of dreams. In the same spirit, if assemblies correspond to conscious thoughts, it might well be that subassemblies correspond to unconscious and especially to preconscious thoughts, in the wakeful state as well as in sleep.

What gives the assemblies their semipermanent static structures, corresponding to long-term memory, is assumed, following Hebb, to be the pattern of strengths of synaptic joints throughout the cortex. The physical counterpart of learning is the variation of these strengths. We have already conjectured that the number of possible states of any synaptic joint is small enough to justify calling the strength a "discrete variable." This assumption makes it easier to understand how memories can be retained for long periods, and how the identities of assemblies can be preserved. We assume that the strength of a synapse, when not in use, occasionally mutates in the direction of some standard value. This mechanism would explain the gradual erosion of memories that have not been recalled, and would also help to prevent all synapses from reaching equal maximal strength, which would of course be disastrous. Equally, the increase in strength of a synapse when its activation leads to the firing of a neuron can reasonably be assumed to be a mutation and only probabilistic. The number of synapses is so large that it might well be sufficient for only a small fraction of them to mutate when they contribute to the firing of a neuron. This hypothesis would also help to explain why all synapses do not reach maximum strength. Even when an assembly sequence is frequently recalled, some of the strengths of the relevant synapses would nevertheless have mutated
downwards, so that some of the many weaker subassemblies involved in the assembly sequence would have become detached from the structure of the assembly sequence. Thus the structure of a frequently used assembly sequence, used only for recall and not for building into fantasy or fiction, would tend to become simplified. In other words, detail would be lost even though what is left might be deeply etched. Thus the corresponding memory would tend to become stereotyped, even in respect of embellishments made to it after the first recording.

It is interesting to consider what enables us to judge the time elapsed since a memory was first inscribed. Elapsed time seems introspectively to be recorded with roughly logarithmic accuracy: the least discernible difference of a backward time estimate is perhaps roughly proportional to the time elapsed, not allowing for the "cogency" of the recall, that is, not allowing for the interconnections and cross-checks in the recall. This conjecture, which is analogous to the Weber-Fechner law, could be tested experimentally. An aging memory suffers from a gradual loss of "unimportant" detail. If, on the other hand, we recall an item repeatedly, we preserve more of the detail than otherwise, but we also overlay the memory with additional associations to assemblies high up in the hierarchy. We can distinguish between "reality" and imagination because a memory of a real event is strongly connected to the immediate low-order sensory and motor assemblies. As a memory ages it begins to resemble imagination more and more, and the memories of our childhood are liable to resemble those of a work of fiction. One of the advantages that an ultraintelligent machine would have over most men, with the possible exception of millionaires, would be that it could record all its experiences in detail, on photographic film or otherwise, together with an accurate time-track. This film would then be available in addition to any brain-like recordings. Perfect recall would be possible without hypnotism!

As pointed out by Rosenblatt ([82], p. 55), a permanent lowering of neural firing thresholds would be liable to lead to all thresholds becoming minimal, unless there were a "recovery mechanism." He therefore prefers the more popular theory of synaptic facilitation, which we are using here [15, 54]. Although there are far more synapses than neurons, a similar objection can be raised against this theory, namely, too many synapses might reach maximal facilitation, especially if we assume a cell assembly theory. This is why we have assumed a mutation theory for synaptic strengths. In fact, we assume both that a synapse, when not in use, mutates downwards, with some probability, and also that, when it has just been used, it mutates upwards, with some probability. The higher the strength at any time, the greater the probability of mutating downwards when not used, and the smaller the probability of mutating
upwards when used. It is neither necessary nor desirable that every synapse should increase its strength whenever it is used. The enormous number of neurons in an assembly makes it unnecessary, and the frequent uses of the synapses make it undesirable. After a certain number of uses, an assembly does not need any further strengthening.

A sentence lasting ten seconds would correspond to an assembly sequence of about twenty assemblies. Hebb ([54], p. 143) says that the apparent duration of a "conceptual process" in man is from one to five or ten seconds. The expression "conceptual process" is of course vague, and the discussion is here made somewhat more concrete by framing it in terms of linguistic activity. A phoneme, when it is part of a word, perhaps corresponds to a subassembly, and there will be many other subassemblies corresponding to other properties of the word, but only a fraction of these will remain active when the assembly breaks up. Which assembly becomes active at the next moment must depend on the current sensory input, the current dominant assembly, and the currently active subassemblies. Indirectly, therefore, it depends on the recent assembly sequence, wherein the most recent assemblies will have the greatest influence. It also depends of course on the semipermanent static storage, the "past history." Well-formed assemblies will tend to be activated by a small fraction of their subassemblies; this is why it is possible to read fast with practice: it is not necessary to observe all the print. Memory abbreviates.

An example that shows how the activation of an assembly can depend on the previous assembly sequence is the recall of a long sequence of digits, such as those of π. A. C. Aitken and Tom Lehrer, for example, can repeat several hundred digits of π correctly. If we assume that there is one assembly for each of the ten digits 0, 1, ..., 9, then it is clear that the next assembly to be activated must depend on more than just the previously active assembly. If there is no hexanome (sequence of six digits) that is repeated in the first 600 digits of π, then one method of remembering the 600 digits in order is to memorize a function of hexanomes to mononomes. Then any six consecutive digits would uniquely determine the next digit in this piece of π; for example, the digit 6 is determined by the hexanome 416926. (A sketch of this scheme is given below.)
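The hexanome-to-mononome function is easily exhibited in code. The sketch below uses the first 100 decimals of π (transcribed here on the assumption that they are correct); no hexanome happens to repeat within them, so six consecutive digits determine the next, and the whole sequence can be regenerated from its first six digits.

PI_DECIMALS = (
    "1415926535897932384626433832795028841971693993751058209749445923"
    "078164062862089986280348253421170679"
)

# Memorize the function from hexanomes to mononomes.
hexanome_to_next = {}
for i in range(len(PI_DECIMALS) - 6):
    hexanome = PI_DECIMALS[i:i + 6]
    # The method requires that no hexanome recur with a different successor.
    assert hexanome_to_next.get(hexanome, PI_DECIMALS[i + 6]) == PI_DECIMALS[i + 6]
    hexanome_to_next[hexanome] = PI_DECIMALS[i + 6]

# Recite: any six consecutive digits uniquely determine the next digit.
recited = PI_DECIMALS[:6]
while len(recited) < len(PI_DECIMALS):
    recited += hexanome_to_next[recited[-6:]]

print(recited == PI_DECIMALS)   # True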
Let us consider how the subassembly theory would account for this. For the sake of argument, we shall ignore the strong possibility that a calculating prodigy has an assembly for say each of the hundred distinct dinomes, and continue to assume one assembly for each of the ten digits. (The argument could be modified to allow for other possibilities.) We take it for granted that the subject (Aitken) is in the psychological "set" corresponding to the recitation of the digits of π. Suppose that the assembly corresponding to the digit i has subassemblies s(i, 1), s(i, 2), ...
, and that these symbols correspond to subassemblies of successively shorter "half-lives." Then, provided that the digits are recited by Aitken at a constant rate, one set of subassemblies that would activate the assembly corresponding to 6 would be of the form s(4, 1), s(4, 2), ..., s(4, n4); ...; s(6, 1), s(6, 2), ..., s(6, n6), where s(i, j) is the next subassembly (belonging to assembly i) to become extinguished after j "moments of time." If at least one subassembly of each assembly is extinguished at each moment within the first six moments after the assembly is extinguished, then this theory could account for the possibility of the recitation. For, at any given moment, the active subassemblies would uniquely determine the next assembly to be activated. If the recitation were slowed down by a moderate factor, then there would still be enough clues for the unique determination of the successive digits. In fact a knowledge of the maximum slow-down factor would give quantitative information concerning the numbers and durations of activation of the subassemblies.
The subassembly theory seems to be a more natural tool than that of primed neurons for the purpose of explaining the sequence of firing of assemblies, although both might be features of the brain. One would expect subassemblies to exist, since the density of connectivity in an assembly would be expected to vary from place to place in the cortex. Subclumps of high connectivity in a network would be expected to reverberate longer than those of low connectivity. Although it could be argued that highly connected subclumps should become exhausted more quickly, it should be observed that the synapses in these subclumps will tend to be stronger than where the connectivity is low. It is therefore natural to assume that the subclumps correspond to subassemblies. It might turn out that the theory of primed neurons will be sufficient to explain the workings of the brain, without the assumption of subassemblies, but the latter theory gives the kind of discrete representation that fits in well with the notion of probabilistic regeneration.

The theory of subassemblies is so natural for any large partly random-looking communication network (such as that of a human society) that it tempts one to believe, with Ashby ([4], p. 229), that a very wide class of machines might exhibit intelligent behavior, provided that they have enough interconnectivity and dynamic states. Machines certainly need some design, but it is reasonable to suppose that money and complication can be traded for ingenuity in design. For example, a well-designed machine of say 10⁸ components might be educable to ultraintelligence, but a much more carelessly designed machine of say 10¹⁸ components might be equally good.

That some design is necessary can be seen from one of the objections to the cell assembly theory as originally propounded by Hebb. Hebb did not originally assume that it was necessary to assume inhibition, and Milner pointed out that, without inhibition, the assemblies would fill the whole cortex. Ultimately there could be only one assembly. Either inhibition must be assumed to exist, as well as excitation, or else the assemblies would have to be microscopically small in comparison with the cortex. The latter assumption would be inconsistent with "distributed memory." Milner accordingly assumed that neurons tend to inhibit those near them. Therefore one may picture an assembly as a kind of three-dimensional fishing net, where the holes correspond to inhibited neurons. The simplest model would assume that each fishing net (assembly) spans the entire cortex, or perhaps only the entire association cortex, or perhaps also other parts of the brain [67]. In future, mainly for verbal simplicity, we use the word "cortex" unqualified. There is a need for some mathematical theorems to show that a very large number of
distinct assemblies could exist under reasonable assumptions for the parameters that describe connectivity. It is reasonable to conjecture that the thinness of the cortex is a relevant parameter, or rather the "topology" that is encouraged by the thinness. The dimensions of the cortex, if straightened out, would be about 60 cm by 60 cm by 2 mm ([90], pp. 32 and 34). It is possible that the assembly theory would become impossible if the cortex were much "thicker." If we cannot treat the problem mathematically, perhaps we should experiment with an artificial neural net of neural dimensions approximately 60 × 10,000 × 10,000, but smaller-scale experiments would naturally be tried first. There must surely be some advantage in having thin cortices, otherwise people would have thicker ones. It seems unlikely that the brain contains many useless residuals of evolutionary history. Hence the anatomy of the brain is very relevant to the design of the first ultraintelligent machine, but the designer has to guess which features have important operational functions, and which have merely biochemical functions. Since it is not known what values of the parameters are required for the intelligent operation of a neural net, it is possible only to guess which features of the cortex are most relevant for the design of an ultraintelligent machine.

The feature of a good short-term memory ("attention span"), of the order of 20τ, where τ is the active time of a single assembly, is certainly essential for intelligence. (In a machine τ need not be approximately half a second.) It might even be possible to improve on the performance of a brain by making the average duration of the sequence somewhat greater than 20τ. But there must be a limit to the useful average duration, for a given cost in equipment. This limit might be determined by the fact that the longer an assembly sequence the smaller must be the average size of the assemblies; but it is more likely to be determined by the fact that the complexity of concepts can be roughly measured by the durations of the assembly sequences, and beyond a certain level of complexity the brain would not be large enough to handle the relationships between the concepts. (In a more precise discussion the duration would be interpreted as a kind of "half-life.")

When guessing what biological features are most relevant to the construction of an ultraintelligent machine, it is necessary to allow for the body as a whole, and not just the brain: an ultraintelligent machine would need also an input (sensorium) and an output (motorium). Since much of the education of the first ultraintelligent machine would be performed by a human being, it would be advisable for the input and output to be intuitively tangible. For example, the input might contain a visual and a tactual field and the output might control artificial limbs. In short the machine could be something of a robot. The sensorium and motorium might be connected topographically to parts of
the two surfaces of the disk that represents the cortex. Many other decisions would have to be made concerning the design, even before any really useful experiments could be performed. These decisions would concern qualitative details of structure and also the values of quantitative parameters. The need for further theory is great, since, without advances in theory, the amount of experimentation might be prohibitive. Even if the values of the parameters in the cerebral cortex were known [90], theory would be required in order to decide how to scale them to a model with fewer components. A very tentative example of some quantitative theory is given near the end of the present section.

It has been argued [79] that the cortex seems to be under the control of a more centrally placed subcortical region, partly in the diencephalon, "not in the new brain but in the old" ([80], p. 21).⁹ Penfield calls the partly hypothetical controlling region the "centrencephalic system." It seems that consciousness is likely to be associated with this system. A natural inference of the hypothesis that consciousness is associated with the old brain is that the lower animals have consciousness, and can experience "real metaphysical pain," an inference natural to common sense but disliked by some experimentalists for obvious reasons: they therefore might call it meaningless. Sometimes Penfield's theory is considered to be inconsistent with Hebb's, but in the present writer's opinion, the assembly theory is made easier to accept by combining it with this hypothesis of a central control. For the following mechanism suggests itself. The greater the amount of activity in the cortex, the greater the number of inhibitory pulses sent to all currently inactive parts of the cortex by the centrencephalic system. This negative feedback mechanism would prevent an assembly from firing the whole cortex, and would also tend to make all assemblies of the same order of size, for a given state of wakefulness of the centrencephalic system. This in its turn would be largely determined by the condition of the human body as a whole. This "assembly theory, MARK III," as we may call it (taking a leaf out of Milner [71]), has two merits. First, it would allow a vastly greater class of patterns of activity to assemblies: they would not all have to have the pattern of a three-dimensional fishing net filling the cortex. This makes it much easier to accept the possibility that a vast variety of assemblies can exist in one brain, as is of course necessary if the assembly theory is to be acceptable. A second, and lesser, merit of the modified theory is that a single mechanism can explain both the control of the "cerebral atomic reactor" and degrees of wakefulness, and perhaps of psychological "set" also. Finally, the theory will shortly be seen

⁹ Zangwill gives earlier references in his interesting survey [108].
to fit in well with a semiquantitative theory of causal interactions between assemblies. It is proposed therefore that our artificial neural net should be umbrella-shaped, with the spikes filling a cone.

During wakefulness, most assemblies will have a very complicated structure, but, during dreamless sleep, the centrencephalic system will become almost exclusively responsible, directly and indirectly, for the activity in the cortex, taking for granted of course the long-term or "static" structure of the cortex. The input from the cortex to the centrencephalic system will, as it were, be "reflected back" to the cortex. The assumption is that the excitation put out by the centrencephalic system has the function of encouraging cortical activity when it is low, and discouraging it when it is high. Under a wide class of more detailed models, the amount of activity will then have approximately simple harmonic amplitude when other input into the cortex is negligible. Since we are assuming that the duration of a cell assembly is about half a second, following Hebb, it is to be expected that the period of this simple harmonic motion will also be about half a second. This would explain the delta rhythm ([103], p. 167) which occurs during sleep. Apparently, very rhythmic assemblies do not correspond to conscious thought. To some extent this applies to all assemblies that are very frequently used. Consciousness is probably at its height when assemblies grow. In order to explain the alpha rhythm, of about five cycles per second, when the eyes are closed and the visual imagination is inactive, along similar lines, we could assume that "visual assemblies" have a duration of only about a fifth of a second. This would be understandable on the assumption that they are on the whole restricted to the visual cortex, i.e., to a smaller region than most other assemblies (cf. Adrian and Matthews [2]).

We have assumed that, when no assembly is active, the centrencephalic system encourages cortical activity, so that, at such times, the various current active subassemblies will become more active. This process will continue until the activity reaches a critical level, at which moment the neurons not already active are on the whole inhibited by those that are active, including those in the centrencephalic system. This is the moment at which, by definition, an assembly has begun to fire. If this happens to be a new assembly, then the interfacilitation between its subassemblies will establish it as an assembly belonging to the repertoire of the cortex. This will happen whenever we learn something new or when we create a new concept.

The newborn child has certain built-in tendencies, such as the exercise of its vocal organs. We assume that there are pleasure centers in the
brain, whose function is reinforcement, and that they are usually activated when there is a "match" between a sound recently heard and one generated by the vocal organs. The matching could be done by a correlation mechanism, which in any case is apparently required in order to recognize the direction of a sound. E. C. Cherry [13] points out the need for this, and also the possibility of its more general application (see also [57, 63]). Also the child is rewarded by attention from its parents when it pronounces new phonemes for the first time. Thus one would expect assemblies to form, corresponding to the simplest correctly pronounced phonemes. The phonemes in agricultural communities might be expected to be influenced by the farm animals. Assemblies corresponding to syllables and short words would form next, apart from the words that were negatively reinforced. Each assembly representing a word would share subassemblies with the assemblies that represent its phonemes. An assembly for a word would also have subassemblies shared with nonlinguistic assemblies, such as those representing the taste of milk, and, more generally, representing experiences of the senses, especially at the nine apertures, where the density of neurons is high for evolutionary reasons. And so, gradually, the largely hierarchical structure of assemblies would be formed, the lowest levels being mostly closely connected with the motorium and also with the sensorium, especially where the surface neural density is high.

It is interesting to speculate concerning the nature of the associations between cell assemblies. We shall suppose that there is some measure of the strength of the association from one cell assembly, F, to another one, A, or from an assembly sequence F to the assembly A. Assuming the subassembly theory, this association will be largely embodied in the strength of the association to A from the subassemblies left behind by F, and will depend on the degrees of activation of the subassemblies and on the current psychological "set." A few distinct but related formulas suggest themselves, and will now be considered. In these formulas we shall take for granted the degrees of activation and the psychological set, and shall omit them from the notation.

The first suggestion is that the strength of the association from F to A should be measured by I(A : F), as in the discussion of information retrieval in Section 5. If F is the assembly sequence A₁, A₂, . . ., Aₙ, and if these assemblies supply statistically independent information, we have, by Eq. (5.3):
$$\log P(A \mid A_1 \cdot A_2 \cdots A_n) = \log P(A) + \sum_{r=1}^{n} I(A : A_r)$$
It could then be suggested that the term log P(A) is represented by the
strength of the connectivity from the centrencephalic system to A. Actually it is unlikely that the assemblies will supply statistically independent information, and it will be necessary to assume that there are interaction terms as in Eq. (5.6). We would then have an explanation of why the next assembly that fires, following an assembly sequence, is often the one that ought to have the largest probability of firing in a rational man. More precisely, the terms I(A : Aᵣ) corresponding to the most recently active assemblies will be represented with larger weights. Consequently, when we wish to recall a memory, it pays to hold in mind all the best clues without the intervention of less powerful clues.

An objection to the above suggestion is that it is necessary to add a constant to log P(A) to make it positive, and then the neurophysiological "calculation" of the strength of the association from the centrencephalic system would be ill-conditioned. Accordingly we now consider another suggestion.

One of the distinctions between the action of the brain and document-retrieval systems is that the brain action is considerably more dynamic. The activity of the assemblies constitutes an exceedingly complicated causal network. It is natural to consider whether the causal calculus [39] might be applicable to it. Reference [39] contains two immediately relevant formulas, namely,

$$Q(E : F) = \log \frac{P(\bar{E} \mid \bar{F})}{P(\bar{E} \mid F)},$$
the tendency of F to cause E (F̄ denotes "not F"), also described as "the weight of evidence against F if E does not occur"; and

$$K(E : F) = \log \frac{1 - P(E)}{1 - P(E \mid F)},$$
the "intrinsic" tendency of F to cause E. In both formulas, the laws of nature, and the state of the world immediately before the occurrence of F, are taken for granted and omitted from the notation. Like the mutual information, both Q and K have the additive property
+ &(E : B I P) + K ( E : Q [ P) &(E : P G) = &(E : P)+ &(E : Q) K ( E : P - G) = K ( E : F)+ K ( E : Q)
&(E : P * G) = &(E : P) K ( E : P * G) = K ( E : P) Moreover
*
when P and G are “independent oauses” of E. This means that F and G are statistically independent, and are also statistically independent 67
given Ē. This definition of independent causes, extracted from [39], was seen to be a natural one by the consideration of a firing squad: E is the event that the victim is shot, F and G are the events of shooting by two marksmen; and part of the given information, taken for granted and omitted from the notation, is that the sergeant at arms gives the order to fire. We now take F as the firing of an assembly or assembly sequence, also denoted by F, and we take E as the firing of the assembly A. The suggestion is that Q or K is a reasonable measure of the strength of the association from F to A. We then have additivity in so far as the components of F, assemblies or subassemblies, have independent tendencies to cause A to fire. Otherwise various interaction terms can be added, and can be expressed in various ways, for example,
$$K(E : F \cdot G) = K(E : F) + K(E : G) + I(F : G) - I(F : G \mid \bar{E})$$
The "causal force," K(E : F), tends to activate A, but the assembly that is activated will not be the one that maximizes K(E : F), but rather the one that maximizes P(E | F). This can be achieved by assuming that the centrencephalic system applies a "force" −log[1 − P(E)]. [This will always be well approximated simply by P(E).] The resultant force will be −log[1 − P(E | F)] and increases with P(E | F) as it should. We see that K(E : F) appears to be more logical than Q(E : F) for our purpose, since it would be more difficult to see how the centrencephalic system could apply a "force" equal to −log[1 − P(E | F̄)] to A. If there exists no E for which −log[1 − P(E | F)] exceeds some threshold, then a new assembly will be activated, or else the next thought that occurs will be very much of a non sequitur.

It could be asked, what is the advantage of using K(E : F) rather than −log[1 − P(E | F)] as a measure of the strength of the association from F to A? (In the latter case the centrencephalic system would not need to make a contribution.) Two answers can be given: first, that if P(E | F) = P(E), then F should have no tendency to cause A to fire. Second, that when F and G have independent tendencies to cause E, we can easily see that
- P(lI P a)] = -lOg[1 - P(B I P)] -lo@ - P(B I a)] + log[l - P(1)]
and consequently the strengths would not be additive. Hopefully, these measures of strengths of association between assemblies will help to suggest some quantitative neural mechanisms that could be put to experimental test.
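These definitions are easy to exercise numerically. The following sketch is a toy model only: the joint distribution over E, F, and G is invented for illustration. It computes Q(E : F) and K(E : F) from the formulas above, and verifies that the interaction identity for K holds exactly, as it must for any distribution.

```python
# A numerical check of the measures Q and K and of the interaction
# identity above, on a small hypothetical joint distribution over
# (E, F, G). The probabilities are invented for illustration only.
import math

# p[(e, f, g)] with 1 = true, 0 = false; any strictly positive
# assignment summing to 1 will do.
p = {(1,1,1): .20, (1,1,0): .10, (1,0,1): .12, (1,0,0): .05,
     (0,1,1): .03, (0,1,0): .10, (0,0,1): .15, (0,0,0): .25}

def prob(pred):
    return sum(v for k, v in p.items() if pred(*k))

def cond(pred_a, pred_b):  # P(A | B)
    return prob(lambda e, f, g: pred_a(e, f, g) and pred_b(e, f, g)) / prob(pred_b)

E    = lambda e, f, g: e == 1
F    = lambda e, f, g: f == 1
G    = lambda e, f, g: g == 1
FG   = lambda e, f, g: f == 1 and g == 1
notE = lambda e, f, g: e == 0
notF = lambda e, f, g: f == 0

# Q(E : F) = log[ P(not-E | not-F) / P(not-E | F) ]
Q_EF = math.log(cond(notE, notF) / cond(notE, F))

# K(E : cause) = log[ (1 - P(E)) / (1 - P(E | cause)) ]
def K(cause):
    return math.log((1 - prob(E)) / (1 - cond(E, cause)))

# Mutual information I(F : G), optionally conditioned on not-E.
def I_FG(given=None):
    pr = prob if given is None else (lambda q: cond(q, given))
    return math.log(pr(FG) / (pr(F) * pr(G)))

lhs = K(FG)
rhs = K(F) + K(G) + I_FG() - I_FG(given=notE)
assert abs(lhs - rhs) < 1e-12   # the identity holds for any distribution
```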
In physical terms, the interaction between a pair of assemblies, A and B, will depend on the number and size of the subassemblies (including the subsubassemblies) that they have in common. This set of subassemblies could be called the "intersection," A.B. (A more complete notation would be A.B(T), where T is the time since B fired. The intersection decreases to zero as T increases.) The second-order interaction between three assemblies, A, B, and C, will depend on the set of subassemblies common to all of them, A.B.C. If B and C have just been active, they will contribute a "force" tending to activate A, expressible in the form |A.B| + |A.C| − |A.B.C|, where the moduli signs represent in some sense the current total strengths of the sets of subassemblies. The term |A.B.C| is subtracted in order that it should not be counted twice. More generally, as in the Boole–Poincaré theorem, the firing of an assembly sequence, A₁, A₂, . . ., Aₙ, will have an "intrinsic" tendency to cause A to fire, measured (compare the Appendix) by
$$\sum_r |A.A_r| - \sum_{r<s} |A.A_r.A_s| + \sum_{r<s<t} |A.A_r.A_s.A_t| - \cdots$$
(Y < 8 < t < .). To this must be added a term depending on the ourrent “force” on A from the centrencephalic system, which will perhaps be a function only of the probability that A fires conditional only on past history and psychological c‘set.’yThe assembly, A, for which the total causal force is a maximum is the one most likely to fire, or, on a deterministic theory, the one that actually will fire. The formula can be interpreted in various ways, depending on whether we have in mind a theory of primed neurons, a theory of subassemblies, or a mixture of the two if we use the anti-Ockham principle for very complex systems. We shall now consider another semiquantitative aspect of the interaotion between assemblies. Suppose that A and B are two assemblies having no previous association, but that A happens to oocur before B, owing to the sequence of events at the sensorium. Suppose that each of the assemblies contains about a fraction a of the cortex (or of the association areas), where a might be, say, 1/30, although this is in large part a guess, as we said before. The neurons in common will constitute about aa of the cortex. The synapses connecting these will undergo a slight change of state, enoouraging interfacilitation. Thus the common neurons will have some tendency to include a set of subassemblies containing less than as of the oortex. It is not necessary to w u m e that the temporal order of A and B
is also represented in the interfacilitation in order that a record be made of the temporal sequence of events, provided that we allow for assembly sequences consisting of more than two assemblies.

When we recall some event having extension in time, we need to regenerate an assembly sequence. That this is possible is not surprising in view of the subassembly theory. For each assembly was originally fired by the subassemblies left behind by the previous assemblies of the sequence, so if we have succeeded in recalling most of these assemblies it is likely to be easy to recall the next one (since we shall have injected just about the right collection of subassemblies into our cortex). The subassemblies left in the wake of an assembly sequence A₁, A₂, . . ., Aᵣ will tend to fire Aᵣ₊₁, not Aᵣ₋₁; that is, there will be little tendency to remember serial events in reverse time order.

If assemblies A₁, A₂, . . ., Aₖ, having no previous association, happen to occur in sequence, where k is not more than about 20, then primitive subassemblies (or classes of subassemblies) (A₁, A₂), (A₂, A₃), . . ., (Aₖ₋₁, Aₖ) will form, and perhaps also some weaker subassemblies (Aᵣ, Aₛ), where r < s − 1. These will be at least analogous to the mutual informations I(Aᵣ, Aₛ), which, for nonperiodic Markov processes, do tend to be weaker and weaker, the larger is s − r. Similarly sets of subassemblies and perhaps subsubassemblies will form, corresponding to triples of assemblies, and analogous to the mutual informations I(Aᵣ, Aₛ, Aₜ), and so on, for interactions of higher order. (Similar comments, both here and later, can be made if the strengths of association are defined in terms of K in place of I.) The set of subassemblies arising from the "intersection" of q assemblies, of which none had been previously associated, could hardly occupy a proportion of the cortex larger than α^q, so that, if α = 1/30, q could not be larger than log₃₀(5 × 10⁹) ≈ 6.6. This would not constitute a serious biological disadvantage, since high-order interactions can generally be ignored, judging by the practice of statisticians in factorial experiments (see the Appendix). The upper limit is reminiscent of the "depth hypothesis" [88, 107]. Compare also the experiment mentioned at the beginning of Section 5.

We have seen that it is impracticable to take a sample of language that is large enough to be able to judge the association factors (the exponentials of the amounts of mutual information) between all pairs of 30,000 words by simple frequency counts. It is reasonable to assume that direct psychological association between words is determined by the frequencies with which they occur nearly simultaneously in thought, and this is easy to understand in a general way in terms of the assembly and subassembly theory. But we can recognize logical associations between pairs of words that have never occurred together in our
experience; for example, the words "ferry" and "fare" can be seen to be associated in the same manner as "bus" and "fare," even if we never previously made the association in our minds. Likewise, if we were asked to estimate the mutual information between the first pair of words, "ferry" and "fare," regarded as index terms for sentences, we could reasonably take it as equal to that between the second pair. This is a simple example to show that we make use of semantics even in the simplest problems of association whenever our samples have not been large enough to rely on mere frequencies. The simplest conditional probability machines, such as those designed by Uttley [101], rely only on frequencies; in other words the probabilities are maximum-likelihood estimates, and they make no use of semantics. Such machines could be improved in principle by means of automatic classification of words into "clumps" (see Section 5). The essential idea is that words can be seen to be associated not merely because they occur frequently together, but because they both occur frequently in conjunction with a third word, or more generally with other words that belong to some reasonably objectively definable clump of words. The search for clumps is especially interesting for the purpose of trying to construct a thesaurus mechanically, hopefully for application to problems of classification and mechanical translation. A comprehensive search is liable to be very expensive in computer time, if the computer is of classical design. By using an artificial neural net, it might be possible to perform the search faster, owing to the parallel working. If A₁, A₂, . . ., Aₖ is a clump of assemblies having respectively n₁, n₂, . . ., nₖ subassemblies, and if Aᵢ and Aⱼ have mᵢⱼ subassemblies in common, then, for each i, the "clumpiness"
$$\frac{1}{k-1} \sum_{j \neq i} \frac{m_{ij}}{n_i}$$
is much larger than it would be for a random class of k assemblies. One can define a clump by insisting that the clumpiness is decreased if any assembly is added to the clump or removed from it. Many other definitions of a clump are possible (see for example Section 5, and [31, 41], and references given in the latter article), and it is not yet clear to what extent the definitions agree with each other, nor which definitions are appropriate for various purposes. At any rate we must suppose that there is some mechanism by which an assembly representing a clump of assemblies tends to be formed, a mechanism that will correspond at least to some aspects of "abstraction" or "generalization." Often this assembly will itself represent a word, and the existence of the word will encourage the assembly to form (for example [41], p. 122): in the example of ferries and buses the word might be "vehicle."
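As a toy illustration of clump-finding, the sketch below operationalizes the clumpiness of an assembly within a class as its average fractional overlap with the other members (one reading of the formula above, whose exact form is uncertain in the source), and greedily grows a clump while the weakest member's clumpiness still rises. The overlap counts and subassembly sizes are invented for illustration.

```python
# A toy version of clump-finding over assemblies. clumpiness(i, clump)
# is the average of m[i][j] / n[i] over the other members j; grow_clump
# is merely one crude search strategy, not a claim about the source.

def clumpiness(i, clump, n, m):
    others = [j for j in clump if j != i]
    return sum(m[i][j] for j in others) / ((len(clump) - 1) * n[i])

def grow_clump(seed, universe, n, m):
    """Greedily add assemblies while the weakest member's clumpiness rises."""
    clump = set(seed)
    while True:
        score = min(clumpiness(i, clump, n, m) for i in clump)
        best = max(universe - clump, default=None,
                   key=lambda j: min(clumpiness(i, clump | {j}, n, m)
                                     for i in clump | {j}))
        if best is None:
            return clump
        if min(clumpiness(i, clump | {best}, n, m)
               for i in clump | {best}) <= score:
            return clump
        clump.add(best)

n = {1: 40, 2: 40, 3: 40, 4: 60}                      # subassembly counts
m = {1: {2: 20, 3: 28, 4: 3}, 2: {1: 20, 3: 28, 4: 3},
     3: {1: 28, 2: 28, 4: 3}, 4: {1: 3, 2: 3, 3: 3}}  # shared counts
print(grow_clump({1, 2}, {1, 2, 3, 4}, n, m))         # -> {1, 2, 3}
```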
In the design of an ultraintelligent machine based on an artificial neural net, one of the most vital problems is how to ensure that the above mechanism will be effective. It seems to be necessary to assume that, when an assembly is active, it causes a little activity in all the assemblies with which it is closely associated, although only one at most of these assemblies will be the next to fire. This "priming" of assemblies is analogous to the priming of neurons; it is presumably operated by the subassemblies. The slightly active assemblies in their turn might encourage an even smaller amount of activity in those with which they are closely associated. In this way, there will be a small amount of activity in all the assemblies of a clump, although none of them is actually fired, and consequently a gradually increased chance that an assembly will form that will represent a clump. In terms of man, since, by hypothesis, we are not conscious of cortical activity that is not part of an active assembly, when we form a new abstraction it will emerge from the preconscious or unconscious in a manner that will seem to our conscious minds like a flash of inspiration!

It is possible that one of the functions of sleep is to give the brain an opportunity of consolidating the waking experiences by means of unconscious botryological calculations, especially those leading to improved judgments of probabilities. This assumption would be consistent with the advice to "sleep on a problem." It might turn out that an ultraintelligent machine also would benefit from periods of comparative rest, but not by being switched off.

Some of the matters that have been discussed in this section can be apprehended as a whole in terms of the following survey of short-term and long-term memory. In most modern computers there are several levels of storage, successively larger but slower. The reason for this is that it would be too expensive to have an exceedingly large storage with instant recall. It is natural to suppose that human memory too is split up into levels corresponding to different mechanisms. The following classification would be consistent with the discussion in this section. It is of course conjectural.

(i) Immediate recall (about half a second). Concepts currently in consciousness, embodied in the currently active assembly.

(ii) Very short-term memory or attention span (half a second to 10 seconds). Embodied in the currently active subassemblies, largely the residues of recently active assemblies. The span might be extended up to several minutes, with embodiment in subsubassemblies, etc.

(iii) Short-term (from about 10 seconds or 10 minutes to about one day). Embodied in primed neurons.

(iv) Medium-term (about one day to about one month, below the age of
30, or about one week above the age of 50). Assemblies are neither partly active nor partly primed, but present only by virtue of their patterns of synaptic strengths, and with little degradation.

(v) Long-term (about one month to a hundred years). As in (iv) but with more degradation of pattern and loss of detail.
A program of research for quantitative theory would be to marry the histological parameters to those in the above list. This program will not be attempted here, but, as promised earlier, we shall give one example of how a quantitative theory might be developed (see also, for example, [6, 92]). Let us make the following provisional and artificial assumptions:

(i) The probability, in a new brain, that a pair of neurons is connected is the same for every pair of neurons.

(ii) Each neuron has p inhibitory synapses on it, and vastly more excitatory ones.

(iii) A single "pulsed" inhibitory synapse dominates any number of pulsed excitatory ones, during a summation interval.

(iv) An assembly occupies a proportion α of the cortex, and the active subassemblies not in this assembly occupy a proportion β − α, making a total activity equal to β.

Then a random neuron has probability (1 − β)ᵖ of escaping inhibition. In order to be active, the neuron must also escape inhibition by the centrencephalic system. So
P < (1 - P Y log P < log(1 - 8)
For example, if β = 1/16, then p < 43. It seems unlikely that any biochemical mechanism could be accurate enough to give the required value of p, without some feedback control in the maturation of the brain. But it is perhaps significant that the number of neurons in the cortex is about 2³², so that, perhaps, in the growth of the brain, each neuron acquires one inhibitory synapse per generation, 31 in all. The conjecture would have the implication that close neurons would tend to inhibit each other more than distant ones, as required by Milner [71] (compare [34]). We emphasize that this example is intended only to be illustrative of how a quantitative theory might proceed. Taken at its face value, the example is very much more speculative than the subassembly theory as a whole.
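Under the artificial assumptions above, the bound on p is a one-line computation; the following sketch simply evaluates it (the function name is ours).

```python
# A check of the inequality above: with total cortical activity beta,
# a neuron escapes its p inhibitory synapses with probability
# (1 - beta)**p, and activity can be self-sustaining only if
# beta < (1 - beta)**p, i.e. p < log(beta) / log(1 - beta).
import math

def max_inhibitory_synapses(beta):
    return math.log(beta) / math.log(1 - beta)

print(max_inhibitory_synapses(1 / 16))   # ~ 43.0, as in the text
```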
We conclude this section with a brief discussion of an objection that has been made to the assembly theory. Allport ([3], p. 179) says, regarding the observation of a whole that consists of parts a, b, c: ". . . There is, in Hebb's scheme, no apparent reason why we should not have . . . a perception of the parts a, b, c and alongside these at the same time another equally vivid perception of the whole, that is, of t. This, however, does not occur: we perceive either the parts in their separateness or the parts as integrated into a whole, but not both at once" (see Hebb [54], pp. 94–99). This does not seem to be an objection to the theory in the form presented here. Even if the assembly t were originally built up largely from parts of the assemblies a, b, c, it does not contain the whole of any one of these three assemblies. Instead, it consists of parts of a, b, c and also of parts not in a, b, or c. Consequently it is only to be expected that we do not apprehend an object both as a whole and in its parts at quite the same moment.

In the next section we suggest how meaning might be represented in terms of subassemblies, but only in a general manner, and not with the degree of precision that could be desired. We aim mainly to strengthen the case that semantics are relevant to artificial intelligence, and to lend support to the feeling, that is very much in the air at present, that much more detailed research into these matters is worthwhile.
7. An Assembly Theory of Meaning
Our purpose is not to define "meaning," but to consider its physical embodiment. We have already discussed various aspects of meaning in previous sections, and this will enable us to keep the present section, and the next one, short.

A distinction can be made between the literal meaning of a statement, and the subjective meaning that the statement has (on a particular occasion) to a man or machine. It is the latter that is of main concern to us in this essay. (For a man, subjective meaning could also be aptly called "personal meaning," but this name would at present seem inappropriate for a machine.) Although we are concerned with subjective meaning, the behavioral interpretation of meaning is not enough for us, as was said in Section 4. Instead, the subjective meaning of a statement might be interpreted as the set of tendencies to cause the activation of each assembly sequence at each possible time in the future. The physical embodiment of meaning, when a statement is recalled to mind, would then be a class of subassemblies. This embodiment of meaning is related to the probabilistic interpretation for the meaning of a word, given in Section 4 (the probabilistic form of "Wisdom's cow"). The qualities Q₁, Q₂, . . ., Qₙ, when noticed one at a time, would activate assemblies, but, when they are
noticed only preconsciously, and so directly cause activity only in subassemblies, they are at best contributory causal agents in the activation of assemblies. If a statement provoked an assembly sequence, S, presumably the (subjective) meaning of the statement is embodied in some of the subassemblies that were left behind by S, the ones that reverberated the longest being the most important ones. Two statements have close meanings if the sets of subassemblies left behind by them bear a close resemblance to each other, or even if the resemblance is not close provided that the effects are similar, just as a cow can be recognized on different occasions by the apprehension of different sets of probable properties. We feel that we have understood the meaning of a statement when we somehow recognize that the statement was a definite causal agent in our thought processes or in our propensities to future motor activity, and that these propensities are of a kind which we think was intended by the person who communicated the statement. But I shall ignore these intentions and interpret "meaning" as "meaning for us." Degrees of meaning exist, and correspond in part to greater or lesser degrees of causal tendency. The "circularity" mentioned in Section 4, in connection with the probabilistic interpretation of meaning, corresponds to the obvious possibility that an assembly can help to strengthen some of the weak subassemblies that helped to activate the assembly itself.

A more formal suggestion for the representation of meaning can be framed as follows. Let S be an assembly sequence, and G a "set" in the psychological sense. (An assembly theory of psychological set is given in Hebb [54].) Let C be a statement. Denote by
$$P(A \mid S \cdot G \cdot C)$$

the probability that A will be the next dominant assembly to follow the assembly sequence S when the subject is in psychological set G, and when he has been told C and had no reason to doubt the veracity of his informant. If the subject had not been told C the corresponding probability would be

$$P(A \mid S \cdot G)$$

and, if he had been told that C was false, the probability would be denoted by

$$P(A \mid S \cdot G \cdot \bar{C})$$

Then the function of A, S, and G, with values

$$\log \frac{P(A \mid S \cdot G \cdot C)}{P(A \mid S \cdot G \cdot \bar{C})} \tag{7.1}$$
for all A, S, and G, is a reasonable first approximation to a representation of the "meaning" of C. The representation of the meaning of the negation of C is minus that of C. A reasonable representation of the "effectiveness" of the statement would be the function with values
$$\log \frac{P(A \mid S \cdot G \cdot C)}{P(A \mid S \cdot G)} \tag{7.2}$$
The reason why this latter formula would be inappropriate as a representation of "meaning" is that it is sensitive to the subject's degree of belief in C before he is told C. A man's degree of belief in a statement should not be very relevant to its meaning. It is not intended to be implied by this representation that the subject could obtain the values of the probabilities by introspection. The probabilities are intended to be physical probabilities, not the subjective probabilities of the man or machine. (For a discussion of kinds of probability, see, for example, [36].) Expression (7.1) may be described as the log-factor or weight of evidence in favor of the hypothesis that C was stated rather than C̄, provided by the event that assembly A was activated, given that the previous assembly sequence was S, and that the psychological set was G. (The terminology is that of [26] and [21], for example, and was mentioned in Section 5.) If the subject is deterministic, then the probabilities would be pseudoprobabilities, of the same logical nature as those associated with pseudorandom numbers. Expression (7.2) is the mutual information between the propositions that the assembly A was activated on the one hand and that C was stated on the other. If the class of values of (7.1) is extended also over several subjects (who could be specified in the notation) then we should have a representation of multisubjective meaning, and we might perhaps approximate to a representation of "true meaning" if there is such a thing. A representation of "literal meaning" could be obtained by restricting the class to "literal-minded" men and robots, in order to exclude the poetic and irrational influences of a statement.

Formulas (7.1) and (7.2) are of course only examples of possible quantitative representations of "meaning." It might be better to replace them by the corresponding formulas for causal tendency,

$$Q(A : C \mid S \cdot G) \tag{7.1a}$$

$$K(A : C \mid S \cdot G) \tag{7.1b}$$

These formulas would be more consistent with the interpretation of the meaning of a statement in terms of its causal propensities.
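To make the proposal concrete, the following toy computation (all probabilities invented; the prior row happens to be consistent with P(C) = 1/4) evaluates expressions (7.1) and (7.2) as tables over candidate next assemblies.

```python
# A toy computation of expressions (7.1) and (7.2): the "meaning" of a
# statement C as a table of weights of evidence over next assemblies A,
# for a fixed assembly sequence S and psychological set G.
import math

# P(A | S.G.C), P(A | S.G.not-C), and P(A | S.G) for three assemblies;
# the numbers are invented for illustration only.
p_given_C    = {"A1": 0.60, "A2": 0.30, "A3": 0.10}
p_given_notC = {"A1": 0.20, "A2": 0.30, "A3": 0.50}
p_prior      = {"A1": 0.30, "A2": 0.30, "A3": 0.40}

meaning = {A: math.log(p_given_C[A] / p_given_notC[A])   # expression (7.1)
           for A in p_given_C}
effectiveness = {A: math.log(p_given_C[A] / p_prior[A])  # expression (7.2)
                 for A in p_given_C}

# The meaning of "not C" is minus that of C, as stated in the text:
meaning_not_C = {A: -w for A, w in meaning.items()}
```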
Although we are arguing that semantics are relevant to the design of an ultraintelligent machine, we consider that it will not be necessary to solve all of the problems of semantics in order to construct the machine. If we were using the approach depending on a "canonical language" (see Section 4), the problems would all need solution, but if a neural net is used, we believe that the net might be capable in effect of learning semantics by means of positive and negative reinforcement, in much the same manner as a child learns. The theory of assemblies and subassemblies, as applied to semantics, is intended to provide some at least intuitive justification for this belief. It should be possible, by means of more quantitative theory and experiment, to improve, to disprove, or to prove the theory. A thoroughgoing quantitative theory will be difficult to formulate, and the experiments will be laborious and expensive, but the reward or punishment will be great.

8. The Economy of Meaning
Just as the activation of an assembly is a form of regeneration, so also is that of a subassembly, although the regeneration of subassemblies might be less sharp. The degree of regeneration of a subassembly corresponds to a preconscious estimate of the probability of some property, so that the process of recall is physically one of regeneration mixed with probabilistic regeneration. We have argued that, in any communication system, the function of regeneration and of probabilistic regeneration is economy, and so the physical embodiment of meaning also serves a function of economy. It is even possible that the evolutionary function of meaning and understanding is economy, although metaphysically we might consider that the function of evolution is the attainment of understanding!

Imagine, for the sake of argument, that each meaningful proposition (defined as a class of logically equivalent statements) could be expressed by each of a hundred different statements, each of which had an entirely distinct representation in the brain. Suppose that the number of ordered pairs of propositions that are mentally associated is N. Corresponding to each pair of propositions, there would be 10,000 equivalent pairs of statements. In order to represent the N associations between propositions, we should require 10,000N associations between statements. Although the number 100 is here a pure guess, it is clear that there must be a tremendous premium on the representation of statements by their meanings. For this saves a factor of 100 (nominally) in the storage of the propositions, and a corresponding factor of 10,000 in the storage of the associations between pairs of propositions. The latter factor is relevant in long-term recall, since the process of recalling
a fact usually requires that one should have in mind several other facts. It is clear therefore that the physical representation of meaning performs a very important function of economy, especially in long-term recall, and can be expected to perform an equally important function in an ultraintelligent machine.

9. Conclusions
These "conclusions" are primarily the opinions of the writer, as they must be in a paper on ultraintelligent machines written at the present time. In the writer's opinion then:

It is more probable than not that, within the twentieth century, an ultraintelligent machine will be built and that it will be the last invention that man need make, since it will lead to an "intelligence explosion." This will transform society in an unimaginable way. The first ultraintelligent machine will need to be ultraparallel, and is likely to be achieved with the help of a very large artificial neural net. The required high degree of connectivity might be attained with the help of microminiature radio transmitters and receivers. The machine will have a multimillion dollar computer and information-retrieval system under its direct control. The design of the machine will be partly suggested by analogy with several aspects of the human brain and intellect. In particular, the machine will have high linguistic ability and will be able to operate with the meanings of propositions, because to do so will lead to a necessary economy, just as it does in man.

The physical representation of both meaning and recall, in the human brain, can be to some extent understood in terms of a subassembly theory, this being a modification of Hebb's cell assembly theory. A similar representation could be used in an ultraintelligent machine, and is a promising approach. The subassembly theory leads to reasonable and interesting explanations of a variety of psychological effects. We do not attempt to summarize these here, but merely refer the reader back to Section 6. Even if the first ultraintelligent machine does not after all incorporate a vast artificial neural network, it is hoped that the discussion of the subassembly theory is a contribution to psychology, and to its relationships with the theories of communication and causality. The activation of an assembly or a subassembly is an example of generalized regeneration, a function of which is again economy. The assembly and subassembly theories are easier to accept if combined with the assumption of a centrencephalic control system, largely because this would enable a very much greater variety of assemblies to exist.

The process of long-term recall can be partly understood as a statistical information-retrieval system. Such a system requires the estimation
of probabilities of events that have never occurred. The estimation of such probabilities requires some nontrivial theory even in simple cases, such as for multinomial distributions having a large number of categories. In more complicated cases, the theories are very incomplete, but will probably require a knowledge of and an elaboration of all the methods that have so far been used by actuaries and other statisticians for the estimation of probabilities. Among the techniques will be included the maximum-entropy principle, the use of initial probability distributions [47, 56, 48], and "botryology" (the theory and practice of clump-finding).

A form of Bayes' theorem expresses the final log-probability of a "document" or "memory" as an initial log-probability, plus some terms representing I(Dⱼ : Wᵢ), the information concerning a document provided by an index term (or concerning a memory provided by a "clue"), plus additional terms representing the mutual information between index terms and the document. It is suggested that, in the brain, the initial log-probability is possibly represented in some sense by the strength of the connectivity between an assembly and the centrencephalic system; that the terms I(Dⱼ : Wᵢ) are represented by the subassemblies shared between the assemblies corresponding to Dⱼ and Wᵢ; and that other terms are represented by the interactions between sets of at least three assemblies. An alternative suggestion, which seems slightly to be preferred, is that the strengths of association are expressible in terms of K(E : F), the intrinsic tendency of an event E to be caused by F. This is equal to minus the mutual information between F and not-E. Then the strength of the association from the centrencephalic system to an assembly would be approximately equal to the initial (prior) probability of the firing of the assembly, given the psychological "set." The same remarks concerning interactions apply here as in the first suggestion. Whereas, in ordinary information-retrieval problems, the expression I(Dⱼ : Wᵢ) will often need to be estimated with the help of computational techniques for clumping, the strength of the connectivity between two assemblies will often be physically represented because of the manner in which the two assemblies were originally formed, by being built up from co-occurring subassemblies.

The representation of informational or causal interactions, or both, up to about the sixth or seventh order, is presumably embodied in the subassemblies common to assemblies. The magical proficiency of the brain, in recall, can be largely attributed to its facility in handling these interactions. My guess is that only an ultraparallel machine, containing millions of units capable of parallel operation, could hope to compete with the brain in this respect.
It seems reasonable to conjecture that the organization of the interactions into subassemblies might require the intervention of periods of rest or sleep. A possible function of sleep is to replay the assembly sequences that were of greatest interest during the day in order to consolidate them. During wakefulness, half-formed subassemblies would be subjected to the inhibitory effect of fully active assemblies, but during sleep a half-formed subassembly would have time to organize and consolidate itself. On this hypothesis, a function of sleep is to strengthen the unconscious and preconscious parts of the mind.

The first ultraintelligent machine will be educated partly by means of positive and negative reinforcement. The task of education will be eased if the machine is somewhat of a robot, since the activity of a robot is concrete. Regarding the microstructure of the learning process, it is proposed that this be effected by means of reinforcement of the strengths of artificial synapses, that the available strengths for each synapse should form a discrete set, that when a synapse is not used for a certain length of time it should have a certain small probability of "mutation" down one step, and that when a synapse is "successfully used" (i.e., contributes to the activation or inhibition of an artificial neuron) it has a certain small probability of mutation up one step. The need for the changes in synaptic strength to be only probabilistic, with small probabilities, is that they would otherwise vary too quickly for the machine to be of any use, at any rate if the assembly or subassembly theory is incorporated. Deterministic changes, in any obvious sense, would be useful only if a very small fraction of the machine were in use at one instant, and this would be uneconomical.
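The proposed learning microstructure can be stated in a few lines of code. The sketch below implements the probabilistic one-step mutations just described; the ladder size and the mutation probabilities are invented placeholders, not values proposed in the text.

```python
# A sketch of the probabilistic synaptic reinforcement rule proposed
# above: strengths form a discrete ladder, and a synapse mutates one
# step up with small probability when "successfully used", one step
# down with small probability when long unused.
import random

LEVELS = 16            # discrete strengths 0 .. LEVELS-1 (placeholder)
P_UP, P_DOWN = 0.01, 0.001   # small mutation probabilities (placeholders)

class Synapse:
    def __init__(self, level=LEVELS // 2):
        self.level = level

    def successful_use(self):
        """Contributed to activating (or inhibiting) its neuron."""
        if random.random() < P_UP and self.level < LEVELS - 1:
            self.level += 1

    def idle_for_a_while(self):
        """Not used for a certain length of time."""
        if random.random() < P_DOWN and self.level > 0:
            self.level -= 1
```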
10. Appendix: Informational and Causal Interactions
Let E₁, E₂, . . ., Eₙ represent events or propositions. Let the probability P(E₁ · Ē₂ · . . . · Eₙ), for example, where the vinculum denotes negation, be denoted by p₁₀...₁, where 0 means false and 1 means true. The 2ⁿ different possible logical conjunctions of the n propositions and their negations have probabilities denoted by pᵢ, where i = (i₁, i₂, . . ., iₙ) is an n-dimensional vector each of whose components is either 0 or 1. The array (pᵢ) is a 2ⁿ population contingency table. A marginal total of the table is obtained by summing out one or more of the suffixes, and we denote Σ_{i₃, i₅} pᵢ, for example, by p_{i₁i₂·i₄·i₆...iₙ}. When the suffixes not summed out are equal to 1, we use an alternative notation: for example, if i₁ = i₂ = i₄ = i₆ = i₇ = . . . = iₙ = 1, we denote the marginal total by P₁₁₀₁₀₁₁...₁. Thus the numbers (Pᵢ) form
another 2ⁿ array, which consists of a subset of the marginal totals of the original table. Note that P₀₀...₀ = p.. ... . = 1, and that, for example, P₁₁₀₀...₀ = P(E₁ · E₂). The probabilities (Pᵢ) have more direct relevance to information retrieval than the probabilities (pᵢ), since it is more natural to assert an index term than to deny one. The most relevant Pᵢ's for this purpose will be those for which |i| is small, where |i| denotes the number of nonzero components of i. Just as each Pᵢ is the sum of some of the pᵢ's, so each pᵢ can be expressed as a linear combination of Pᵢ's. For example (the Boole–Poincaré theorem),

$$p_{00\ldots0} = \sum_i (-1)^{|i|} P_i = S_0 - S_1 + S_2 - \cdots + (-1)^n S_n \tag{A.1}$$
where S_ρ is the sum of all Pᵢ's for which |i| = ρ, and

$$p_{10\ldots0} = S_1' - S_2' + S_3' - \cdots$$

where S_ρ′ is the sum of all Pᵢ's for which |i| = ρ and i₁ = 1.

Interactions between events: Let E and F be two propositions or events. Write I(E) = −log P(E), the amount of information in the proposition E concerning itself ([21], p. 74, [26]). Let
$$I(E : F) = I(E, F) = \log P(E \cdot F) - \log P(E) - \log P(F) = I(E) + I(F) - I(E \cdot F) \tag{A.2}$$
the amount of information concerning E provided by F. It is also called the mutual information between E and F, when it is desired to emphasize the symmetry, and in this case the comma is more appropriate than the colon, since the colon is pronounced "provided by." The equation shows that I(E, F) can be regarded as a measure of information interaction between E and F. For sets of more than two propositions, it is natural to generalize this definition by using the n-dimensional mod 2 discrete Fourier transform, as in the definition of interactions for factorial experiments (see for example [32]). We write
$$I_j = 2^{|j|-n} \sum_i (-1)^{i \cdot j} \log P_{i \wedge j} \tag{A.3}$$

where i·j = i₁j₁ + . . . + iₙjₙ is the inner product of i and j, and i∧j = (i₁j₁, i₂j₂, . . ., iₙjₙ) the "indirect product" (halfway between the inner product and the direct product). We call I_j an (informational) interaction of the first kind, of order |j| − 1. For example,

$$I_{1000\ldots0} = \log P_{000\ldots0} - \log P_{100\ldots0} = -\log P(E_1) = I(E_1)$$

$$I_{1100\ldots0} = I(E_1) + I(E_2) - I(E_1 \cdot E_2) = I(E_1, E_2) = I(E_2 : E_1)$$

$$I_{1110\ldots0} = I(E_3 : E_1) + I(E_3 : E_2) - I(E_3 : E_1 \cdot E_2) \tag{A.4}$$

$$I_{11110\ldots0} = I(E_4 : E_1) + I(E_4 : E_2) + I(E_4 : E_3) - I(E_4 : E_1 \cdot E_2) - I(E_4 : E_1 \cdot E_3) - I(E_4 : E_2 \cdot E_3) + I(E_4 : E_1 \cdot E_2 \cdot E_3) \tag{A.5}$$

In [37], this last expression was denoted by I₄(E₄ : E₁ · E₂ · E₃), but I₃(E₄ : E₁ · E₂ · E₃) would be a more natural notation, since the order of the interaction is 3. We avoid this notation here since we are using vectors for suffixes. We prefer to write I₁₁₁₁₀...₀ = I(E₁, E₂, E₃, E₄), and regard it as the mutual information between the four propositions. By means of the Fourier inversion formula, we see, for example, that (as in [16a], p. 68)

$$-\log P_{11110\ldots0} = \sum_r I(E_r) - \sum_{r<s} I(E_r, E_s) + \sum_{r<s<t} I(E_r, E_s, E_t) - I(E_1, E_2, E_3, E_4) \tag{A.6}$$

where 1 ≤ r < s < t ≤ 4. Equation (5.6) is readily deducible. Interactions between causal tendencies (Q or K, defined in Section 6) are definable in a similar manner. (Compare [39], where the signs are not quite the same as here.) But we shall leave these to the reader's imagination. We also write

$$J_j = \sum_i (-1)^{i \cdot j} \log p_i \tag{A.7}$$

and call J_j an (informational) interaction of the second kind, of order |j| − 1. (It was denoted by I_j in [47].) Yet another kind of interaction, involving expected amounts of information, was defined by McGill and Quastler [67].

If n is not small, the number of cells in the population contingency table is very large, and an exceedingly large sample would be required in order to make a direct estimate of all the pᵢ's. In order to get around this difficulty to some extent we can sample just some of the marginal totals. Then we can use the "maximum-entropy principle" [47, 48,
52, 55, 56, 62, 100] in order to make at least a provisional estimate of the pᵢ's. According to this principle, one maximizes the entropy −Σ pᵢ log pᵢ, subject to the constraints (here, the assigned marginal totals), in order to set up a null hypothesis (at least this is the way it is expressed in [47] and [48]). The idea in a nutshell is to assume as much statistical independence as one can. Among other things, the following result is proved in [47]:
Suppose that we know or assume a complete set of rth-order constraints for (pᵢ), i.e., all totals of the pᵢ's over each subset of n − r coordinates. Then the null hypothesis generated by the principle of maximum entropy is the vanishing of all the rth and higher-order interactions of the second kind.

In this theorem, instead of assuming a complete set of rth-order constraints, we could assume all the interactions of the first kind and orders r − 1 or less. In order to see this, we take r = 4 for simplicity and consider Eq. (A.3). If we know Pᵢ for all i with |i| ≤ 4, we can calculate all I_j with |j| ≤ 4, i.e., we can deduce all the interactions of the first kind and of orders 3 or less. Conversely, given these interactions of the first kind, we can first calculate log P₁₀₀₀₀..., log P₀₁₀₀₀..., log P₀₀₁₀₀..., log P₀₀₀₁₀..., then log P₁₁₀₀₀... (since we know log P₁₁₀₀₀... − log P₁₀₀₀₀... − log P₀₁₀₀₀...), and so on. We can thus determine Pᵢ for all i with |i| ≤ 4, i.e., we have a complete set of fourth-order constraints on the pᵢ's.

Nearly always, when a statistician discusses interactions of any kind, he believes or hopes that the high-order interactions will be negligible. The maximum-entropy principle provides a comparatively new kind of rationale for this belief regarding interactions of the second kind. Whether a similar partial justification can be provided for other kinds of interaction is a question that has not yet been investigated. The question is analogous to that of the truncation of power series and series of orthogonal functions, as in polynomial approximation.
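Taking the reconstructed definition (A.3) at face value, the interactions of the first kind can be computed directly by the mod 2 transform. The sketch below (the joint distribution over E₁, E₂, E₃ is invented for illustration) does so for n = 3 and checks that I_j with j = (1, 1, 0) reduces to the mutual information I(E₁ : E₂), in agreement with the examples above.

```python
# A direct computation of the interactions of the first kind, taking the
# reconstructed definition (A.3) at face value:
#     I_j = 2**(|j| - n) * sum_i (-1)**(i.j) * log P_(i^j)
# where P_i is the probability that the events marked 1 in i all occur.
import itertools, math

n = 3
p = {(1,1,1): .15, (1,1,0): .15, (1,0,1): .10, (1,0,0): .10,
     (0,1,1): .05, (0,1,0): .15, (0,0,1): .10, (0,0,0): .20}

def P(i):  # P_i: probability that every event with i_r = 1 occurs
    return sum(v for k, v in p.items()
               if all(kr == 1 for kr, ir in zip(k, i) if ir == 1))

def interaction(j):
    total = 0.0
    for i in itertools.product((0, 1), repeat=n):
        dot = sum(ir * jr for ir, jr in zip(i, j))     # inner product i.j
        meet = tuple(ir * jr for ir, jr in zip(i, j))  # indirect product
        total += (-1) ** dot * math.log(P(meet))
    return 2 ** (sum(j) - n) * total

# Check: I_(1,1,0) equals the mutual information I(E1 : E2).
I_12 = math.log(P((1,1,0)) / (P((1,0,0)) * P((0,1,0))))
assert abs(interaction((1,1,0)) - I_12) < 1e-12
```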
REFERENCES*

1. Artificial Intelligence. IEEE Publ. No. S-142, New York (January 1963).
2. Adrian, E. D., and Matthews, B. H. C., The interpretation of potential waves in the cortex. J. Physiol. (London) 81, 440–471 (1934).
3. Allport, F. H., Theories of Perception and the Concept of Structure, Chapter 19. Wiley, New York, 1955.
4. Ashby, W. R., Design for a Brain. Chapman & Hall, London, 1960.
5. Bar-Hillel, Y., Semantic information and its measures. Trans. 10th Conf. Cybernetics, New York, pp. 33–48 (1953).

* Only items mentioned in the text are listed here; for a bibliography on artificial intelligence see Minsky [72].
6. Beurle, R. L., Functional organization in random networks. In Principles of Self-Organization (H. von Foerster and G. W. Zopf, Jr., eds.), pp. 291-311 and discussion pp. 311-314. Oxford Univ. Press, London and New York, 1962.
7. Black, M., Language and Philosophy. Cornell Univ. Press, Ithaca, New York, 1949.
8. Brain, Lord, Recent work on the physiological basis of speech. Advancement Sci. 19, 207-212 (1962).
9. Bruner, J. S., Postman, L., and Rodrigues, J., Expectation and the perception of color. Am. J. Psychol. 64, 216-227 (1951).
10. Carnap, R., and Bar-Hillel, Y., Semantic information. Brit. J. Phil. Sci. 4, 147-157 (1953).
11. Carter, C. F., Problems of economic growth. Advancement Sci. 20, 290-298 (1963).
12. Cherry, E. C., On Human Communication. Wiley, New York, 1957.
13. Cherry, E. C., Two ears - but one world. In Sensory Communication (W. A. Rosenblith, ed.), pp. 99-116. Wiley, New York, 1961.
14. Chomsky, N., Explanatory models in linguistics. In Logic, Methodology and Philosophy of Science (E. Nagel, P. Suppes, and A. Tarski, eds.), pp. 528-550. Stanford Univ. Press, Stanford, California, 1962.
15. Culbertson, J. T., Consciousness and Behavior. W. C. Brown, Dubuque, Iowa, 1950.
16. Eccles, J. C., Physiology of Nerve Cells. Johns Hopkins Press, Baltimore, Maryland, 1957.
16a. Fano, R. M., Transmission of Information. Wiley, New York, 1961.
17. Flory, P. J., Principles of Polymer Chemistry. Cornell Univ. Press, Ithaca, New York, 1953.
18. Friedberg, R. M., A learning machine, Part I. IBM J. Res. Develop. 2, 2-13 (1958).
19. Gabor, D., Wilby, W. P. L., and Woodcock, R., A self-optimizing nonlinear filter, predictor and simulator. In Information Theory: Fourth London Symposium (E. C. Cherry, ed.), pp. 348-352. Butterworth, London and Washington, D.C., 1961.
20. Gall, F. J., and Spurzheim, G., Anatomie et Physiologie du Système Nerveux en Général et du Cerveau en Particulier, avec des Observations sur la Possibilité de Reconnaître Plusieurs Dispositions Intellectuelles et Morales de l'Homme et des Animaux par la Configuration de Leurs Têtes, 4 vols. Paris, 1810-1819. (Cited in Zangwill [108].) Volumes 3 and 4 are by Gall alone.
21. Good, I. J., Probability and the Weighing of Evidence. Hafner, New York, 1950.
22. Good, I. J., Review of a book by D. R. Hartree. J. Roy. Statist. Soc. A114, 107 (1951).
23. Good, I. J., Rational decisions. J. Roy. Statist. Soc. B14, 107-114 (1952).
24. Good, I. J., in Communication Theory (W. Jackson, ed.), p. 287. Butterworth, London and Washington, D.C., 1953.
25. Good, I. J., On the population frequencies of species and the estimation of population parameters. Biometrika 40, 237-264 (1953).
26. Good, I. J., Some terminology and notation in information theory. Proc. IEE (London), Part C (3), 103, 200-204 (1956).
27. Good, I. J., On the estimation of small frequencies in contingency tables. J. Roy. Statist. Soc. B18, 113-124 (1956).
28. Good, I. J., Distribution of word frequencies. Nature 179, 595 (1957).
29. Good, I. J., Review of a book by G. S. Brown. Brit. J. Phil. Sci. 9, 264 (1958).
30. Good, I. J., How much science can you have at your fingertips? IBM J. Res. Develop. 2, 282-288 (1958).
31. Good, I. J., Speculations concerning information retrieval. Research Rept. No. RC-78, IBM, Yorktown Heights, New York (1958).
32. Good, I. J., The interaction algorithm and practical Fourier analysis. J. Roy. Statist. Soc. B20, 361-372 (1958); B22, 372-375 (1960).
33. Good, I. J., Could a machine make probability judgments? Computers and Automation 8, 14-16 and 24-26 (1959).
34. Good, I. J., Speculations on perceptrons and other automata. Research Rept. No. RC-116, IBM, Yorktown Heights, New York (1959).
35. Good, I. J., Proc. Intern. Conf. Sci. Inform., pp. 1404 and 1406. Natl. Acad. Sci. and Natl. Research Council, Washington, D.C., 1959.
36. Good, I. J., Kinds of probability. Science 129, 443-447 (1959). Italian translation in L'Industria, 1959.
37. Good, I. J., Effective sampling rates for signal detection: or can the Gaussian model be salvaged? Inform. Control 3, 118-140 (1960).
38. Good, I. J., Weight of evidence, causality, and false-alarm probabilities. In Information Theory: Fourth London Symposium (E. C. Cherry, ed.), pp. 125-136. Butterworth, London and Washington, D.C., 1961.
39. Good, I. J., A causal calculus. Brit. J. Phil. Sci. 11, 305-318 (1961); 12, 43-51 (1961); 13, 88 (1962).
40. Good, I. J., How rational should a manager be? Management Sci. 8, 383-393 (1962). To be reprinted, with minor corrections, in Executive Readings in Management Science, Vol. I (M. K. Starr, ed.). Macmillan, New York, 1965, in press.
41. Good, I. J., Botryological speculations. In The Scientist Speculates (I. J. Good, A. J. Mayne, and J. Maynard Smith, eds.), pp. 120-132. Basic Books, New York, 1963.
42. Good, I. J., Review of a book by J. Wolfowitz. J. Roy. Statist. Soc. A125, 643-646 (1962).
43. Good, I. J., The mind-body problem, or could an android feel pain? In Theories of the Mind (J. M. Scher, ed.), pp. 490-518. Glencoe, New York, 1962.
44. Good, I. J., The social implications of artificial intelligence. In The Scientist Speculates [41], pp. 192-198.
45. Good, I. J., Cascade theory and the molecular weight averages of the sol fraction. Proc. Roy. Soc. A272, 54-59 (1963).
46. Good, I. J., The relevance of semantics to the economical construction of an artificial intelligence. In Artificial Intelligence [1], pp. 157-168.
47. Good, I. J., Maximum entropy for hypothesis formulation, especially for multidimensional contingency tables. Ann. Math. Statist. 34, 911-934 (1963).
48. Good, I. J., The Estimation of Probabilities. M.I.T. Press, Cambridge, Massachusetts, 1965.
49. Good, I. J., The human preserve. Spaceflight, in press (1965).
50. Greene, P. H., An approach to computers that perceive, learn, and reason. Proc. Western Joint Computer Conf., pp. 181-186 (1959).
51. Halmos, P. R., Naive Set Theory. Van Nostrand, Princeton, New Jersey, 1960.
52. Hartmanis, J., The application of some basic inequalities for entropy. Inform. Control 2, 199-213 (1959).
53. Hayek, F. A., The Sensory Order. Univ. of Chicago Press, Chicago, Illinois, 1952.
54. Hebb, D. O., Organization of Behavior. Wiley, New York, 1949.
55. Jaynes, E. T., Information theory and statistical mechanics. Phys. Rev. 106, 620-630 (1957); 108, 171-190 (1957).
56. Jaynes, E. T., New engineering applications of information theory. Proc. First Symp. Eng. Applications of Random Function Theory and Probability (J. L. Bogdanoff and F. Kozin, eds.), pp. 163-203. Wiley, New York, 1963.
57. John, E. R., Some speculations on the psychophysiology of mind. In Theories of the Mind [43], pp. 80-121.
58. Johnson, W. E., Appendix (ed. by R. B. Braithwaite) to Probability: deductive and inductive problems. Mind 41, 421-423 (1932).
59. Kalmus, H., Analogies of language to life. In The Scientist Speculates [41], pp. 274-279.
60. Kiseda, J. R., Peterson, H. E., Seelbach, W. C., and Teig, M., A magnetic associative memory. IBM J. Res. Develop. 5, 106-121 (1961).
61. Lashley, K. S., In search of the engram. Symp. Soc. Exptl. Biol. 4, 454-482 (1950).
62. Lewis, P. M., II, Approximating probability distributions to reduce storage requirements. Inform. Control 2, 214-225 (1959).
63. MacKay, D. M., The epistemological problem for automata. In Automata Studies (C. E. Shannon and J. McCarthy, eds.), pp. 235-251. Princeton Univ. Press, Princeton, New Jersey, 1956.
64. Maron, M. E., and Kuhns, J. L., On relevance, probabilistic indexing and information retrieval. J. Assoc. Computing Machinery 7, 216-244 (1960).
65. McDermid, W. L., and Peterson, H. E., A magnetic associative memory system. IBM J. Res. Develop. 5, 59-62 (1961).
66. McDougall, W., Primer of Physiological Psychology. Dent, London, 1905.
67. McGill, W., and Quastler, H., Standardized nomenclature: an attempt. In Information Theory in Psychology (H. Quastler, ed.), pp. 83-92. Glencoe, New York, 1955.
68. Middleton, D., and Van Meter, D., Detection and extraction of signals in noise from the point of view of statistical decision theory. J. SIAM 3, 192-253 (1955); 4, 86-119 (1956).
69. Miller, G. A., Human memory and the storage of information. IRE Trans. Information Theory 2, 129-137 (1956).
70. Miller, G. A., and Selfridge, J. A., Verbal context and the recall of meaningful material. Am. J. Psychol. 63, 176-185 (1950).
71. Milner, P. M., The cell assembly: Mark II. Psychol. Rev. 64, 242-252 (1957).
72. Minsky, M., A selected descriptor-indexed bibliography to the literature on artificial intelligence. IRE Trans. Human Factors in Electron. 2, 39-55 (1961).
73. Minsky, M., and Selfridge, O. G., Learning in random nets. In Information Theory: Fourth London Symposium (E. C. Cherry, ed.), pp. 335-347. Butterworth, London and Washington, D.C., 1961.
74. Mueller, P., Principles of temporal recognition in artificial neuron nets with application to speech recognition. In Artificial Intelligence [1], pp. 137-144.
75. Needham, R. M., Research on information retrieval, classification and grouping. Rept. No. M.L. 149, Cambridge Language Research Unit, 1961.
76. Needham, R. M., A method for using computers in information classification. In Information Processing 1962 (M. Popplewell, ed.), pp. 284-287. North-Holland Publ. Co., Amsterdam, 1963.
77. Neyman, J., and Scott, E. L., Statistical approach to problems of cosmology. J. Roy. Statist. Soc. B20, 1-43 (1958).
78. Parker-Rhodes, A. F., Notes for a prodromus to the theory of clumps. Rept. No. LRU-911.2, Cambridge Language Research Unit, 1959.
78a. Pask, G., A discussion of artificial intelligence and self-organization. Advan. Computers 5, 109-226 (1964).
79. Penfield, W., and Jasper, H., Highest level seizures. Res. Publ. Assoc. Nervous Mental Disease 26, 262-271 (1947).
80. Penfield, W., and Roberts, L., Speech and Brain Mechanisms. Princeton Univ. Press, Princeton, New Jersey, 1959.
80a. Pierce, J. R., Symbols, Signals and Noise: the Nature and Process of Communication. Hutchinson, London, 1962.
81. Rao, C. Radhakrishna, Advanced Statistical Methods in Biometric Research, pp. 364-378. Wiley, New York, 1952.
82. Rosenblatt, F., Principles of Neurodynamics (Cornell Aeron. Lab., 1961). Spartan Books, Washington, D.C., 1962.
83. Samuel, A. L., Some studies in machine learning using the game of checkers. IBM J. Res. Develop. 3, 210-229 (1959).
83a. Samuel, A. L., Programming computers to play games. Advan. Computers 1, 165-192 (1960).
84. Scriven, M., The compleat robot: a prolegomena to androidology. In Dimensions of Mind (S. Hook, ed.), pp. 118-142. N.Y.U. Press, New York, 1960.
85. Sebestyen, G. S., Decision-Making Processes in Pattern Recognition. Macmillan, New York, 1962.
86. Selfridge, O. G., Pandemonium: a paradigm for learning. In Mechanisation of Thought Processes, pp. 511-526. H.M.S.O., London, 1959.
87. Serebriakoff, V., A hypothesis of recognition. In The Scientist Speculates [41], pp. 117-120.
88. Shannon, C. E., Prediction and entropy of printed English. Bell System Tech. J. 30, 50-64 (1951).
89. Shannon, C. E., and Weaver, W., The Mathematical Theory of Communication. Univ. of Illinois Press, Urbana, Illinois, 1949.
90. Sholl, D. A., The Organization of the Cerebral Cortex, pp. 6 and 36. Wiley, New York, 1956.
91. Shoulders, K. R., Microelectronics using electron-beam-activated machining techniques. Advan. Computers 2, 135-293 (1961).
92. Smith, D. R., and Davidson, C. H., Maintained activity in neural nets. J. Assoc. Computing Machinery 9, 268-279 (1962).
93. Sneath, P. H. A., Recent developments in theoretical and quantitative taxonomy. System. Zool. 10, 118-137 (1961).
94. Solomon, R. L., and Howes, D. H., Word frequency, personal values, and visual duration thresholds. Psychol. Rev. 58, 256-270 (1951).
95. Sparck Jones, K., Mechanized semantic classification. 1961 Intern. Conf. on Machine Translation of Languages and Applied Language Analysis, pp. 417-436. National Physical Laboratory, Teddington, England, 1963.
96. Stiles, H. E., Association factor in information retrieval. J. Assoc. Computing Machinery 8, 271-279 (1961).
97. Tanimoto, T. T., An elementary mathematical theory of classification and prediction. IBM, Yorktown Heights (November 1958).
98. Tompkins, C. B., Methods of successive restrictions in computational problems involving discrete variables, Section IV. Proc. Symp. Appl. Math. 15, 96-106 (1963).
99. Tower, D. B., Structural and functional organization of mammalian cortex: the correlation of neurone density with brain size. J. Comp. Neurol. 101, 19-46 (1954).
100. Tribus, M., Information theory as the basis for thermostatics and thermodynamics. J. Appl. Mech. 28, 1-8 (1961).
101. Uttley, A. M., The design of conditional probability computers. Inform. Control 2, 1-24 (1959).
102. Uttley, A. M., Conditional probability computing in the nervous system. In Mechanisation of Thought Processes, National Physical Laboratory Symp. No. 10, pp. 119-147 (esp. p. 144, with a reference to an unpublished paper by G. Russell). H.M.S.O., London, 1959.
103. Walter, W. G., The Living Brain. Norton, New York, 1953.
104. Winder, R. O., Threshold logic in artificial intelligence. In Artificial Intelligence [1], pp. 107-128.
105. Woodward, P. M., and Davies, I. L., A theory of radar information. Phil. Mag. [7] 41, 1001-1017 (1950).
106. Wozencraft, J. M., and Reiffen, B., Sequential Decoding. Wiley, New York, 1961.
107. Yngve, V. H., The depth hypothesis. In Structure of Language and its Mathematical Aspects (R. Jakobson, ed.), pp. 130-138. Am. Math. Soc., Providence, Rhode Island, 1961.
108. Zangwill, O. L., The cerebral localisation of psychological function. Advancement Sci. 20, 336-344 (1963).
Digital Training Devices

CHARLES R. WICKMAN

Honeywell, Inc., West Covina, California

1. Introduction
2. Training Requirements
   2.1 Introduction
   2.2 Definitions
   2.3 Training Problems and Requirements
   2.4 Training Concept
   2.5 Training Rationale
3. Training Simulators Using General Purpose Digital Computers
   3.1 Introduction
   3.2 Student Environment
   3.3 Instructor's Console
   3.4 Real-World Simulator
4. Programming Considerations
5. Non-Training Uses of a Training Simulator
6. Future Training Device Requirements
Acknowledgments
1. Introduction
The advent of complex man/machine systems has resulted in an increasing emphasis being placed on the adequate training of personnel required for system operation and control. This training requirement has, in turn, led to the development of man/machine systems, especially designed for training, that effectively duplicate the functional environment to which the trainee will be exposed in the operational system. These systems, generally called training devices, have evolved in complexity and sophistication so that at times they rival the complexity of the operational systems. Training devices are defined for the purpose of this article as equipment especially designed or configured for the purpose of instructing either individuals or groups. For the most part the present article is concerned with a restricted type of training device, namely, a training simulator. A training simulator is a device that effectively reproduces
certain aspects of a given system so that training may be obtained. The aspects reproduced depend upon the particular training problem and include not only external appearance, but also simulation of the operation of parts of the system and selected characteristics of the environment in which the system operates. Examples are many and varied, including operational flight trainers for instructing aircraft pilots and flight crews; submarine attack teachers for coordinated training of complete teams associated with certain aspects of submarine operation; driving simulators for driver education programs; and missile procedure trainers for training launch crews for ballistic missile operation. In each of these devices the intent is to enable instruction by providing some replica of the operational, real system. These devices have increasingly used digital techniques, including large general purpose computers.

It is the purpose of this article to discuss the use of digital techniques in training devices and, in particular, training simulators. The article is neither a complete text nor a highly technical treatise, but rather an introduction and survey of the application of digital techniques, directed toward readers generally familiar with digital computers and associated techniques. The emphasis is on training aspects of the devices and ways in which digital techniques are used to enhance training. In order to do this effectively, it is first necessary to explain the nature of training devices, from the standpoint of not only end use, but also the process and considerations underlying the eventual system implementation.

In an article of this length, it is impossible to treat digital aspects of training devices exhaustively. Therefore, it is assumed that the reader has a knowledge of not only digital computers and related techniques, but also the rudiments of systems design and analysis. For completeness, the problem of analysis is discussed, but only as a vehicle to clarify the concepts underlying the use of digital techniques. Similarly, certain aspects of general purpose computers are discussed in order to clarify their particular use in training devices.

2. Training Requirements
2.1 Introduction
As man/machine systems have become more complex, the role of man in systems has changed. One aspect of most complex systems is that fewer individuals are required for performance of given functions. Automatic equipment has replaced many man functions resulting in a
significant reduction in the number of personnel required to implement complex operations. Paradoxically, this does not always result in a reduction of total personnel since the complexity or quantity of the operation may have increased significantly. A case in point would be the large number of operators employed by telephone companies. The original function of manually completing all phone calls has been replaced by a higher order function; yet the total number of operators employed is still very large since the volume of telephone calls, and hence the need for the higher order operator function, has had a manifold increase. At the same time, and as a result of both the reduction of personnel required for given functions and the performance of simple tasks by automatic equipment, the contribution of the remaining personnel as part of a system has become more complex and critical. Thus, the need for effective training has become increasingly important.

Any machine, equipment, or system requires some degree of training or instruction of personnel for either operation or use. Completely automatic systems require minimal instruction. For example, an automatic elevator system requires no operators per se and requires only that passengers select the desired floor and push the correct button. Instruction is satisfactorily conveyed by means of simple printed directions. As the function of man in the system becomes more complex the need for training increases. Even equipment apparently as simple to operate as a telephone requires some training. In fact, industry expends a great deal of effort in improving telephone procedures so as to maximize the effectiveness of telephone use.

The importance of adequate training is a function not only of complexity of operation and use by man, but also of the critical nature of the role of man in the effectiveness of system operation. For example, a bank may install an expensive, sophisticated machine bookkeeping system, relatively simple to operate. But, unless the personnel involved in its use are properly trained, errors will occur in these simple operations, both of commission and omission, and as a result the system will not be used to maximum effectiveness. This inefficiency implies a relatively poor return on the investment in the system. In the same way, unless personnel are adequately trained in correct use of a modern weapon system, the system effectiveness may be seriously compromised. Thus, complex man/machine systems require effective training of personnel involved in the use and operation of the system.

As the importance and complexity of man's function in modern systems has increased, training requirements have become increasingly
important. This, however, does not imply that training must in turn be complex. Many critical manual functions in a system are simple and, hence, adequate training can be provided by verbal instruction or simple training. For example, the role of man in the bank bookkeeping system is important and critical but the individual tasks are by themselves simple. Therefore, a combination of classroom instruction and on-the-job training can satisfactorily fulfill the training requirement. However, in many man/machine systems in existence today the role of man is of critical importance and complex to perform. This leads to training requirements that in turn are important and complex. For the cases of important and complex training requirements that cannot be satisfied effectively by instruction, on-the-job training, or both, sophisticated training devices have been developed.

Typical of the use of complex and sophisticated training simulators is the Fleet Ballistic Missile trainer produced by Honeywell for the Navy Training Device Center and installed at the U.S. Submarine Base at New London, Connecticut. This simulator realistically reproduces the functional environment experienced by the attack team of a modern submarine to such an extent that the team may be exercised and trained in all critical tasks whose performance will be required in the operational environment aboard the actual submarine. This simulator is in itself a complex and sophisticated man/machine system, and is required as a training device because of the critical and complex functions that must be performed by the submarine team. Thus, a complex man/machine system with important and complex functions assigned to operating personnel leads to the development of a sophisticated training device in which the personnel can gain the necessary proficiency in the performance of their assigned tasks.

The use of digital techniques in training simulators can be explained only in the context of training problems. Digital techniques are used in the implementation of training simulators only because these techniques aid in the solution of particular training problems. The problems in implementation are seldom unique to training simulators. Only the application is unique. Thus, this section of the article discusses training requirements and serves as the framework for understanding the unique application of digital techniques to training simulators.

2.2 Definitions
Prerequisite to any detailed discussion of training requirements is an understanding of the terminology used. The terminology employed herein is not necessarily at odds with colloquial usage. However, many
terms are used in a restrictive and somewhat narrow sense and thus can cause confusion if not properly explained.

"Operational system" or simply "system" will in this article mean the functional man/machine system or complex. Thus, the actual bank bookkeeping system or the real submarine are "operational systems."

"Training requirement" refers to the need to impart knowledge and skills not possessed by the personnel assigned to operate the system and essential to successful operation of the system. In the case of the bank bookkeeping system, the tasks that the operators are unable to perform properly, rapidly, or efficiently and that are essential to the effective use of the system constitute a training requirement. For example, keypunch operators may possess the usual skills required to operate a keypunch, but might not understand the function of the punched cards in the particular bank system. If this understanding is essential to effective use of the total bank system, then this constitutes a training requirement.

"Training concept" means the method or methods used to satisfy a particular training requirement for a particular operational system.

"On-the-job training" means training effected by using the operational system to provide training while the system is in use.

"Training device" denotes any mechanical aid used in training.

"Trainer" will denote any training device that exercises the student in the performance of a task or function.

"Training simulator" means a trainer that simulates or attempts to duplicate desired aspects of the operational system and is designed exclusively for training purposes. (Some training simulators are used for purposes other than training. Rather than confuse the issue, it will be assumed in this article that a training simulator is at least designed for the sole purpose of giving training. An example of a dual purpose device is given below.)

"Task trainer" will mean a trainer that exercises a student in the performance of a particular task. A training simulator generally is concerned with a complete set of tasks and provides for training by providing a simulation of the operating environment. A task trainer, on the other hand, is concerned with one or at the most a few tasks and does not necessarily provide simulation of the environment.

"Training rationale" is the logical justification for the assumption that a particular training simulator will solve part or all of a particular training problem. The training rationale, thus, is an analysis and evaluation of a particular training simulator system design insofar as that design fulfills the training requirements.
2.3 Training Problems and Requirements
The need for a trainer is indicated when the man cannot acquire a necessary level of skill by on-the-job experience, "dry run" practice, verbal instruction, or written directions. A trainer can provide the opportunity for practice that may not be present on the job. For example, emergency situations can be set up on the trainer and responses practiced without hazard to the trainee. If he makes a mistake on the trainer, it is not fatal. The situation on the trainer can be modified to permit learning by simplifying the task. An easy problem can be set up first and practiced, and then the difficulty level can be increased in gradual steps. As examples, a trainee pilot can practice control of each axis separately and then altogether; the sonar echo can be clear and distinct at first and then gradually obscured; the target can be moved in a straight line at first and evasive maneuvers gradually added. This problem control from simple to difficult is one of the main advantages of the use of a trainer, since practice in the real-world situation in gradual steps may not be possible.

The trainer may also be necessary to sense and record trainee actions for evaluation and critique. This is important where detection of errors or mistakes with the operational equipment is not possible. Comparative scores and measured skill levels are also obtainable in a trainer and may not be available with the operational equipment. These and other considerations determine whether a training problem exists in a given situation.

The scope of this article is limited to those training problems requiring a trainer of sufficient complexity to use digital techniques. To a great extent, the article is concerned with an even more restrictive set of training problems, namely, those requiring training simulators. In general, a training problem requiring use of a training simulator involves complex functions of the man in the system. For example, the functions performed by the pilot of a modern high-performance aircraft are very complex. This complexity is inherent because of the multitude of pilot tasks and is compounded because of the extremely short response times involved. An error can have very serious consequences. Thus, the complexity and criticality of the functions of the pilot indicate a training requirement that in turn indicates a sophisticated flight simulator.

Similarly, the functions of the submarine team involved in a complex torpedo attack problem are complex and important. In this case individual tasks may be well understood, but what is required is that the total team effort be very smoothly coordinated. Individual tasks
must not only be performed proficiently but the total set of tasks must be integrated successfully. Thus, the complexity and importance of team activity in a complex man/machine system again result in a training requirement that indicates the need for a complex team-training simulator.

The two examples given above are illustrative of the two principal types of training problem encountered. The first is characterized by complex individual functions. Successful operation depends to a great extent on the individual; he must be proficient in the execution of his assigned tasks. This type of simulator will be called an individual skill trainer. The second type is characterized by the complexity of interaction of many individuals. The individual tasks may be simple, but the coordinated functions of the team are not. This type of simulator will be called a team trainer.

Complex team activities may and generally do involve complex individual functions. For example, the flight simulator is designed to solve an individual training problem. Once the pilot is proficient in operation of the aircraft he must become proficient in use of his aircraft as part of a team, such as a hunter-killer anti-submarine group. Thus, simultaneously, there is a training requirement for an individual skill trainer and a team trainer. Although many skill trainers are also useful for some aspects of team training, and team trainers may incorporate features that allow use for skill training, this article will separate the two categories for simplicity of discussion. This categorization is somewhat artificial but clarifies some of the considerations underlying the implementation problems.

As has been emphasized, one of the characteristics of a man function that may indicate the need for a training simulator is the complexity of the function. This is not a sufficient condition, since many complex functions require skill and knowledge already possessed by the operating personnel. Furthermore, in some cases the complex functions can best be learned with training devices other than simulators. However, complexity of function and interaction of many simple functions are generally a necessary condition indicating the possible need for a training simulator.

An important consideration in understanding training simulators is the scope of training obtained by their use. No training simulator satisfies the complete training problem. Other training methods must also be used. For example, the Apollo Mission Simulator is being constructed for the training of the crew for the manned lunar mission. This simulator by itself is not sufficient to impart all the training
required. It will be supplemented by classroom instruction, physical conditioning, and part-task training on other trainers. The complete mission simulator serves an important role in training but is not the only training technique to be used.
2.3.1 Individual Skill Trainers

An individual skill trainer is a device used by one man at a time, and on which he learns and practices specific individual skills. These skills can be conveniently categorized into the following five groupings.

(a) Perceptual Skill Group: A perceptual skill involves the ability of the man or trainee to see, hear, feel, or otherwise recognize nuances within the stimulus signal and to detect signals under confusing conditions. If the input signal or other informational input is likely to be missed, confused, or misinterpreted, perceptual training is required. The sensory organs of the man must be "trained," so to speak, to detect the fine shadings and discriminate the significant cues from the background noise. In engineering terms, the sensitivity of the perception organ is increased so the man can respond when the signal-to-noise ratio is small. With appropriate training this sensitivity can be increased to the threshold of detectability inherent in the man. The man's discriminatory threshold is in certain cases lower than can be achieved with present state-of-the-art equipment. Sonar, radar, and photo interpretation skills are examples of tasks requiring perceptual training. A training simulator must provide cues with sufficient realism so the man can learn these perceptual skills.

(b) Motor Skills: Motor skills involve muscles. If a task requires a degree of accuracy, timing, rhythm, or strength, motor skill training is indicated. Typing, weight lifting, and sending Morse code are typical examples of motor skills requiring training. A motor skill trainer must provide the equipment on which these motor skills are practiced.

(c) Percepto-Motor Skills: Percepto-motor skills characterize those tasks in which perception is directly related to the motor skill. Tracking tasks such as vehicle control and moving target marksmanship are percepto-motor skills that require training. Other skills such as stationary target marksmanship, surgery, and the playing of most musical instruments involve a motor skill with perception and are also called percepto-motor skills even though the motor task is significantly more pronounced than the perception involved. These skills also require training. A percepto-motor skill trainer must provide both the perceptual cues and the equipment for making the skilled motor responses.
(d) Procedural Skills: A procedural skill is the ability to perform a sequence of tasks in appropriate order and at the appropriate time. Even though each task is simple, the multiplicity of tasks and their performance on cue can create a training problem. The astronaut tasks in Project Mercury are a good example of a series of simple tasks in which an orderly and timely procedure was of significant importance in mission success. A procedural skill trainer must present all cues to action present in the actual situation. Since motor skill is not involved, the response equipment is less important unless it provides additional procedural cues.

(e) Mental Skills: Other skills are characterized as mental skills and include decision making tasks, mental processing tasks, and memorization tasks. These skills may require a trainer but the rationale for equipment is beyond the scope of this paper.
2.3.2 Team Trainers

One of the purposes of any operational Navy submarine is to carry out conventional torpedo attacks, against either surface targets or other submarines. The torpedo attack has several phases culminating in the launch of the weapon and destruction of the target. This attack may take several hours and requires the use of many submarine subsystems including the submarine itself. There is little opportunity to effect sufficient training aboard a real submarine. Dry run or dry firing practices help, but do not solve the training requirement for assured peak performance.

The training requirement for a submarine attack team is the provision of a situation in which all relevant cues are provided and tasks can be practiced so that the desired level of skill is attained. The training situation must be so designed that the practiced tasks will transfer to the operational tasks. If the operational hardware is used and stimulated with realistic signals, the transfer is almost certain to occur because the inputs affecting the man are equivalent to the actual inputs and his responses are the same as the actual responses.

The first step in establishing the training requirements is to determine those elements or tasks that cannot be practiced effectively in the actual submarine on the open ocean. Immediately, it can be seen that submarine operation, engine operation, housekeeping tasks, etc., can be practiced to any desired skill level. Thus, their inclusion in an attack trainer is not warranted.

The actual detection, closure, and attack on a target, however, cannot be practiced from beginning to end. If the attack cannot be practiced in total, elements of the attack procedure and integrated task
performance cannot be performed until an actual attack is in process. Mistakes here can result in serious consequences. To assure peak performance, a series of practice situations is required. The training requirement is the provision of sufficient practice of a complete attack or attack phases so that in the actual attack all manual tasks or man functions will be performed effectively. Each man who contributes to the success of the attack is important and his performance must be added to the total team effort.

The first area of concern is thus to identify the man functions that are directly related to the attack. In the submarine this includes the sonar operator, fire control personnel, launch personnel, and conning station personnel. Engine room, stores, and mess personnel and others not directly involved do not contribute to the success of the attack and can be safely eliminated. Second, the equipment that is used by these personnel in an attack must be identified, and those equipment features which contribute directly or affect the attack or relate to the man's performance should be determined and specified. This includes the weapons to be employed. Third, the probable target and ocean environment characteristics that affect the attack must be ascertained. These three areas constitute the basis for the training requirement.

Each area must be studied to determine specific details that relate to problem success, and these details listed as necessary to fulfill the training requirement. For example, the sonar operator position is studied to determine the relevant cues which must be provided on his displays. The cues he uses during an attack must be provided in the trainer so he can practice responding to these cues. The fire-control equipment must be analyzed to determine the part it plays in the attack problem and what effect the team members can have on its function. Equipment should be provided in the trainer that plays a similar part and can be affected in a similar way by the trainee operators. The targets in the problem should have the same characteristics as real targets. They should move, maneuver, hide, and otherwise act as real targets would act when under attack. Those aspects of the ocean environment such as thermal layers, sea state, and signal attenuation should be included in the trainer because they affect the team performance. Ocean and target characteristics should be simulated on the equipment with which the men come in contact.

To be an effective trainer, other features are required to enable learning to take place. Simple attacks must be provided as well as more complex and difficult problems. Emergency situations and degraded equipment situations should also be provided because an actual attack may occur under these conditions.
Contingencies of reinforcement and realistic outcomes should be included so that each man can recognize the results of his and the team's action. Means of evaluation of each man's performance as well as overall team performance should be provided to insure that faulty or erroneous performance is detected and corrected.

In this type of team trainer, it may not be necessary to include all of the tasks performed by each man. Only those relating to team performance are pertinent to a team trainer. On the other hand, in an individual skill trainer, all tasks the man performs should be included if total training is necessary. If all tasks are not included, then part-task training results and the man must learn to coordinate and integrate the total tasks in a different trainer or on the job.

2.4 Training Concept
Once the training requirement is known, a concept must be developed for satisfying the requirement. This concept will suggest the various ways by which all of the requirements are met. If a training simulator is indicated, the training concept will specify the training to be given in the simulator and will further specify the types of simulation required and the degree of fidelity desired. The training concept serves as the basis for the system design of the training simulator.

The training concept should ideally be independent of implementation considerations. If, in the context of the training requirement, certain training can best be satisfied or can only be satisfied with a simulator, then the training concept would so indicate. However, it is pointless to specify training simulator characteristics that are impossible or impractical of attainment. For example, it is currently impossible to attain a zero-g environment in a training simulator, and therefore it would be of little value for a training concept to specify that a training simulator provide a zero-g environment. However, if a real need for a particular feature is indicated, it may serve to direct development of new techniques to enable the feature to be included. With regard to the zero-g environment, there has been serious discussion of including a training simulator in an orbiting vehicle so that this aspect of the environment can be obtained. Thus, although the particular effect is impossible of attainment today, thoughts have been directed toward its eventual realization because the requirement exists.

The scientist developing a training concept is generally a human factors specialist and may not be technically competent to judge the feasibility of a particular implementation feature. Thus, without assistance from competent engineers, there is a danger of not obtaining optimal features in a trainer. In cases of doubt, the human factors
specialist should request a particular feature and leave the implementation decision to the engineers for realization. If the engineers cannot provide the feature within the state-of-the-art, reliability, or economic limitations, they should discuss the difficulty with the human factors specialist so that the best possible compromise can be obtained. The human factors specialist should be consulted so that the compromise feature will provide the most training value short of the optimum. Thus, the lack of realism of an approximate solution may not have a critical effect on training.

The training concept thus specifies the manner and degree in which the training requirement will be met with a training device. It is the result of numerous trade-offs and compromises between training requirements and equipment capabilities. A TV view from a camera descending onto a model, shown on a screen ahead of the pilot in a flight trainer to simulate the view of an aircraft landing, is one concept for providing an out-the-window view. Obviously the TV picture is not realistic but it provides selected cues for the pilot and enhances training. Another concept for the same purpose is the use of a filmed view on an actual wrap-around screen. The former has the advantage of a dynamic display that responds to pilot actions, while the latter provides better depth effects. However, neither meets the full training requirement, which is for the pilot to learn to land an aircraft. The selection of one of these or another technique must be based on the training concept. The TV presentation is better if the trainer is designed with internal cues for the pilot's action. The filmed view is better if the pilot must observe ground cues to initiate action and if the problem can be stopped when an action is not initiated at the correct time.

2.5 Training Rationale
Once the training concept has been developed and a system design for a training simulator formulated, it is then required to analyze and evaluate the efficacy of the system design of the simulator in satisfying the training concept requirements. Two results may be expected from this training rationale analysis.

First, if deficiencies or excesses in a training simulator system design are noted, the simulator design may be modified. If the deficiencies or excesses are minor this is practical. Slight changes in the problem setup provision may add additional realism or permit simpler problems to be set up to enhance early stages of training. Additional sound tracks may be added for realism or to provide an overlooked auditory cue, or additional recording capacity may be added for measurement and evaluation of responses. Excesses that may be removed are those that have proved to be costly to provide and are not sufficiently realistic to fulfill their intent.
These excesses occur mostly in the inability to provide sufficient realism in displays. Simulated views of the real world and realistic sounds are frequent cases where excessive costs occur and do not provide sufficient training value.

The second result of the training rationale analysis is the determination of additional training devices or programs to satisfy requirements that are not or cannot be met with the simulator. This may result in changes to the basic training concept for the simulator, which in turn could lead to changes in the simulator design. For example, if the trainer concept includes navigational training, some provision for navigation will be included in the simulator design. If the training rationale analysis shows that the navigation capability provided in the simulator is not effective and is too costly to improve, a separate navigation trainer may be indicated. If this trainer will provide effective navigation training, then the concept of the original trainer can be modified and the provision for navigation removed.

The training rationale is thus the analysis of the resulting implementation of the training concept, with logical assumptions to demonstrate its effectiveness or the need for modification.

3. Training Simulators Using General Purpose Digital Computers
3.1 Introduction
The use of general purpose digital computers is probably the most dramatic use of digital techniques for the training of people. It is also a restricted application, since the majority of training devices currently in existence do not employ general purpose digital computers. There are several reasons for emphasis of the general purpose computer application. First, since training simulators employing general purpose computers also use many other digital techniques, they are illustrative of the general application of digital techniques. Second, the discussion is timely since the use of general purpose computers for training devices is of very recent advent. For these reasons, and because training problems requiring the use of general purpose computers are generally more complex, and hence more interesting from the engineer's viewpoint, this article stresses the application of computers.

Digital computers are used in a training simulator only if appropriate to the training system concept. There is seldom an absolute training requirement for use of a computer. In the following discussion, the dependency of the computer on the system concept is very important. A statement made here cannot be considered as a general and inviolate rule, but must be considered applicable only if it is suitable for a specific system concept.
General purpose digital computers are generally suitable for application within a training device only when an economic advantage results from their use. In a training device, the total system and system costs are usually definable and, therefore, a valid cost comparison can be determined. The total functions implemented within the computer can be specified; and the total computer cost, including programming, can be compared with the cost of non-computer implementation. Everything else being equal, the method having the lowest total cost is selected. Other factors do intrude, such as flexibility and growth, and it is difficult to assign costs and value to these factors; but in general, a close approximation of alternative costs can be realized. The application of general purpose computers, therefore, must be considered with regard to both total system concept and overall economic advantage.

A training simulator consists of three conceptually distinct parts, namely, the student environment, the instructor's console, and the real-world simulator. These distinct parts will be discussed separately in the following sections.
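In outline, the three parts form a closed loop: the real-world simulator advances the simulated vehicle and environment state, the student environment turns that state into cues and returns the trainee's control actions, and the instructor's console sets up and monitors the problem. The following minimal sketch shows only this division of responsibilities; all class and method names, and the toy flight dynamics, are invented for illustration and do not describe any particular trainer.

```python
class RealWorldSimulator:
    """Advances the simulated vehicle, target, and environment state."""
    def __init__(self, initial_state):
        self.state = dict(initial_state)

    def step(self, controls, dt):
        # Closed-loop update: the next state depends on the trainee's
        # control inputs as well as on the scripted problem.
        self.state["altitude"] += controls.get("climb_rate", 0.0) * dt
        return self.state


class StudentEnvironment:
    """Presents cues on replica instruments and reads trainee controls."""
    def display(self, state):
        print(f"ALT {state['altitude']:8.1f}")   # stand-in for a panel instrument

    def read_controls(self):
        return {"climb_rate": 5.0}               # stand-in for stick/throttle input


class InstructorConsole:
    """Sets up the problem and monitors its progress."""
    def setup(self):
        return {"altitude": 1000.0}              # initial problem parameters

    def monitor(self, state):
        pass                                     # scoring, recording, freeze, etc.


# A few passes around the training loop.
console = InstructorConsole()
sim = RealWorldSimulator(console.setup())
cockpit = StudentEnvironment()
for _ in range(3):
    cockpit.display(sim.state)
    state = sim.step(cockpit.read_controls(), dt=1.0)
    console.monitor(state)
```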
The first distinct part of any training simulator is the student environment. The student environment is intended to duplicate all aspects of the operational system environment that are significant to the training requirement. The student environment contains all instruments, controls, and other effects indicated by the training requirement, For example, in a trainer designed to teach a student how to effect the landing of an aircraft in a thick fog, the student’s environment must contain the instruments normally located on the panel of his airplane; a method of communicating with the (simulated) ground station; and the controls that he normally uses to maneuver the aircraft, such as the elevator, rudder, and aileron controls. I n this particular application, the simlation of the view out of the aircraft’s window serves no useful purpose and therefore need not be simulated. It is desired to provide an environment with sufficient realism that the student engrossed in a problem will forget that he is in a simulator and act and function as he must within the operational environment. If this realism is achieved, maximum transfer of training will more probably ocour. This does not necessarily imply that every feature of the operational environment be present in the training simulator; only those that affect the function being t r h e d need be reproduced. It is always possible for a disinterested observer to discern many differences between the training simulator and the operational system. 102
DIGITAL TRAINING DEVICES
The submarine attack trainer will not have all physical appurtenaxices existing in the real submarine. The view out of the window of a flight simulator will not duplicate the view from a real aircraft. These are not important to training. (If they are, they must be provided.) The trainee soon forgets these nonessential discrepancies and believes he actually is aboard the real submarine or is flying the real aircraft. A flight simulator that almost “crashes” causes real anxiety. Just as the ardent concert attendee automatically compensates for the minor deficiencies of the auditorium, so does the professional pilot compensate for the nonessential deficiencies of the flight simulator. If this were not the case, training simulators would have little value since it is impossible to design a training simulator that exactly duplicates an operational environment. The problem of designing the student environment, then, is to provide that realism indicated by the training requirement. Other aspects of the environment can be ignored. For convenience, the environment can be separated into three parts, the physical appearance, the dynamic similarity (or static similarity), and the motivational similarity. The physical appearance or static similarity includes the shape, texture, color, feel, and placement of objects within the environment. The fidelity of physical appearance varies greatly with the training problem. I n an operational Aight trainer, essentially every internal physical aspect of the aircraft is duplicated. All controls, instruments, and obstructions are duplicated. I n fact, many portions of the operational aircraft are used. At the other extreme, a tactics trainer used by fleet commanders has almost nothing that duplicates the appearance of real equipment. Only the information used in the problem is required. The training situation and the training problem are the determining factors. Physical appearance is reproduced only if it is required for training. The dynamic similarity of the environment includes all student dynamic simulation and response mechanisms required for training. This includes not only controls and displays, but also such simulation aa background noise, out-of-window views and temperature effects if these are important to training, time varying displays, and changing visual environments. Again, only the essential features are included in the training problem. Dynamic similarity can be divided into two categories, open loop similarity and operational similarity. (a) Open Loop Similarity (programmed cues): Open loop similarity describes the kind of dynamism that generally occurs in the absence of pilot or operator. Uniform aircraft motion which might include 103
CHARLES R. WICKMAN
buffeting, vibration, and soundand programmed changes on the environment are principal cues for open loop similarity. An example of dynamic similarity is motion pictures of the flight environment that provide programmed cues. (b) Operational Similarity (unprogrammed cues): This is the more important variety of dynamic similarity because it concerns the response or behavior of the simulator as a result of what the operator does. It can be termed “closed loop,’’ since most sequences of cues are dependent on what the operator does and therefore cannot be programmed in advance. Only the simulator providing unprogrammed cues or cue sequences c m provide operational similarity. Operational similarity is the key aspect of the flight trainer, the factor most directly responsible for learning. However, in many aspects it is very difficult and in some respeota impossible to completely simulate all things flight vehicles can do. I n most cases the principal concern of the simulator designer is a design that incorporates a high degree of operational similarity with some real aircraft. Motivational similarity deals with similarity in feeling or attitude on the part of those trained in the simulator as compared to the feeling experienced in the actual vehicles, whether ship, aircraft, submarine, or tank. This can be only partially achieved, because of its intangible nature. Although it forms a critical aspect of the training simulator it is often ignored. Unless the operators being trained capture some of the feeling associafed with the events simulated, much of the value of even the finest operational simulator may be lost. Naturally the attitudes of the crew in the sirnulator will differ in many respects from what they would feel on a real mission. I n the simulator, the attitudes should be such that they motivate good performance. The pilot in the simulated aircraft will not feel in danger when he performs poorly, as he might on a combat mission, but he should feel disturbed. The submariner will have difficulty experiencing the same urgency he would experience when pitted against an enemy submarine. The sonar operator will not get the same experience he would if a real torpedo were fired at his ship. The esprit de corps of the weapon system team will be evident to a different degree in the simulator, but if training is to be effective it must be present. The motivation to learn and to perform well on the simulator muet be strong, even though it will be different in both quality and intensity from the motivation for performing well on the real mission,
3.3 Instructor’s Console The seoond distinct part of any training simulator is the instructor’s console. The instructor’s console contains all the controls required to 104
set up and control a given training problem, as well as all displays required to monitor the problem progress and the student's activity. In addition to the functions of problem setup and control, the instructor will provide some of the capability not present in the training simulator. In particular he usually provides any required voice communications from outside the training simulator.

The design of an instructor's console depends upon the information and control requirements of the instructor operator. These requirements are based upon the nature of the operational problems to be conducted on the simulator, the variation in the problems required, and the functions of the console operator-instructor. In general, the instructor's console will interface directly with the computer and for the most part will control the problem by setting values within the computer. In some cases, problem control will be accomplished by direct connection with the operational or simulation equipment. Communication links generally do not involve the computer. However, if automatic control of propagation characteristics is desired, these may be convenient for computer implementation.

To perform these many functions, a carefully designed console is required. The minimum number of displays and controls should be used in order to minimize the complexity of the instructor's task. As many functions as possible should be handled automatically by the computer to relieve the instructor of routine noncritical functions. Those displays that are essential should be easily read and provide the specific information required by the instructor, to eliminate complex mental tasks. The controls should be easily reached and as simple as possible to operate. Thus, design of the instructor's console must be based on a number of interrelated factors.

The role of digital techniques in the design of the instructor's console includes not only standard displays and controls but also the intimate relation between the console and the computer. An operator using a properly designed instructor's console in a training simulator need have no knowledge of computer operation. In fact, there is no requirement that he even be aware that a computer is part of the system. A properly designed instructor's console is the best example known to the author of a problem-oriented computer console. As such, it is instructive to examine the design and use of a typical console. Similar techniques are applicable to the design of any problem-oriented console. An excellent example of such an instructor's console is that provided for the operation of the Submarine Attack Center Trainer at New London.
Operation of the Submarine Attack Center Trainer is controlled by means of Honeywell-designed and -developed consoles located in each of the three attack centers and in the tactical display room. The consoles are designated the Master Instructor's Console, located in the tactical display room, and the Program Operator's Console and Assistant Program Operator's Console, one of each installed in each attack center.

The Master Instructor's Console (Fig. 1) is the control and monitoring instrument for the entire Submarine Attack Center Trainer; it controls problems, assigns vehicle designation and control, and designates the modes of operation for all attack centers as well as the tactical display room. In addition, the operator may select any attack center problem for monitoring, either by digital display of problem parameters and status or by optical projection of problem tracks on the tactical display room screen. In all except independent attack center problems, the Master Instructor may override the Program Operator or the Command Center for instructions or critiques or for insertion of additional information.

In order to set up a typical tactical problem on the trainer, a minimum of input directives must be inserted into the computer. The input directives are called "commands." The commands, or input words, establish the parameters of the problem. The parameters include the initial location of the attacking submarines and targets, the motion of the targets, and other features of the simulation. In order to exert overall control of the problem, the Master Instructor has 18 "words," or computer command entries, for use at his discretion. The computer command entries and their functions are given in the accompanying tabulation.

The console operational requirements are in five areas:
(a) Establishment of problem relationships between vehicles
(b) Initiation or cessation of operations
(c) Insertion of data into the system
(d) Obtaining information from the system
(e) Control of services

All of these operations except the service controls have a similar sequence of procedures. The services are controls that do not directly involve the computer system. The service controls provided include: console illumination intensity, projector power and lamp controls, communications, radar and sonar monitoring, and maintenance mode controls.

The sequence of procedures for constructing and entering words to the computer system basically calls for:
(a) The selection of the vehicle
(b) The selection of the order type (function)
(c) The entering of the order value (by keyboard)
(d) The introduction of this data to the system (by Execute switch). Logical interlocks in the consoles and computer program checks will prevent the input of impossible or incomplete information to the system, for the protection of the computer program.
(e) The operator is furnished information concerning the suitability of his selection by the status display of the Execute switch; thus, the system will be safeguarded and the operator informed if his entry is acceptable.
Computer command entry        Function

Type assignment               Establishing the type of ship for targets
Problem assignment            Establishing the Master Instructor's Console or Attack Center as target controller, and ability to sense
Target mode                   Assigning targets to a mode of operation, i.e., Independent or Convoy
Order position and motion     Establishing parameters for each target
Position and motion readout   Check geographical location of target
Convoy mode                   Provide sinuous course for targets in convoy
Convoy order                  Entering course, speed, and turn rate of convoy
Problem select                Display of problem on Tactical Display Room screen
Relative position readout     Obtain data on target relative to an operating submarine (OS), OS to another OS, and OS to weapon
Start                         Start the problem
Freeze                        Stopping the problem (problem can be resumed after freeze)
Deactivate                    Removing target from problem
Hit clear                     Clear hit light from target if controlled by the Master Instructor's Console
Screen scale offset           Size of simulated ocean to be used, centered about selected vehicle
Start clock                   Starting problem time
Time rate                     Real time or other time rate
Delete time marks             Removing time marks from vehicle projector tracks
Reactivate time marks         Resume time marks for vehicle projector tracks
A typical entry to the computer using the Master Instructor's Console would be as follows. To order target 2 to make 20 knots, the instructor would press the target 2 button, press the speed button, set 20 on the keyboard, verify that 20 reads out on the keyboard, verify that the execute button is armed, and then press the execute button. Target 2 would then come to 20 knots at the prescribed acceleration or deceleration rate established for the type of ship that target 2 represents. It is estimated that the instructor has 1048 control variations over the vehicles and weapons at his command.
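The interlock discipline just described is easy to visualize in modern terms. The following sketch is purely illustrative: the order types, their value ranges, and all names are hypothetical, not taken from the Honeywell console design.

```python
# Hypothetical sketch of the console's word-entry interlock discipline.
# Order types and their permissible value ranges are illustrative only.
ORDER_TYPES = {"speed": (0, 40), "course": (0, 360), "depth": (0, 1500)}

class ConsoleEntry:
    def __init__(self):
        self.vehicle = None   # (a) selected vehicle
        self.order = None     # (b) selected order type
        self.value = None     # (c) keyboard value

    def select_vehicle(self, vehicle):
        self.vehicle = vehicle

    def select_order(self, order):
        if order not in ORDER_TYPES:
            raise ValueError("unknown order type")
        self.order = order

    def keyboard(self, value):
        self.value = value

    def execute(self):
        # (d)/(e) logical interlocks: refuse incomplete or impossible words,
        # and report acceptability back through the Execute status display.
        if None in (self.vehicle, self.order, self.value):
            return "REJECTED: incomplete word"
        lo, hi = ORDER_TYPES[self.order]
        if not lo <= self.value <= hi:
            return "REJECTED: impossible value"
        return f"ACCEPTED: {self.vehicle} {self.order} = {self.value}"

console = ConsoleEntry()
console.select_vehicle("target 2")   # press the target 2 button
console.select_order("speed")        # press the speed button
console.keyboard(20)                 # set 20 on the keyboard
print(console.execute())             # press the Execute switch
```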
3.4 Real-World Simulator
The third distinct part of any training simulator is the real-world simulator. The real-world simulator must generate all the effects required in the student environment. For example, the real-world simulator might be required to generate the aircraft's air speed. The air speed is a function of the aircraft being simulated, the throttle position set by the trainee, and the wind gusts set by the instructor. The relationship between the aircraft's air speed and all of the parameters controlling it is usually given in terms of mathematical equations. A computer is most reasonably employed in fulfillment of this function of real-world simulation, but it is also intimately associated with the other functions and often provides for the display of quantities to the instructor that are not otherwise available.

In providing real-world simulation, the computer is, in essence, taking account of all significant effects that occur in the operational environment but are not capable of exact duplication within the training simulator. Any necessary characteristic of the training problem that concerns the physical world external to the duplicated environment must be simulated. Motion simulation is a prime example of such an effect. In an operational flight trainer, it is intended that the pilot "fly" the trainer. Since the trainer does not actually fly, some means must be provided to simulate the aircraft flight motion. In a training simulator this is accomplished by solving motion equations that provide for proper activation of all indicators available to the pilot. Hence, the solution of the motion equations substitutes for the real-world environment of the actual aircraft motion. The adequacy of these equations and their solution determines in large part the efficacy of the simulator.

The Fleet Ballistic Missile attack trainer, installed at New London, Connecticut, uses a Honeywell 800 digital computer to provide many of the real-world simulations required by the training problem. The principal computations performed within the digital computer are:

Own ship motion
Target motion
Weapon motion
Weapon-target hit evaluation
Own ship-target relative motion
Sonar simulation
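As an illustration of the air-speed example mentioned above, a real-world computation can be as simple as a lagged response to the trainee's throttle plus an instructor-injected disturbance. The first-order-lag form and every constant below are assumptions for illustration, not the FBM trainer's equations.

```python
# Illustrative sketch: airspeed as a function of the simulated aircraft
# (v_max, tau), the trainee's throttle, and an instructor-set gust.
def airspeed_step(v_true, throttle, dt, v_max=250.0, tau=8.0):
    """Advance true airspeed (knots) by one iteration cycle dt (sec)."""
    v_commanded = throttle * v_max              # throttle position, 0..1
    return v_true + (v_commanded - v_true) * dt / tau

def indicated_airspeed(v_true, gust):
    """Instructor-injected gust perturbs the cockpit indicator."""
    return v_true + gust

v = 180.0
for _ in range(10):                             # ten 1-sec cycles
    v = airspeed_step(v, throttle=0.9, dt=1.0)
    reading = indicated_airspeed(v, gust=5.0)   # value driven to the repeater
```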
Each of these is critical to the training problem and represents characteristics of the real world that cannot be duplicated within the
trainer but must be simulated. The net result of these simulations is to provide an environment for the trainee that has no significant training difference from the real world.

It will be noted that five of the six real-world simulations listed above involve motion or effects of motion. This is typical of training simulators that concern moving vehicles of any type. Since the training simulator does not move, simulation must be provided to substitute for the vehicle motion. Even so-called moving base simulators have only restricted, artificially induced motion, and require simulation of the real vehicle motion. The sixth item listed, sonar simulation, does not involve motion as such, but does involve the propagation of acoustic energy in water, which cannot be duplicated within the training simulator. Hence simulation of the propagation and return of the sonar energy is required.

Three types of vehicle motion are indicated, namely, own ship, target, and weapon. These are separate and distinct simulations, not because the vehicles are different, but because the detailed training requirements are different. The trainees are "aboard" the own ship; hence, own ship motion simulation must have detailed and precise characteristics. Target motion is perceived by the trainees only as it appears on various sensor equipment. Therefore, simulation of target motion can be gross, since the trainee cannot perceive extremely detailed motion perturbations. Although detailed simulation of target motion would not detract from training, it is not required, since it is costly to provide and adds no value to the training.

Not all real-world simulation is provided by the computer. Some effects are better simulated in special computing elements, and some are essentially impossible to implement within the computer. Voice communication from the attack center to other stations of own ship is simulated by letting the instructor represent the other stations. In this role, the instructor is effecting real-world simulation.

In addition to real-world simulation, the computer may also perform computations that effectively simulate equipment within the trainee environment. For example, consider a sonar simulation. The real-world simulation consists of computing the position and timing of the indicated sonar return, taking into account the propagation and target reflection characteristics appropriate to the training problem. If operational sonar equipment is used in the training simulator, the output of the real-world simulation will be compatible with the physical characteristics of the sonar, and no simulation of the processing characteristics of the sonar system is required. If a simulated sonar is used, further computations must be performed, which effectively duplicate the internal characteristics of the operational
equipment. Many times it is simpler and more economical to simulate equipment rather than to use operational hardware. No general rule can be given; only by study of each given problem can it be determined whether the operational equipment should be stimulated by a real-world simulation or whether a complete simulation of the equipment should be undertaken.

Because of the critical nature of motion simulation, it is of value to explore the problem in more detail. The following is a detailed discussion of the own ship motion simulation provided in the FBM attack trainer at New London. It is typical of the considerations underlying motion simulation.

3.4.1 Own Ship Motion Simulation
(a) Introduction: The attack center in the New London FBM trainer is stationary, as opposed to moving base simulators. Therefore, the trainees receive no physical sensation of motion. Instead, an illusion of motion is created by activating the various repeaters in the attack center which imply that the own ship is moving, such as the speed indicator, course indicator, and depth indicator. The motion of the own ship is also displayed to the instructor or program operator in the form of digital displays and position graphs. The program operator must not only monitor these displays from the standpoint of an instructor, but also act as a helmsman and assure that various orders pertaining to the motion of the submarine are properly executed.

The purpose of the own ship motion mathematical model is to accept orders from the program operator and to subsequently generate and display the motion of the ship consistent with physical laws and the characteristics of the particular submarine being simulated. The perceivable quantities that must be displayed are:
(a) Components of horizontal position (X, Y)
(b) Speed (S)
(c) Course angle (C)
(d) Rate of change of course angle (Ċ)
(e) Depth (Z)
(f) Depth rate (Ż)
(g) Rudder angle (δᵣ)
There are four major aspects involved in the own ship mathematical model:
(a) Selection of a general set of equations that describe the dynamic motion of a submarine
(b) Construction of helmsman algorithms to simplify the orders of the program operator
(c) Development of a numerical integration method that can be used to solve the dynamic equations of motion
(d) Determination of coefficients in the general equations of motion to simulate a particular submarine

(b) Equations of Motion: It is necessary to obtain a set of equations representing the dynamic motion of a submarine under the action of propeller thrust and hydrodynamic forces against both hull and movable members, such as the rudder and stern planes. To describe a submarine's motion completely, six degrees of freedom are required. Three variables are normally used to locate the submarine's center of gravity (c.g.) with respect to a fixed coordinate system, and three variables are used to describe the orientation of the submarine body axis with respect to the fixed coordinate system. The variables usually chosen are the X, Y, and Z components of the c.g. position vector and the roll, pitch, and yaw Euler angles.

In this training simulator, the six output quantities corresponding to the six degrees of freedom are not required, since the roll, pitch, and yaw of the submarine are not displayed to the trainees. Therefore equations which correctly describe the motion of the submarine's c.g. as a function of tactical decisions, and which are consistent with hydrodynamic effects, are completely adequate for the training purposes of this attack center. A major consideration in the construction of these equations was that it should be possible to evaluate the coefficients for a particular ship on the basis of tactical data measured at sea rather than of data obtained from experiments on reduced-scale models of the ships.

The coordinate system used in this trainer is the left-handed system shown in Fig. 2. North is taken as the +Y axis and east as the +X axis. Depth is taken as positive downward along the +Z axis. The course angle (C) is measured as the clockwise angle from the +Y axis to the projection of the velocity vector of the ship onto the X-Y plane. The dive angle (D) is measured as the elevation angle from the X-Y plane to the velocity vector. In Fig. 2, D is negative since the velocity vector has a downward component.
The empirically constructed equations of motion to be used for advancing the own ship are the following second-order differential equations:

    Ṡ = A₁{ₒS − (1 + A₂|δᵣ|)S}{ₒS + (1 + A₂|δᵣ|)S + A₃S}    (3.1)

    C̈ = −{A₄δᵣS² + A₅Ċ|Ċ| + A₆SĊ}    (3.2)

    D̈ = −{A₇δₛS² + A₈Ḋ|Ḋ| + A₉D + A₁₀Ċ² + A₁₁SḊ}    (3.3)

where δᵣ is the rudder angle; δₛ the stern plane angle; ₒS the ordered speed; and A₁, A₂, …, A₁₁ are coefficients to be determined for each particular ship. Unless otherwise stated, the units employed in this report are: for distance, yards; for angles, degrees; and for time, seconds.

Equations (3.1), (3.2), and (3.3) permit the calculation of the submarine's velocity vector as a function of time. However, the coordinates of the velocity vector will be given in terms of spherical coordinates (S, C, D). In order to transform the velocity vector into rectangular coordinates, the following transformation equations become necessary:

    Ẋ = S cos D sin C    (3.4)
    Ẏ = S cos D cos C    (3.5)
    Ż = −S sin D    (3.6)

Fig. 2. Coordinate system.
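A minimal sketch of one evaluation of these equations follows. Since the printed forms of Eqs. (3.2) and (3.3) are reconstructed here from the surrounding discussion, the sketch should be read the same way; the coefficient values are placeholders (the real values were fitted to tactical data for each ship), and only Eqs. (3.1), (3.2), and (3.4)-(3.6) are exercised.

```python
import math

# Placeholder coefficients A[1]..A[6]; units: yards, degrees, seconds.
A = [None, 0.001, 0.02, 0.0, 0.005, 0.0004, 0.01]

def derivatives(S, C, Cdot, D, oS, rudder):
    """Return (Sdot, Cddot, Xdot, Ydot, Zdot) per Eqs. (3.1), (3.2), (3.4)-(3.6)."""
    f = (1.0 + A[2] * abs(rudder)) * S
    Sdot = A[1] * (oS - f) * (oS + f + A[3] * S)                  # (3.1)
    Cddot = -(A[4] * rudder * S**2                                 # (3.2)
              + A[5] * Cdot * abs(Cdot)
              + A[6] * S * Cdot)
    c, d = math.radians(C), math.radians(D)
    Xdot = S * math.cos(d) * math.sin(c)                           # (3.4)
    Ydot = S * math.cos(d) * math.cos(c)                           # (3.5)
    Zdot = -S * math.sin(d)                                        # (3.6)
    return Sdot, Cddot, Xdot, Ydot, Zdot

# One evaluation at 10 yd/sec, ordered 12 yd/sec, 15 degrees right rudder.
print(derivatives(S=10.0, C=45.0, Cdot=0.5, D=-2.0, oS=12.0, rudder=15.0))
```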
For the sake of convenience, it is postulated that the longitudinal axis of the submarine is tangent to the motion versus time curve (i.e., parallel to the velocity of the submarine). This may be interpreted as
using equations of motion with five degrees of freedom with a constraint on the pitch and yaw of the submarine. The postulate permits the use of the quantities course angle (C) and dive angle (D) as angles describing the yaw and pitch of the submarine, as well as referring to the orientation of the velocity vector. Roll must, however, remain indeterminate in the problem.

It would be desirable to have equations of motion that would yield the dependent variables X, Y, and Z directly in terms of the independent variables ₒS, δᵣ, δₛ, and time (t). However, since the forces acting on the submarine are velocity dependent and position independent, it is necessary first to solve for the submarine's velocity.

Equation (3.1) represents the acceleration of the submarine as a function of speed (S), ordered speed (ₒS), and rudder angle (δᵣ). If δᵣ = 0, then the acceleration becomes a quadratic function in both S and ₒS. Equation (3.1) is a modification of Ṡ = k(ₒS² − S²), in which the thrust is proportional to ₒS². When a rudder angle is inserted, the effect is essentially to decelerate the ship as though the present speed were raised to (1 + A₂|δᵣ|)S while the ordered speed remained as before. Since Ṡ is equal to zero if and only if {ₒS − (1 + A₂|δᵣ|)S} = 0, the ratio of final speed in a turn to ordered speed (usually the same as initial speed in a turn) is a constant for a fixed rudder angle.

Equation (3.1) was formerly written with a rudder term (1 − A₂|δᵣ|) multiplying ₒS instead of S. Conceptually, this is simpler, since then a rudder angle can be interpreted as an ordered reduction of speed. Difficulties arose, however, if at the same time that a rudder was being applied, an ordered speed of zero was entered, thereby completely eliminating the rudder dependence in the speed equation. It would perhaps have been preferable to incorporate the effect of speed loss in a turn by making Eq. (3.1) a function of Ċ instead of a function of δᵣ. However, since experimental data usually relate speed loss in a turn to rudder angle, this latter course was not pursued, and it is doubtful that any improvement resulting from this change would be noticed by the trainees in the attack center.

Equation (3.1) as it stands was found not to be completely satisfactory. The acceleration of the ship became too large if ₒS were greatly different from S. A limit was therefore placed on Ṡ so that it would always remain less than a maximum value, depending on the particular ship. It was also decided that the maximum acceleration should not be attained at once. Therefore, the acceleration is not permitted to change by more than a fixed constant each iteration cycle; a different constant is chosen for acceleration than for deceleration.

Equation (3.2) represents the course angle acceleration of the submarine. C̈ has the usual dependence on S, δᵣ, and Ċ. In a steady turn, when C̈ = 0, Eq. (3.2) can be rewritten so that δᵣ becomes a quadratic function of Ċ/S. Since Ċ/S is equal to a constant divided by the steady turn diameter, δᵣ becomes a quadratic function in the reciprocal of the steady turn diameter. The relationship between Ċ/S and the reciprocal of the steady turn diameter can easily be seen from the fact that the ship traverses a distance of 2πR while moving with speed S in the same time that it turns 360°. Therefore,

    Ċ/S = 360/(2πR)    (3.7)
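For concreteness (with illustrative numbers, not data from the trainer): a ship holding S = 10 yd/sec in a steady turn of diameter 500 yd (R = 250 yd) must sustain

    Ċ = 360S/(2πR) = (360 × 10)/(2π × 250) ≈ 2.29 deg/sec,

so a full 360° circle takes 360/2.29 ≈ 157 sec, which is just 2πR/S, as Eq. (3.7) requires.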
The D̈ equation resembles the C̈ equation with the exception of the Ċ² and D terms. The dependence of D̈ on the roll angle has been included through the Ċ² term: it is assumed that the ship will not roll unless it also turns and that, in this case, the roll will be a function of the course rate. The D term causes the ship to return to a horizontal position in the absence of other forces.

The depth of a submarine is to a large extent controlled through ballasting. This factor is not included in Eq. (3.3), wherein depth is controlled exclusively by stern plane movement. It is therefore assumed that the ship is always in neutral equilibrium with its environment and that there exists no net buoyant force acting on the vessel.

(c) Helmsman Algorithms: Given the thrust and the positions of the movable members (rudder and stern planes), the submarine will change its position in accordance with the equations of motion. Normally, ship conning orders are translated into rudder and stern plane angle changes by the ship's helmsmen. In this trainer, no helmsman exists; consequently, it is the responsibility of the program operator to perform this task. However, it is impossible for the program operator, together with his other tasks, to continuously manipulate the rudder and stern planes to effect a desired maneuver. It is therefore necessary to develop mathematical helmsman algorithms that will take conning orders and translate them into rudder and stern plane angle changes, thereby simulating the actions of the real-world helmsmen. The ship conning orders that must be implemented are:
(a) Ordered speed
(b) Ordered course angle
(c) Ordered rate of change of course angle
(d) Ordered rudder angle
(e) Ordered depth
(f) Ordered depth rate

Since ₒS appears directly in Eq. (3.1), no algorithm is needed to
simulate this order. If, in fact, an ordered number of turns were an allowable input, then an algorithm that converts turns to ordered speed would have been required. Actually, an ₒS of zero is interpreted by the program as a rapid deceleration order involving the reversal of the screws; in this case the ship decelerates at the maximum permissible value of Ṡ. The other ship conning orders that must be implemented control either course or depth maneuvers. It is desirable to discuss these separately.

(d) Course Control: The course of the submarine is computed by using Eq. (3.2). Since the only independent variable in that equation is the rudder angle, all conning orders affecting course must eventually be interpreted in terms of the rudder angle. Of the three conning orders affecting course (ordered course, ordered course rate, and ordered rudder), the ordered course rate and the ordered rudder cannot be ordered simultaneously, since Ċ is a function of δᵣ. Figure 3 shows the course control program flow chart.

Fig. 3. Course control.

Initial tests are first conducted to ascertain whether the subroutine need be performed. It is possible to order a new course, in which case the vessel will proceed to the new ordered course, or it is possible to select an ordered rudder or ordered course rate without specifying a course, in which case the ship will continue to "orbit" until a new command is issued. If the orbit switch is on, the rudder/course switch determines whether the ship is following an ordered rudder command or an ordered course rate command. If the ship is controlled by an ordered rudder angle, then the ordered rudder is used as the value for the rudder angle term in the equations of motion. However, if the ship is controlled by an ordered course rate, then the course rate helmsman performs the helmsman function and calculates a rudder change. In either case, the new rudder angle must undergo a series of rudder limit tests to make certain that the magnitude of the rudder angle and its rate of change do not exceed physical limitations.

A proximity test is performed each iteration cycle to determine whether the course of the vessel is near the prescribed course, and whether the course rate is such that a decrease in rudder angle will be effected by the steady-up helmsman. If the proximity test is not passed, the submarine is advanced by either the ordered rudder control or the ordered course rate control as before. If the steady-up switch is equal to zero, which means that a proximity test was passed during the previous iteration cycle, a new rudder angle is calculated by the steady-up helmsman algorithm.

At high speeds, the steady-up helmsman algorithm causes the ship to
oscillate about the ordered course with damped oscillations; eventually the ordered course is attained. Unfortunately, at low speeds the algorithm becomes unstable and the ship never attains the ordered course. In order to eliminate this difficulty the program was modified so that once the ordered course is passed, the course is set equal to the ordered course and is not permitted to change until a new order is given.

(e) Depth Control: Depth control is given by Eq. (3.3), wherein the independent variable is the stern plane angle. The stern plane angle is not an allowable conning order; consequently, planesman algorithms are necessary to execute depth maneuvers. The depth control subroutine (Fig. 4) is similar to the course control subroutine. However, since a stern plane angle may not be ordered, it is simpler. The submarine will move to an ordered depth according to an ordered dive rate. If no dive rate is prescribed, then the vessel will
Fig. 4. Depth control.
proceed along a standard dive rate, which is inserted as the ordered depth rate. The dive rate limit test assures that the submarine will not assume a pitch angle greater than the maximum permissible pitch angle for that specific submarine. It also prohibits dive rates greater than the velocity of the submarine. The steady-up planesman, dive rate control, proximity test, steady-up setup, and the stern plane limit tests are analogous to their counterparts in the course control subroutine. At low speeds, the steady-up planesman algorithm becomes unstable in a manner similar to the steady-up helmsman algorithm in the course control. This problem has been eliminated by setting the depth equal to the ordered depth once the depth passes the ordered depth.

(f) Integration Method: The equations of motion together with the helmsman algorithms determine the motion of a submarine. It is now necessary to determine a numerical solution method for these equations that is compatible with the speed and memory limitations of the Honeywell 800 computer. The method used in integrating the equations of motion is shown in Fig. 6. First the equations of motion are evaluated at time t = n. Acceleration limit tests are performed to make certain that the magnitude of the acceleration or its rate of change does not exceed allowable values. The forward integration formula
    fₙ₊₁ = fₙ + ½Δt(3ḟₙ − ḟₙ₋₁)    (3.8)

together with the equations of motion is used to calculate the speed, course rate, and pitch angle rate at time t = n + 1. Since some of the coefficients in the equations of motion may be negative, it is possible, by inserting absurd initial conditions, to blow up the solution of the equations of motion. In order to protect against this, stability tests have been inserted. The stability tests are actually very weak when orders of magnitude are considered.

The values for Cₙ₊₁ and Dₙ₊₁ are obtained by integrating Ċₙ₊₁ and Ḋₙ₊₁ by the trapezoidal formula:

    fₙ₊₁ = fₙ + ½Δt(ḟₙ₊₁ + ḟₙ)    (3.9)

The velocity vector can then be transformed from spherical coordinates into rectangular coordinates by the standard transformation equations. The Cartesian components of velocity are then integrated by the trapezoidal formula to obtain the new position components. The method just described, using a forward integration formula followed by a corrective formula (i.e., the trapezoidal formula), is discussed by W. E. Milne in Numerical Solution of Differential Equations (Wiley, New York, 1953). Figures 2, 3, and 4 represent a complete flow diagram for the own ship motion equations.
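A minimal sketch of this predictor-corrector cycle, generic in the integrated quantity f (all numerical values below are illustrative):

```python
def predict(f_n, fdot_n, fdot_nm1, dt):
    """Eq. (3.8): forward formula, f_{n+1} = f_n + (dt/2)(3*fdot_n - fdot_{n-1})."""
    return f_n + 0.5 * dt * (3.0 * fdot_n - fdot_nm1)

def trapezoid(f_n, fdot_np1, fdot_n, dt):
    """Eq. (3.9): trapezoidal corrector, f_{n+1} = f_n + (dt/2)(fdot_{n+1} + fdot_n)."""
    return f_n + 0.5 * dt * (fdot_np1 + fdot_n)

# One cycle for the course angle: the course rate is advanced by the
# forward formula, then the course itself by the trapezoidal formula.
dt, C, Cdot, Cddot, Cddot_prev = 1.0, 90.0, 0.5, 0.010, 0.012
Cdot_next = predict(Cdot, Cddot, Cddot_prev, dt)
C = trapezoid(C, Cdot_next, Cdot, dt)
```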
Fig. 6. Integration method.

Admittedly, errors exist in the equations of motion and in the integration method employed for their solution. However, in most cases the errors are compensated for, and in all cases the errors are bounded.
DIGITAL TRAINING DEVICES
No errors exist for the case of isovelocity motion; both the equations of motion and the integration method are exact. The motion of the submarine normally consists of isovelocity motion intermittently mixed with sequences of maneuvers such as changes in speed, course, or depth. At the completion of a maneuver, the conning officer reevaluates his location and motion before ordering a new maneuver. Consequently, errors must be considered in terms of a definite maneuver, and the cumulative error for a complete problem becomes meaningless. The following maneuvers are possible:

(a) A change from present speed to a new speed.
(b) A change from present course to a new ordered course utilizing a preassigned rudder angle. This rudder angle may be the ordered rudder angle or a standard rudder angle if no rudder angle is ordered.
(c) A change from a present course to a new ordered course utilizing an ordered time rate of change of course angle.
(d) A continuous change in course from a present course utilizing an ordered rudder angle.
(e) A continuous change in course from a present course utilizing an ordered time rate of change of course angle.
(f) A change from the present depth to an ordered depth utilizing a preassigned depth rate. This depth rate may be the ordered depth rate or a standard depth rate.
(g) Permissible combinations of the previous maneuvers.
Since definite maneuvers are completed in finite time, the errors are automatically bounded in a loose sense. Some errors are, however, bounded in a stronger sense. In each maneuver, the final value of a dominant variable is assigned. For example, in a dive the depth is the dominant variable and the ordered depth is the final assigned value. In all cases the dominant variable attains its final value. Therefore, the only quantities that may be in error are stable (i.e., a small change in an initial condition will have a small effect on the solution), and since the initial and final values of the dominant variable are fixed, the error in time for that variable is bounded in the stronger sense. Test calculations have shown that the errors in all the variables are small, even over large ranges of parameters. In those maneuvers wherein helmsman algorithms are used (e.g., a change in course to a new ordered course), it is difficult to evaluate errors, since actual ship's helmsmen differ in their characteristics. Therefore, small errors occurring in these maneuvers can be considered unimportant.

The coefficients A₁, A₂, …, A₁₁ in the equations of motion have been
determined utilizing the integration technique discussed. Therefore, the errors in the equations of motion and the errors in the integration technique tend to cancel each other. For example, one coefficient controls the tactical diameter. It was therefore chosen so that a true solution of the equations of motion would yield a tactical diameter smaller than the true tactical diameter by the same amount that it is increased by the integration method. This was accomplished by adjusting the coefficient so that the tactical diameter obtained by utilizing the present integration method has the desired value. It should be mentioned that the coefficients derived for simulating the attack centers are dependent upon the integration method employed. If the integration method is in any way changed, for example by changing the iteration cycle, then it would become necessary to redetermine the values of these coefficients.
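The calibration idea generalizes readily. The sketch below is an assumption-laden illustration, not the original procedure: simulate_tactical_diameter stands in for running the own ship model through a full-rudder turn with a trial coefficient, and bisection replaces whatever adjustment process was actually used.

```python
def calibrate(simulate_tactical_diameter, target_yd, lo, hi, tol=1.0):
    """Find a coefficient whose simulated tactical diameter matches sea data.

    Assumes the simulated diameter decreases monotonically as the
    coefficient increases (a larger turning coefficient tightens the turn).
    """
    while True:
        a = 0.5 * (lo + hi)
        d = simulate_tactical_diameter(a)     # run the model, measure the turn
        if abs(d - target_yd) < tol:
            return a
        if d > target_yd:                     # turn too wide: strengthen coefficient
            lo = a
        else:                                 # turn too tight: weaken coefficient
            hi = a

# Illustrative stand-in model: diameter inversely proportional to coefficient.
a_fit = calibrate(lambda a: 1000.0 / a, target_yd=500.0, lo=1.0, hi=4.0)
```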
4. Programming Considerations

The programming considerations attendant upon a training simulator do not differ significantly from other real-time programming considerations. Fundamentally the program must implement the mathematical model within the specified time constraints and within the constraints imposed by the computer input/output characteristics. Ideally the mathematical model should be a complete statement of the functions of the computer, including all necessary problem constraints. This, together with a complete statement of the particular computer characteristics, including the input/output characteristics of the external hardware, should enable the program to be developed and coded. Seldom in real life does such a pure division of labor obtain. The problem of concurrent development of mathematical model, external hardware, and program, compounded by less than perfect documentation of both mathematical model and external hardware, makes the task of the programmer very difficult.

These are practical problems; the theoretical problems are generally not profound. The function of the program is to implement the mathematical model and thus effectively provide correct stimuli to the training device. Because of the external equipment considerations and the man/machine relationship embodied in the device, the programming of a training simulator is similar to any other real-time programming problem. However, there is one important difference: the universe that the training device encompasses is closed, and the bounds and conditions of this universe are amenable to change by the programmer. Thus, he can sometimes solve difficult programming problems by changing the characteristics of the problem. Since the programmer is generally involved in the systems design phase, he can influence the design to
prevent impractical or impossible demands on the computer. The result is generally a more balanced design than will necessarily exist in a real-time system that is not a closed universe and where most of the system bounds and characteristics are dictated by invariant constraints.

A training device program can be thought of as a mapping function that transforms a set of inputs, according to the dictates of the mathematical model, to produce a set of outputs. This process is cyclic and continues for the duration of the training problem. The entire training simulator can be analyzed as a closed loop function in which the total training program is a single loop entity. Internal aspects of the program may also be closed loop functions.

A characteristic of training device programming is the requirement that the program be amenable to change as the training device itself is changed. This occurs not only during device construction, but also on a continuing basis after the device is installed, owing to changes in the training problem and/or changes in the operational systems being simulated. This requirement imposes a demand on the programmer to document the program clearly and exhaustively, and also prevents him from using shortcuts or tricks that would create difficulties at some later time. For example, it is seldom permissible to have the program modify itself during execution. Address modification is permissible, but command modification per se is generally inadvisable. Furthermore, the program must generally be prepared in self-contained segments, which implies that some optimizing techniques are not available to the programmer. To some extent these same considerations apply to any real-time program and should be considered by any programmer preparing a large, complex program intended for extended use outside the control of the present programmer. These considerations are omnipresent in training device programming.

Another consideration in training device programming, which also obtains to some extent in any real-time program, is the necessity of constructing a program to function within a system which is itself not completely defined. This creates difficulties not only in the detailed construction of the program, but also makes program checkout or debugging with the actual system extremely difficult. If an error occurs during system checkout, it is natural for the programmer to suspect the terminal equipment and for the design engineer to suspect the program. If a stalemate exists, both will suspect the computer, and the computer maintenance personnel will have a low opinion of everyone. Only by mutual regard and understanding of each other's problems can a system checkout be satisfactorily accomplished.
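The mapping-function view of the program described above can be sketched directly; advance_model and generate_stimuli below are illustrative stand-ins for the mathematical model and the stimulus outputs, not the structure of any actual trainer program.

```python
def advance_model(state, inputs):
    """Stand-in for the mathematical model: fold inputs into the state."""
    return {**state, **inputs, "t": state["t"] + 1}

def generate_stimuli(state):
    """Stand-in for driving the trainee's indicators from the state."""
    return {"indicators": dict(state)}

def training_cycle(state, inputs):
    """One pass of the closed-loop mapping: (state, inputs) -> (state, outputs)."""
    state = advance_model(state, inputs)
    return state, generate_stimuli(state)

state = {"t": 0}
for _ in range(3):                     # the process is cyclic for the problem
    state, outputs = training_cycle(state, {"throttle": 0.8})
```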
Since a large-scale training device will generally involve several programmers, and also because of the requirement for exhaustive documentation, it is critical that conventions and standards of documentation be agreed on at the onset of a program. Again, this should be standard practice for any large programming effort, but it is extremely important for training simulator programming. Assuming that these are defined, the programming effort can be subdivided into a set of tasks, which are definitely not independent nor even necessarily sequential.

Program testing consists essentially of two phases: testing independent of the training device, which is similar to any program checkout, and testing with the complete training device. After all checks of the program independent of the training device have been executed, it can be assumed that the program will at least cycle and perform computations close to those originally desired. However, it is almost axiomatic that the system will not perform satisfactorily. Three types of difficulty will be experienced.

First, although great care may have been taken in constantly reviewing the interactions of the mathematical model, program, and external equipment, differences will be uncovered between how the program was intended to perform and how in reality it does perform. For example, the mathematical model may be correct over most of its range but, due to a combination of implicit assumptions and computational approximations, singularities will be uncovered. These hopefully will require only minor changes, but in any case they do require extensive and detailed analysis, with the added factor of severe pressure due to the ever-present project schedule. It is true that theoretically all of these interactions could be predicted and errors therefore prevented. In actual practice, however, errors will occur.

A second type of error is that caused by an incomplete understanding of the desired response. Generally this type of error includes some detail of external equipment operation. After the device has been installed and actual training use has been experienced, many changes will be suggested. These are not errors in the usual sense. Rather, they are indications of a less than perfect understanding of the entire training problem and are due in large part to the subjective and qualitative determination of the training problem. The proof of the training system is simply the quality of training imparted. This can be judged only by actual use of the device and hence only after the complete installation of the simulator.

The power of the general purpose digital computer now becomes very apparent. Many changes can be incorporated without modification of external hardware. Further, if a change involves only the computer, no lost time in use of the trainer need occur. Thus the use of a central
computer enables modification of the training problem simply and without interference with the training schedule.

The program itself has some unique characteristics. Probably foremost is the extreme variability in execution time. Generally, a real-time training program is constructed with a major invariant cycle time. A convenient cycle time for submarine attack trainers is 1 sec. For flight simulators it may be much smaller, perhaps 100 msec. In any case, the major cycle will be an integral multiple of the integration interval. Within the major cycle all computations are executable at least once. Minor cycles, if required, will be integral fractions of the major cycle and for convenience may be related by a power of 2, such as 1/2 or 1/4, although this is by no means required or universal.

The major cycle is so chosen that all time-dependent calculations, such as integration, are performed correctly, and so that no response delays that affect training will occur. If a student actuates a control, the computer response to that control must have the same temporal characteristics as the real control. In a complex training simulator, it is possible for many controls to be actuated simultaneously. The major cycle of the program must be such that all possible computations can be performed within the cycle. Queuing of student inputs is not permissible. Queuing of some instructor inputs is possible, but generally these are insignificant calculations. The major cycle therefore must be able to handle satisfactorily a worst-case condition of many simultaneous events.

However, the worst case will not occur very often; usually very few events occur simultaneously. Thus, the average computation load is much less than the worst case. During that portion of the major cycle not required by the computation load, the computer will generally idle, waiting for a synchronizing signal indicating the start of the next major cycle. (If minor cycles are used, some idling will occur during each minor cycle as well.) In a worst-case condition this idle time will be minimal, amounting to only a very small percentage of the cycle. Under average loads the idle time may be 50-80% of the total cycle. Thus during normal training exercises the computer may be idle over half the time.

There is generally no practical way to reduce the ratio of worst-case to average execution time. Queuing of inputs cannot be done without compromising training. Diagnostic testing could be programmed, but seldom will the effectiveness of such diagnostic techniques justify the cost of programming and adaptation of the hardware. Thus it is a characteristic of training simulators using a general
purpose computer that the computer will be idle for an appreciable portion of each training exercise.

Another unique program characteristic, apparent in the physical layout of the computer, is the absence of normal input/output processing. A training program does not require any of the standard peripheral equipment. Once the program is loaded, the only input/output processing required is that which concerns the unique terminal devices associated with the simulator itself. Magnetic tape, high speed printers, and other adjuncts to the modern computer are not required. They are sometimes used for recording of problem progress, but seldom is such use sufficient to justify purchase of the equipment.

In most other respects the program evidences no significant uniqueness. The sophistication it represents is a product of the total system concept and in particular the mathematical model. Nevertheless, it is a formidable undertaking to prepare a training simulator program, and competent programming techniques must be employed. The result is a complex and very specialized program, which is essential to successful use of the training device.
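The major-cycle structure described in this section can be sketched as follows. The 1-sec cycle matches the submarine-trainer figure given above, while run_all_computations and the use of a sleeping loop (rather than a hardware synchronizing signal) are illustrative stand-ins.

```python
import time

MAJOR_CYCLE = 1.0  # seconds; assumed 1-sec submarine-trainer cycle

def training_loop(run_all_computations, cycles):
    """Run every computation once per major cycle, then idle until the next."""
    next_sync = time.monotonic()
    for _ in range(cycles):
        run_all_computations()               # must finish even in the worst case
        next_sync += MAJOR_CYCLE
        idle = next_sync - time.monotonic()  # idle portion of this cycle
        if idle > 0:
            time.sleep(idle)                 # computer idles, awaiting the sync
```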
5. Non-Training Uses of a Training Simulator

Non-training uses of a training simulator can be conveniently divided into two categories: applications using the complete simulator complex and applications using only the computer. These alternate uses of the trainer are generally not planned for in the original concept of the system but are an outgrowth of the power and flexibility of the computer.

Since a training simulator reproduces the operating characteristics of a real, operational system, it can be used to develop and evaluate procedures and tactics applicable to the operational system. Within the limitations of the realism incorporated within the simulator, this use can be of great value and can effect considerable economies. As an example, the submarine attack trainer can be used not only to train attack teams, but also to develop and improve the basic approach and attack tactics. If several attack centers are incorporated within the same trainer, mock battles can be staged between the various submarines. Since perfect data can be recorded as to the "true" situation, the tactics employed can be evaluated against what actually occurred. Also, replays can be conducted, allowing the effects of perturbations in the tactics to be studied.

The use of operational systems would require time-consuming deployment of complete submarines, and, unless expensive recording equipment is installed, a complete record of the exercise would not be available. Also, it would be very difficult to reconstruct and replay the
exercise. Thus, the simulator can provide a very convenient means for improving the use of the operational system.

To some extent, a training simulator can be used to evaluate operational equipment. For example, if a simulator incorporates an operational fire-control computer, new operational programs can be quickly evaluated. Thus the simulator is again valuable in improving the effectiveness of the real system. In order for a simulator to be useful in improving operational effectiveness, the limitations and departures from realism of the training simulator must be completely understood. The training device was originally designed to provide the realism required for training within the stated environment. If this environment changes, or if new procedures or tactics require a different reproduction of realism, then the evaluation of the new equipment or tactics may be based on erroneous data. The limitations must be understood and considered in the evaluation.

If the computer used in a training simulator is a general purpose device, then it may be used for any normal data processing function. Although this is not generally planned for in the original design of the simulator, it is seldom difficult to augment the computer with standard peripheral equipment and thus provide a standard data processing facility. Such use of the computer when training is not being conducted presents no problems. Normally, no special provisions are required. If protection against inadvertent access to the trainer equipment is required, a simple interlock could be provided. Otherwise the computer will function in a completely normal fashion.

It is theoretically possible to time-share the computer between training problems and normal data processing. The attractiveness of such an idea is based on the average idle time of the computer, as discussed above. In theory the computer would be available about 60% of the time during a normal training problem. Thus, it would appear that considerable data processing could be accomplished without interfering with training. However, there are many practical difficulties in implementing such an arrangement. Since no interference with the training can be allowed, suitable absolute safeguards must be incorporated to protect the integrity of the training program.

First, it would be required that the data processing program be prevented from halting the computer. Illegal commands, overflows, and any other condition that might normally stall the computer can usually be trapped. However, a HALT command per se poses some difficulties; again, this might be handled by trapping. Another problem is to assure that the data processing does not modify any storage location required by the training program. Memory
barriers help somewhat, but they generally are not foolproof. Because of the short time available in each cycle, it is impractical to use magnetic tape to store and retrieve either the training program or the data processing program. It is also required that the data processing program be prevented from inadvertently accessing the trainer equipment I/O channels; again, this could be prevented by some form of trapping. It would likewise be necessary to prevent an inadvertent transfer of control to the training program. Perhaps memory barriers and trapping could prevent this.

Since the execution time of the training program is variable, the time available for data processing is variable. It is also unpredictable. Thus an executive program would be required that could quickly and absolutely turn off the data processing. Generally the time synchronizing of the training program is accomplished by some form of priority interrupt. This conceivably could be made absolute. However, if the data processing program is performing an I/O function, it becomes difficult to assure safe and timely restart of the training program.

None of these problems is impossible of satisfactory resolution. However, they do present serious practical problems and would require extensive analysis, a sophisticated executive program, and probably modification of the computer before simultaneous training and data processing could be undertaken. It has not been done to date and probably will not be undertaken for quite some time, if at all. It is a very interesting programming problem and, if it could be solved, would make available to training simulator users considerable computer capacity.

A modification of the concept is practical. During most training exercises some idle time occurs due to intended interruptions of the training. These interruptions can vary from a few seconds to upwards of an hour. If a long interruption occurs, then it is practical to dump the training program on magnetic tape and load some data processing program. When training is resumed, the reverse transfer would take place. It is possible that interruptions as short as 10 sec would make such a procedure practical and attractive. However, the author knows of no instance where this has been accomplished.
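The dump-and-reload variant is straightforward to sketch in modern terms, with a disk file and Python's pickle module standing in for the magnetic-tape transfer; all names and values are illustrative.

```python
import pickle

def checkpoint(state, path="trainer.ckpt"):
    """Dump the training problem's state (the tape-dump analog)."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restore(path="trainer.ckpt"):
    """Reload the training problem's state (the reverse transfer)."""
    with open(path, "rb") as f:
        return pickle.load(f)

checkpoint({"problem_time": 512.0, "own_ship": (1000.0, 2000.0)})  # interruption begins
# ... load and run a data processing job during the interruption ...
state = restore()                                                  # training resumes
```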
6. Future Training Device Requirements

In one sense, any discussion of future training device requirements is presumptuous, since it must assume certain training problems and further preempts detailed analysis of the suspected problems. However, in a broader sense, it is necessary and desirable to be able to discuss the training needs of as yet undefined systems. Only by this
type of planning can the tools be adequately developed to satisfy future training needs.

Considering only the implementation of training devices, it is to be expected that training simulators will become more complex and sophisticated and will make use of many advances in technology. Each generation of training devices has advanced in the use of techniques, and there is no reason why this trend will not continue. More importantly, however, there are good and sufficient reasons why such advances are required. As has been stressed previously in this article, the underlying and paramount reason is the need to provide better training.

The first consideration has to do with the role of man in future systems. It is expected that, as systems become further automated, the remaining functions assigned to man will become increasingly critical. This trend has occurred in weapon systems, command-and-control systems, logistics systems, and nearly all system development. The rote function is automated, and only critical nonrote functions remain. Thus the expectation is that man's role will be concerned to a much greater extent with decision processes that require judgment too complex or too little understood to automate. These types of system will in turn lead to the development of more sophisticated training simulators, and, since the decision process is amenable to logical manipulation, it is expected that increased use of digital computers will result. This is an evolutionary process.

More revolutionary changes are also expected. Today, reliability of training simulators is not of significant concern. Training problems either are of short duration or present no hazard in themselves to the trainee. Thus it is seldom economical to attempt to achieve reliabilities compatible with operational systems. However, some simulators have been designed where reliability is a very significant factor. In space vehicle training, it is desirable to simulate mission profiles in their entirety for short missions, and the greater portion of the mission profile for long missions. Thus, it has become a requirement to effect training problems of many hours' or even many days' duration. It then becomes important to assure that the training is not interrupted. This in turn places a premium on the reliability of the training simulator. The Apollo Mission Simulator, for example, is required to operate over extended training periods; thus the system design must reflect this requirement in the inherent system availability.

The inherent capabilities of moving base simulators for modern aircraft or space vehicles are such that injury could result to the trainee from erratic and hazardous behavior. Thus, reliability again becomes a factor, to prevent the trainee from being exposed to such hazards.

In a more mundane vein, the ever increasing sophistication of training
simulators has resulted in increased cost. In order that the simulator achieve a reasonable cost/effectiveness, utilization of the device must be high. Utilization is a result of many factors, but one that is paramount is the availability of the simulator. Thus, to ensure a high utilization of the simulator, a premium is again placed on reliability.

The result of all of these factors will place an increasing and somewhat revolutionary emphasis on the reliability, maintainability, and availability of the training simulator. This will be achieved by the use of many techniques, such as fundamentally more reliable components or redundancy at either the component or subsystem level. However, it is not expected that training simulator requirements will lead the advance of reliability techniques. Rather, the training simulator industry will adapt techniques developed in the design of operational systems.

One area of simulation that requires extensive advancement is visual simulation. Today very few techniques are available to generate dynamic, panoramic visual displays. Such a display would be of great value for training situations requiring out-of-the-window simulation. Aircraft landing procedures, spacecraft navigation and docking, and other tasks requiring extensive visual cues are examples of training problems needing such a device. Although some techniques are available, much needs to be done. Since the need exists, it is hoped that a solution will soon be forthcoming. This is one area where true development is required strictly for simulator use.

The use of digital techniques, and in particular large general purpose computers, has enabled very complex and sophisticated training devices to be constructed. As computers become more refined, it is expected that their power will enable many advances in simulation techniques. Many problems become practical of solution because the computer exists; many more require still faster and larger computers. However, the computers will be available and the problems will be solved. The result will be better trained personnel, not only for complex weapon systems but for industrial systems as well. Through training, system effectiveness can be improved. Thus, many direct and indirect benefits will accrue.

ACKNOWLEDGMENTS
The author wishes to express his appreciation to the many people who have contributed to this article. Particular thanks are due to Messrs. Maurice Bark, Stanley Gryde, and Bert Yeager for their invaluable contributions. Special mention and acknowledgment must also be given to the U.S. Naval Training Device Center, Port Washington, New York, under whose aegis most of the training simulators discussed in this article have been developed. The keenly perceptive foresight of Training Device Center personnel is primarily responsible for the fact that digital computers have come into use in training simulators.
Number Systems and Arithmetic

HARVEY L. GARNER
The University of Michigan, Ann Arbor, Michigan
1. Introduction
2. Classification and Characterization of Number Systems
   2.1 Definitions
   2.2 Linear, Weighted Number System
   2.3 Carry Assimilation
   2.4 Interpretation
3. Addition
   3.1 Notation and Basic Addition Processes
   3.2 Carry Statistics and End of Carry Detection
   3.3 Improved Ripple Carry Circuitry
   3.4 Logical Organization for Fast Addition
   3.5 Conclusions
4. Redundant Number Systems
   4.1 Separate Carry Representation
   4.2 Redundant Signed Digit Number System
   4.3 Extended Digit Number System
5. Multiplication
   5.1 Multiplier Coding
   5.2 Multiplier Logic
   5.3 Multiplication for Complement Coded Operands
   5.4 Nonstandard Multiplication Logic
6. Division
   6.1 Nonrestoring and Restoring Division
   6.2 Generalized Nonrestoring Division
   6.3 SRT Division
   6.4 Modified SRT Division
7. Residue Number Systems
   7.1 Basic Characteristics
   7.2 Applications
8. Digit by Digit Computation
   8.1 Pseudo Division and Multiplication
   8.2 The CORDIC Trigonometric Computing Technique
Nomenclature
References
1. Introduction
This paper is concerned with number systems and arithmetic for machine calculation. Techniques, concepts, and models are presented
HARVEY L. GARNER
whioh are believed to be relevant to the development of improved mwhine arithmetio prooesses. An attempt is made to present the important bmio oonoepta and the essential flavor of eaoh item considered. Limitations in time and spctoe have required the omission of of eventual importanoe to the serious student of maohine many denumber systems and arithmetio. For these details the reader must refer to the original papers. The reference list appearing at the end of this artiole provides some direction for more detrtiled studies. This referenoe list, however, does not oonstitute a oomplete survey of the literature of maohine number systems and arithmetio. X Classification and Characterization of Number Systems
In this section, definitions and basic concepts are given, and the general properties and characteristics of machine number systems are considered. Emphasis is placed on the distinction between a representation in a machine number system and the interpretation which may be given to a representation. The manner in which a machine designer restricts interpretation is also considered for fixed point number systems.

The consideration of number systems for digital computer applications may be restricted to finite number systems. The most important characteristic of machine number systems, contrary to the most prevalent opinion, is finitude rather than the fact that binary logic is employed. Overflows, underflows, scaling, and complement coding which characterize machine number systems are direct consequences of finitude and, in fact, illustrate both the detrimental and beneficial consequences of finitude.

2.1 Definitions

A finite number system N consists of a finite set of symbols. In general, the characteristics of useful arithmetic require that the abstract structure of N be at least an abelian group under addition. Usually the elements of N have the form of a finite n-tuple $(x_n, \ldots, x_1)$; $x_i$ is the ith digit of the number representation and is an element of $Z_i$, the set of digit values for the ith position. If $Z_i$ is the same set for all digit positions, the number system is said to be fixed base or consistently based, otherwise mixed base or nonconsistently based. The British monetary number system is a mixed base number system.

The elements of N require interpretation if anything other than abstract computation is to be obtained. Let Q be the set of interpretations. A mapping $\alpha$ between N and Q is required to connect the number representation and the interpretation. The nature of this map is an important characteristic of the number system. If the map N to Q is one-to-one and onto, then the number system is complete and nonredundant. If the map is one-to-one from N into Q, then the number system is incomplete, since some interpretations in Q do not have representation in N. If the map from N to Q is many-to-one and into, then the number system is redundant but incomplete; if onto, then redundant and complete. (A mapping $\alpha$ of a set S into a set T is a correspondence that associates with each $s \in S$ a single element $t \in T$. A mapping is said to be onto if every $t \in T$ is associated with some $s \in S$.)

If the interpretation is dependent on the order of the digits, then the number system is a positional number system; otherwise it is a nonpositional number system. In a nonpositional number system $\alpha$ is not changed by any permutation applied to N. Most useful number systems are positional. The number system is weighted if for all $(x_n, \ldots, x_1) \in N$, $\alpha: N \to Q$ is determined by a weight function

$$w(X) = \sum_{i=1}^{n} x_i p_i \equiv q \bmod M, \qquad q \in Q \tag{2.1}$$

The digit weight of the ith digit position is $p_i$. M is the modulus of the number system. The modulus is a characteristic of the interpretation. If Q is a finite set of fractions with magnitude less than unity then $M = 1$. If $Q = \{0, 1, \ldots, M-1\}$ then the modulus is M.
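The weight function of Eq. (2.1) is easily mechanized. The following sketch (in modern Python notation; the digit values and weights are hypothetical) evaluates a mixed base, weighted, linear interpretation, using radices patterned on the British monetary system mentioned above.

    def interpret(digits, weights, modulus):
        # Weight function of Eq. (2.1): w(X) = sum of x_i * p_i, mod M.
        # Digits are listed least significant position first.
        return sum(x * p for x, p in zip(digits, weights)) % modulus

    # Mixed base example: digit sets of cardinality 12 (pence),
    # 20 (shillings), and 10 (pounds) give the modulus M = 2400.
    weights = [1, 12, 240]
    print(interpret([7, 3, 4], weights, 2400))   # 7 + 36 + 960 = 1003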
Note that the definition for a weighted number system given by Eq. (2.1) is a generalization which includes the number systems defined by

$$w(X) = \sum_{i=1}^{n} x_i p_i \tag{2.2}$$

The generalization is necessary because it is consistent with standard machine practice of executing addition mod M. A number system is a linear number system if the map $\alpha$ is a linear function of the digits $x_i$ under digitwise addition. An example of a nonlinear number system is the excess three code. The nonlinearity is due to the addition of the constant three. The excess three code is weighted if a digit position of weight negative three with a constant digit value of one is assumed. Alternatively, the excess three code can be classified as a nonhomogeneous linear code. We prefer the more restricted definition of linearity and thus place the excess three code in the nonlinear category.

The reflected binary code is a nonlinear weighted code. It is known that the digit weights for this code are

$$p_i = \pm(2^i - 1) \tag{2.3}$$

For the reflected code the weight of the most significant digit with nonzero value is positive. The weights of all digits with nonzero value alternate in sign. Hence the sign of the digit weights is a function of the particular number being represented. The exact weight function is

$$w(x) = x_n p_n + x_{n-1} p_{n-1}(-1)^{x_n} + x_{n-2} p_{n-2}(-1)^{x_n \oplus x_{n-1}} + \cdots + x_1 p_1(-1)^{x_n \oplus \cdots \oplus x_2} \tag{2.4}$$

In general, addition is not possible or at best is extremely complicated in nonlinear number systems. However, in the case of the reflected binary code, a rather simple modification, due to Lucal [39], permits the development of a reasonable addition algorithm. The modification is the addition of an auxiliary digit

$$x_0 = x_n \oplus \cdots \oplus x_1 \tag{2.5}$$

Given $x_0$, the sign of the weight at any digit position may be determined since

$$x_n \oplus \cdots \oplus x_2 = x_0 \oplus x_1 = y_1 \tag{2.6}$$

$$x_n \oplus \cdots \oplus x_{k+1} = y_{k-1} \oplus x_k = y_k \tag{2.7}$$

For a reflected binary number representing the positive integer J,

$$x_0 = \begin{cases} 1 & \text{if } J \text{ is odd} \\ 0 & \text{if } J \text{ is even} \end{cases}$$

To obtain reflected code addition, the adder must propagate a y digit for both operands and both carries and borrows. Lucal has shown that $(x_{n-1}, \ldots, x_1, x_0)$ is a representation in a modified reflected binary number system [39] for which

$$p_i = 2^i(-1)^{x_{n-1} \oplus \cdots \oplus x_{i+1}}, \qquad i = 0, 1, \ldots, n-1 \tag{2.8}$$

The Hamming distance between successive representations is two rather than one. The main advantage of the modified system is that multiplication by $2^k$ is accomplished by a left shift of k bits.
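The alternating-sign weights of Eq. (2.3) can be checked mechanically. The sketch below (Python; the word value is a hypothetical example) evaluates a reflected binary word by the weight function and compares the result with the familiar shift-and-exclusive-OR conversion.

    def reflected_value(digits):
        # Evaluate a reflected binary (Gray) word by Eq. (2.3):
        # digit i carries weight +/-(2**i - 1), the sign alternating
        # with the parity of the one bits of higher order.
        total, ones_above = 0, 0
        n = len(digits)                        # most significant first
        for offset, x in enumerate(digits):
            i = n - offset                     # positional index n ... 1
            if x:
                sign = -1 if ones_above & 1 else 1
                total += sign * (2**i - 1)
                ones_above += 1
        return total

    def gray_to_int(digits):
        # Conventional Gray-to-binary conversion for comparison.
        value = 0
        for x in digits:
            value = (value << 1) | x
        mask = value >> 1
        while mask:
            value ^= mask
            mask >>= 1
        return value

    word = [1, 0, 1, 1]
    assert reflected_value(word) == gray_to_int(word) == 13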
2.2 Linear, Weighted Number Systems

In general, we shall discuss only weighted linear positional number systems, and the term system will have this meaning unless otherwise qualified. Garner [20] has shown that for a nonredundant, complete, weighted, linear number system to exist, it is necessary and sufficient that:

(1) The product of the cardinalities of the digit value sets is equal to the cardinality of Q, the set of interpretations.
(2) For all i, the digit weights are

$$p_i = \frac{h_i M}{\prod_{j=1}^{i} m_j}, \qquad \gcd(m_i, h_i) = 1 \tag{2.9}$$

where $m_j$ is the cardinality of the set $Z_j$, and $Z_j = \{0, 1, \ldots, m_j - 1\}$; M is the modulus of the number system.

A redundant number system is obtained if either of these conditions is not satisfied. If the cardinality of the interpretation set is less than the cardinality of the product of the base sets, then the number system is both complete and redundant. If condition two is violated, then the number system is redundant, and incomplete if condition one is satisfied. These conditions define a very large class of number systems. For example, the cardinality of the set of distinct number systems for an n digit, fixed base, weighted, linear number system is $m^z \phi(m)$, where $z = n(n-1)/2$ and $\phi$ is Euler's function; m is the base of the number system. While the characteristics of the complete class have not been studied in detail, it appears that the number system with conventional weighting obtains the simplest arithmetic properties. The conventional number system has the property that $h_i = 1$ for all i. If $m_j$ (the cardinality of the base set) is chosen such that all pairs of $m_j$ are relatively prime, then the class of number systems contains, at one extreme, residue type number systems, and, at the other extreme, a number system with conventional carry ($h_i = 1$ for all i). Rozenberg [59] has investigated this class of number systems and has concluded that only the conventional number system and the residue number systems are of interest.
2.3 Carry Assimilation

The carry or borrow assimilation processes for the class of linear weighted number systems may be studied using a related redundant number system. Assume that the carry or borrow rules are unknown. This is indeed the case when one investigates a new number system. Consider the nonredundant number system $N_1$, with digit sets of cardinality $m_1, \ldots, m_n$. If digitwise addition is performed, the weight function will continue to provide the correct interpretation modulo M, but the initial sets of digit values with cardinality $m_i$ are no longer adequate. No loss in generality occurs if it is assumed that $Q = Z_M = \{0, 1, \ldots, M-1\}$ (integer interpretation). Then

$$\prod_{i=1}^{n} m_i = M \tag{2.10}$$

We seek a new set of digit values which is finite and is closed under addition. The set $Z_M$ has this property since for any i, $M p_i \equiv 0 \bmod M$. $Z_M$ is also the smallest set with this property. We may now define a redundant number system $N_2$ with the same digit weights as $N_1$. $N_2$ has the property that all $x_i \in Z_M$. Furthermore, $N_1$ is included in $N_2$. The cardinality of $N_2$ is $M^n$ and every interpretation in $Z_M$ is associated with exactly $M^{n-1}$ distinct elements of $N_2$. The $M^n$ different elements of $N_2$ can be arranged in a rectangular array, $M \times M^{n-1}$, so that the leftmost column contains the M elements of $N_1$ ordered in increasing magnitude. The array can be arranged so that the $M^{n-1}$ entries of the first row are the $M^{n-1}$ representations congruent to zero modulo M. It can be shown that K, the set of representations congruent to zero modulo M, is a proper normal subgroup of $N_2$. Thus, the array is a coset decomposition of $N_2$. Each element of $N_1$ occurs in exactly one coset and may be considered as the coset representative. The remaining elements of a given coset are redundant representations, due to unassimilated carries, which have the same interpretation as the coset representative. Two elements of $N_2$ are in the same coset if and only if they differ by an element in K. The problem of assimilation for $N_2$ is conveniently studied by means of the homomorphism between $N_2$ and $N_1$, or the isomorphism between $N_1$ and the factor group $N_2/K$. The elements of $N_2/K$ are the cosets. This approach is due to Arnold [2]. Arnold has shown that the appropriate algebra for the study of $N_1$ and $N_2$ is found in module theory. In particular, the carry structure or the borrow structure of $N_1$ can be completely specified in terms of the generators of K. The generators of K provide the transformations required for carry assimilation. The generators of K always form an $n \times n$ triangular array. The diagonal elements of these arrays are filled by $m_1, \ldots, m_n$, where $m_i$ is the cardinality of the ith set of digit values. The generator array for residue number systems contains nonzero elements only on the main diagonal. For a conventional system, the diagonal below the main diagonal is filled completely with negative ones (i.e., $M - 1$). See also LeVeque [37]. Some extension of the above concepts has been made [52] for the study of number systems which are redundant because the cardinality of N is greater than M. However, the problems of canonical reduction and carry assimilation for redundant number systems require further study.
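A small computation illustrates the role of the generators of K. In the sketch below (Python; the monetary radices are again assumed for concreteness), digitwise addition first produces a redundant representation in $N_2$; repeated application of the conventional generators ($m_i$ in position i, with one carried into position $i+1$) then assimilates the carries and returns the coset representative in $N_1$.

    def assimilate(digits, radices):
        # Conventional carry assimilation: whenever a digit reaches its
        # radix m_i, subtract a multiple of m_i there and add the carry
        # one position up; arithmetic is modulo M in the top position.
        digits = list(digits)                  # least significant first
        for i in range(len(digits) - 1):
            carry, digits[i] = divmod(digits[i], radices[i])
            digits[i + 1] += carry
        digits[-1] %= radices[-1]
        return digits

    radices = [12, 20, 10]                     # pence, shillings, pounds
    a, b = [9, 15, 3], [7, 8, 2]
    redundant = [x + y for x, y in zip(a, b)]  # [16, 23, 5] in N2
    print(assimilate(redundant, radices))      # [4, 4, 6] in N1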
2.4 Interpretation

Some additional comments on the interpretation problem are appropriate at this time. Computer number systems usually do not employ an explicit symbol or marker to denote the separation of integer and fractional weighted digit positions. The position of the point is specified by convention in fixed point representations, and the user can, to some degree, modify the convention. Floating point representation employs a fixed point number and an auxiliary number called an exponent. The exponent indicates the direction and number of positions that the point should be moved. Most modern general purpose computers provide for floating point operation. Floating point notation simplifies the scaling problem since fractions and integers over a large range can be easily represented. Arithmetic operations must include exponent calculation, and addition or subtraction is permissible only if the two operands have the same exponents. Shifting of one operand relative to the other is required prior to the addition of two operands with different exponent values. Results are usually shifted left, with the appropriate exponent modification, to obtain a nonzero digit in the high order position of the fixed point representation. This process is called normalization or standardization. The resulting fixed point part of the number is said to be in normal or standard form. Normalization may produce results with questionable significance in the low order digits. This problem has been studied in detail by Ashenhurst and Metropolis [1, 44].

All computers permit fixed point arithmetic; some permit only fixed point arithmetic. Fixed point notation is a basic part of floating point notation. The discussions which follow in this paper will be concerned with the arithmetic operations for fixed point number representations.

Another aspect of the interpretation problem is the representation of positive and negative numbers. Again there is no special plus or minus symbol in the machine representation. The designer must establish an interpretation and a convention which permits the machine number system to represent both positive and negative numbers. Two basic schemes exist.

(1) Magnitude plus sign. Normally the high order digit represents the sign. The magnitude and sign computation are separated. Magnitude plus sign coding is natural for the human but unnatural for the machine because special sign computations are required. If $x - y$ is executed in magnitude plus sign code and $y > x$, then the result is obtained in complement coded form.

(2) Complement coding. Two types of complement code are in general use. These are the radix complement code and the diminished radix complement code. Complement coded number systems have the property that no special sign computation is required. Thus, in general, computation for complement coded number systems can be obtained
with less complexity than that required for magnitude plus sign coded number systems. On the other hand, the complement coded representations are not natural for the user of the machine. The basis for complement coding is found in the theory of cyclic additive groups of finite order. We give here an abstract presentation of the interpretation problem for machine number systems for both complement and magnitude plus sign coding.

Digital computer number systems have the properties of a finite additive group for single length representations. Double length representations of interest in multiplication are also elements of a finite additive group [23]. One property of the additive group Q is the existence of a unique zero element such that $a + 0 = a$, $a \in Q$. Another property is the existence of a unique inverse element for every element of the group. If $a + b = 0$, then a is the inverse of b and b is the inverse of a. Thus b can be interpreted as negative a and a can be interpreted as negative b. Thus every element of the group and every element of the machine number system may have both a negative and a positive interpretation. Closure is the third property of the group. Closure means that if $a, b \in Q$ and $a + b = c$, then $c \in Q$. The closure property of the machine number system creates the overflow problem because the machine number system is finite while the number system of the user is infinite. Scaling is the process of interpreting the finite machine number system in the appropriate way to represent a selected set of numbers from the user's number system. An overflow occurs when a representation is needed which is not a member of the selected set.

More precisely, a machine number system N having the properties of a finite cyclic additive group is mapped into the set of real numbers. Since the group is cyclic, it can be generated by one additive generator $S^1(0) \in N$. The nth successor of zero is defined as $S^n(0) \in N$. Let $\#N$, the cardinality of N, be equal to $\alpha$. An important identity is $S^\alpha(0) = 0$. The predecessor of zero, $P(0) \in N$, is the element preceding zero; $P^n(0)$ is the nth element preceding zero, and since $S^\alpha(0) = 0$, it must be the case that $P^n(0) = S^{\alpha-n}(0)$. The map for complement coding is determined by the desired interpretation. The map $N \to R$ must have the following properties:

(1) $0 \to 0 \in R$
(2) $S^x(0) \to x\delta$, $\quad x\delta \in R$, $\quad x = 1, 2, \ldots, k$
(3) $P^y(0) \to -y\delta$, $\quad -y\delta \in R$, $\quad y = 1, 2, \ldots, m$
(4) $m + k + 1 \le \#N = \alpha$

Consider two elements of the reals having representation in N; an overflow occurs if the sum of the two elements does not have a representation in N.
Let us consider the possible complement coding schemes. In the first place we may choose $\delta > 0$ or $\delta < 0$. The usual choice is $\delta > 0$ so that complement coded representations and the magnitude plus sign representation for positive numbers are identical. If m or k equals zero, then a number system representing only numbers of one sign is obtained. In general, it is desirable to choose $m = k$ so that every number has both a positive and a negative representation. Suppose $k = m = \alpha - 1$. This is a violation of property four. In this case, every element of N has both a positive and a negative interpretation. This ambiguity causes no arithmetic difficulties if the sign of the result is known. However, this is usually not the case, and therefore property four is essential in order to obtain nonambiguous interpretations with respect to sign.

The complement code is called a radix complement code if $(0, \ldots, 0) \to 0$. If $2 \mid \#N$, then usually $m = k + 1$. This choice permits sign detection on the basis of the high order digit. For example, in the binary number system, the high order digit conveys the sign information. For a base ten number system, the high order digit contains both sign and magnitude information. Digit values 0, 1, 2, 3, 4 are associated with one sign value and digit values 5, 6, 7, 8, 9 with the other sign value. The choice of $m = k + 1$ if $2 \mid \#N$ also permits a simple detection scheme for additive overflow. Overflow occurs if and only if the addition of two numbers with the same sign interpretation produces a result with the opposite sign interpretation. The addition of two numbers with opposite sign interpretation cannot produce an overflow.

A diminished radix complement coded number system is obtained if $(b-1, \ldots, b-1) \to 0$, where b is the base of the number system. The diminished radix complement coded system is unusual because $(0, \ldots, 0)$ is not an element of this system. The complement coded number system is defined without $(0, \ldots, 0)$, but in order to obtain simple addition properties, the complement coded arithmetic is executed in a number system which includes $(0, \ldots, 0)$ and a correction is used when needed. Addition may be viewed as enumeration using the successor concept. Let $N_1 = N - (0, \ldots, 0)$; $\#N = \alpha$, $\#N_1 = \alpha - 1$. For $S^a(0) \in N_1$ and $S^b(0) \in N_1$, we have $S^a(0) + S^b(0) = S^c(0)$ where $c = |a + b|_\alpha = a + b - t\alpha$, but $S^c(0) \in N$. It is obvious that a correction is needed when $t \neq 0$ because $(0, \ldots, 0)$ has been counted. This condition is easily recognized in practice. Addition for an n digit number system produces a carry in the $n + 1$ digit position whenever $t \neq 0$ (i.e., $c_{n+1} = 1$). The required correction is the addition of one in the low order position since $(0, \ldots, 0)$ should not have been enumerated. Hence the term "end around carry correction."

If $2 \mid \alpha$ and $k = m$, then the machine number system has one zero element, $(\alpha/2) - 1$ elements with negative interpretation, and $(\alpha/2) - 1$ elements with positive interpretation. End around carry correction is always required if two elements of the set $P^1(0), \ldots, P^m(0)$ are added together. Consider $P^j(0) + S^i(0)$, $0 < j \le (\alpha/2) - 1$, $0 \le i \le (\alpha/2) - 1$; in this case, the correction is conditional and the need for correction is not known until $c_{n+1}$ is generated. Overflows may be detected, after end around carry correction, using the same scheme given for radix complement codes.

A magnitude plus sign code of $n + 1$ digits can be defined easily using the structure which has been developed for complement codes. The n digit magnitude part of a magnitude plus sign code is characterized by $m = 0$, $k = \alpha - 1$, and $S^\alpha(0) \to 0$. A separate digit is used to display the sign. Overflow can be detected by $c_{n+1} = 1$. Subtraction can be reduced to enumeration using the predecessor concept. Consider $b > 0$; then

$$D = S^a(0) - S^b(0) = P^b[S^a(0)] = S^{\alpha-b}[S^a(0)] = S^{\alpha-b+a}(0) \tag{2.11}$$
If $b > a$ the difference is negative and appears in radix complement coded form. This demonstrates again the naturalness of complement coding to machine arithmetic.

In a fixed point system it is assumed that all elements of the machine number system N have the same interpretation. If this is not the case then there must exist some explicit notation for the proper interpretation. A number system with this property is classified as floating point. The degree to which the machine designer prevents the use of multiple or variable interpretation, implicitly determined by the user, is revealed in the following analysis of the problem of interpretation in fixed point arithmetic systems. Obviously, the user can assign any interpretation to the number system for addition since in the absence of an overflow

$$S^a(0) \to a\delta \tag{2.12}$$

$$S^b(0) \to b\delta \tag{2.13}$$

$$S^a(0) + S^b(0) \to a\delta + b\delta = (a + b)\delta \tag{2.14}$$
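The end around carry correction described above is easily demonstrated. The following sketch (Python; 16-bit words are assumed) performs diminished radix complement (one's complement) addition; a carry out of position n signals that $(0, \ldots, 0)$ was enumerated, so one is added back in at the low order position.

    MASK, WIDTH = 0xFFFF, 16

    def ones_complement_add(a, b):
        # Diminished radix complement addition with end around carry.
        total = a + b
        if total >> WIDTH:                 # c_{n+1} = 1
            total = (total & MASK) + 1     # add one in the low position
        return total & MASK

    def encode(value):
        # One's complement representation of a small signed integer.
        return value & MASK if value >= 0 else (~(-value)) & MASK

    # (-5) + 12 = 7; the correction is conditional on the carry out.
    print(hex(ones_complement_add(encode(-5), encode(12))))   # 0x7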
The situation is somewhat different for multiplication. Normally a double length product is obtained by multiplying two single length representations. We can study multiplication by extending N, the single length number system, to obtain $N_2$, the double length number system. $\#N = \alpha$ and $\#N_2 = \alpha^2$. This is a natural extension since $N_2$ is the set of product representations obtained by the multiplication of two operands from N, and N can be embedded in $N_2$. The multiplication process can always be obtained in the machine by a sequence of additions. $N_2$ is also an additive group. This property can be used to study complement coded multiplication. However, we are concerned at the moment only with the question of scaling for the operation of multiplication, and we shall not pursue the question of complement coded multiplication. Let $S^a(0), S^b(0) \in N$; then since multiplication can be obtained by addition, we have

$$S^a(0) S^b(0) = S^{ab}(0) \to ab\delta, \qquad S^{ab}(0) \in N_2$$

but the correct product is $ab\delta^2$, so a corrective multiplication by $\delta$ is required. Thus, the elements of $N_2$ must be given the interpretation

$$S^{ab}(0) \to ab\delta^2$$

There exists a problem of compatibility between N and $N_2$ since, in general, $N \neq N_2$. Incompatibility, in the case of the conventional fixed point, weighted, fixed base number system, appears because single length operands produce double length products. Either the double length product is approximated by a single length representation or the subsequent computation must be executed in a double precision mode. A product in $N_2$ may be approximated by a representation in N. The approximation is not valid if overflow or underflow occurs. If $S^x(0) \in N_2$, then $S^x(0) \to x\delta^2 = (x\delta)\delta$. Consider $S^{[x\delta]}(0)$. If $x\delta \ge \#N$ then $S^{[x\delta]}(0) \notin N$ and an overflow condition exists. If $x\delta < 1$ then an underflow condition exists. Overflow occurs only for $\delta \ge 1$ and underflow occurs only for $\delta < 1$. If $1 \le x\delta < \#N$ then $S^{[x\delta]}(0) \in N$ approximates $S^x(0) \in N_2$. Let $[x\delta]_R = [x\delta]$ if $x\delta - [x\delta] < \frac{1}{2}$, otherwise $[x\delta]_R = [x\delta] + 1$. Then $S^{[x\delta]_R}(0) \in N$ is the rounded product. $S^{[x\delta]}(0) \in N$ is readily obtained from $S^x(0) \in N_2$ for fixed base b number systems if $\delta = b^{-k}$ for integer values of k. In this case $S^{[x\delta]}(0) \in N$ is an n digit representation and $S^x(0) \in N_2$ is a 2n digit representation. Let $S^x(0) = (a_{2n}, a_{2n-1}, \ldots, a_{n+1}, a_n, a_{n-1}, \ldots, a_1) \in N_2$ and $S^{[x\delta]}(0) = (b_n, \ldots, b_1) \in N$. Then $b_i = a_{k+i}$ for $i = 1, \ldots, n$ if $1 \le k + i \le 2n$, otherwise $b_i = 0$. Notice if $\delta = b^{-n}$ then $S^{[x\delta]}(0) = (a_{2n}, \ldots, a_{n+1})$, and if $\delta = 1$ then $k = 0$ and $S^{[x\delta]}(0) = (a_n, \ldots, a_1)$. Notice that $S^{[x\delta]}(0) = 0$ if $2n \le k$ or $k \le -n$, and only when $0 \le k \le n$ are some $b_i$ not always equal to zero. Thus for the conventional fixed point number system the user's interpretation of the machine number system relative to multiplication is restricted because of compatibility, overflow, and underflow considerations. The interpretations $\delta = b^{-k}$, $k = 0, 1, \ldots, n$ are preferred.
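The selection of digits $k+1$ through $k+n$ of the double length product can be sketched as follows (Python; base ten and $n = 4$ are assumed for illustration). Truncation is shown; the rounded product $[x\delta]_R$ would add half a unit in the last retained place before truncating.

    B, N = 10, 4                        # base and single-length digit count

    def scaled_product(x, y, k):
        # Keep digits k+1 .. k+N of the 2N-digit product, i.e., the
        # single-length approximation under the interpretation
        # delta = B**(-k).
        double = x * y                  # element of N2, 2N digits
        return (double // B**k) % B**N

    # With delta = B**-N the high order half of 07006652 is kept;
    # with delta = 1 the low order half is kept.
    print(scaled_product(1234, 5678, N))   # 700
    print(scaled_product(1234, 5678, 0))   # 6652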
The machine designer usually incorporates instructions favoring a given interpretation (i.e., the low order product instruction is consistent with $\delta = 1$ while the high order product is consistent with $\delta = b^{-n}$). Given a dividend represented by $S^x(0) \in N$ and a divisor represented by $S^d(0) \in N$, a fixed point division algorithm obtains a quotient $S^q(0) \in N$ and a remainder $S^r(0)$. $S^x(0)$, $S^d(0)$, $S^q(0)$, and $S^r(0)$ have the same interpretation. Division algorithms are iterative, and the precision of the approximation to $S^q(0)$ is improved at each iteration by an incremental correction $S^{q_k}(0)$. Thus the interpretation of $N_2$, the number system containing the product representations of the form $S^{q_k}(0) S^d(0) = S^{q_k d}(0)$, must be considered in specifying the iterative computation. In particular, the requirement that $S^q(0)$ should have the same interpretation as $S^x(0)$ and $S^d(0)$ necessitates the consideration of the compatibility of interpretation between $N_2$ and N.

Consider the specific case of the conventional iterative division algorithm for a fixed base, n digit, weighted number system. For N let $\delta = b^{-k}$; then $S^q(0) = (q_1, \ldots, q_n) \in N$ and $S^q(0) \to q_1 b^{n-k-1} + \cdots + q_n b^{-k}$. (The ordering of the subscripts from left to right is contrary to the general notation used in this paper but is used for quotient representations, obtained by the conventional algorithm, to emphasize the order in which the quotient digits are obtained.) If $q_1$ is to have the correct weight then at the first iteration the machine must perform a calculation representing

$$x_1 = x - q_1 b^{n-k-1} d \tag{2.15}$$

where $S^x(0) \to x b^{-k}$ and, in general,

$$x_{j+1} = x_j - q_{j+1} b^{n-k-j-1} d, \qquad j = 0, 1, \ldots, n \tag{2.16}$$

Equation (2.15) determines an integer value for $q_1$, and $b^{n-k-1}$ is associated with d. Usually, logic exists for shifting the machine representation of x rather than d. Multiplication of Eq. (2.15) by $b^{-n+k+1}$ yields

$$x_1' = x b^{-n+k+1} - q_1 d = b x_0' - q_1 d \tag{2.17}$$

where $x_0' = x b^{-n+k}$ and

$$x_{j+1}' = b x_j' - q_{j+1} d, \qquad j = 0, 1, \ldots, n \tag{2.18}$$
Thus an initialization of the machine representations of the divisor or the dividend is required to obtain a sequence of quotient digits comprising a quotient representation with the same interpretation as the divisor and dividend representations. The initialization displaces the dividend representation $n - k$ digit positions to the right of the divisor representation. Thus if $k = 0$, which is the standard integer interpretation, then $x_0'$ is initially located in the low order part of the accumulator. If $k = n$, which is the standard fractional interpretation, then $x_0'$ is initially located in the high order accumulator. Thus by virtue of a given initialization the machine designer restricts or specifies the interpretation for a fixed point division instruction. The machine representation for the partial remainder $x_{j+1}$ or $x_{j+1}'$ may require more than n digits except for the case $k = n$. If radix complement coding is employed then the computation required for each iteration can be obtained using only the representation contained in the high order part of the accumulator representation. In effect the machine representation is given a different interpretation at each iteration in order to restrict the computation to the high order part of the accumulator while providing the correct weight for $q_{j+1}$. Diminished radix complement coding requires end around carry correction. Also the process of complementation of the divisor must be considered with respect to the complete partial remainder representation. Because of these considerations the logic for the mechanization of machine division using diminished radix complement coding is more complicated than the corresponding logic for a radix complement coded number system. The notation of Eq. (2.18) is adequate for a general discussion of the properties of a machine division algorithm and is used in Section 6. Except for the notable exception $k = n$, a detailed study of the mechanization of the division algorithm requires a notation for the machine representation and a separate notation for the interpretation of this representation.

Floating point interpretation can be studied within the same framework used for fixed point interpretation if the restriction that each operand which is an element of N should have the same interpretation is dropped. In this case $\delta$ becomes a variable and the exponent part of the floating point representation provides the value of $\delta$.
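The recurrence of Eq. (2.18) underlies the conventional restoring algorithm. The sketch below (Python; integer operands with $x < d$, i.e., the fractional interpretation $k = n$, are assumed) selects each quotient digit as the largest digit value leaving a nonnegative partial remainder.

    def restoring_divide(x, d, n, b=2):
        # Quotient digits via x_{j+1} = b*x_j - q_{j+1}*d of Eq. (2.18),
        # with restoring selection of q_{j+1}.
        q = []
        for _ in range(n):
            x *= b                       # left shift of the remainder
            digit = 0
            while x >= d and digit < b - 1:
                x -= d                   # trial subtraction; stop before
                digit += 1               #   the remainder would go negative
            q.append(digit)
        return q, x                      # digits q_1 .. q_n and remainder

    digits, rem = restoring_divide(5, 8, 4)   # 5/8 = 0.1010 in binary
    print(digits, rem)                        # [1, 0, 1, 0] 0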
3. Addition

In conventional arithmetic the adder is the basic element. All other arithmetic operations are obtained using combinations of addition, complementation, and shift. Because of this, it is important that the addition time be minimized. In general, fast addition techniques exist, but the logical circuitry is complicated and engineering decisions must be made to determine the proper balance between addition time and circuit complexity. At one time these techniques appeared to have only academic interest. This, however, is not the case today. There is a genuine need for rapid computation rates and the availability of solid state circuitry has made it possible to construct reliable complex switching networks. At one time the task of the logical designer was to find compromises between cost and machine effectiveness. Present emphasis is much more on effectiveness. Also, the user has money and is willing to pay the price required for fast computation. In this section we shall discuss the characteristics of the basic addition process and the various schemes which have been advanced to improve the machine addition process. This discussion is restricted to the conventional binary number system.

3.1 Notation and Basic Addition Processes
The following notation will be used:

(1) Capital letters will denote n-tuples of digits representing numbers.
(2) Elements of an n-tuple are represented by small letters.
(3) The elements of an n-tuple are ordered from one to n starting from the right.
(4) Subscripts are used to denote n-tuple components or digit positions. Example: $C = (c_n, \ldots, c_1)$.
(5) Superscripts are used when needed to index sequences of n-tuples.
(6) A shift is indicated by multiplication by the appropriate power of two. Given A, then 2A denotes the result of a one bit left shift of A.
(7) (a) $\oplus$ denotes exclusive OR; $a_i \oplus b_i = r_i$.
    (b) $\vee$ denotes inclusive OR; $a_i \vee b_i = v_i$.
    (c) Conjunction is denoted by juxtaposition; $a_i b_i = k_i$.
The basic addition process has three parts: (i) digitwise modulo 2 addition; (ii) carry generation; and (iii) carry assimilation. Define R as the digitwise modulo 2 sum of the two operands A and B. If the carry C is known, then the sum S is obtained by assimilating the carries using the digitwise modulo 2 sum operation. Thus

$$S = A \oplus B \oplus C = R \oplus C \tag{3.1}$$

The basic problem is, of course, the determination of C. Carry digits are of two types: initiated and propagated. An initiated carry occurs at the ith position of C whenever $a_{i-1} \in A$ and $b_{i-1} \in B$ are both equal to one. A propagated carry exists in the $(i + k)$ position of C if $r_{i+k-1}, \ldots, r_i = 1$ and $c_i = 1$, where $r_j \in R$. Frequently, the digitwise modulo 2 sum is obtained from the following logic:

$$r_i = (a_i \vee b_i)\overline{a_i b_i} \tag{3.2}$$

Thus the carry initiations, $c_i = a_{i-1} b_{i-1}$, are available after only one logical level. The critical problem in addition is the determination of the propagated carries. The correct sum is obtained in one or two levels of logic after all carries are known, since $S = R \oplus C$. A given digit position of C cannot be both a propagated carry and an initiated carry. Consider the ith position of C. If $c_i$ is an initiated carry, then $k_{i-1} = 1$. If $c_i$ is a propagated carry, it is necessary that $r_{i-1} = 1$. However,

$$k_{i-1} r_{i-1} = (a_{i-1} b_{i-1})(a_{i-1} \oplus b_{i-1}) = 0 \tag{3.3}$$

Carry assimilation is the process in which the carries are added to the digitwise modulo 2 sum of the operand digits to yield the final sum. Individual carry digits, in contrast to the scheme described above, may be assimilated as generated to yield partial sums. The final sum exists when all carries have been assimilated. The techniques used in the carry assimilation process and the carry generation process are the most important distinguishing characteristics of the various addition processes. We shall now consider three well-known addition schemes representing certain extremes in design. These are the conventional ripple carry adder, the simultaneous adder, and the carry storage adder. The carry process can be represented in matrix form and the various addition schemes are different factorizations of this matrix [21].

3.1.1 The Ripple Carry Adder
In a ripple carry adder the carry generation logic for the carry at the $(i+1)$ position is

$$c_{i+1} = c_i(a_i \oplus b_i) \vee a_i b_i = c_i r_i \vee k_i \tag{3.4}$$

Due to the fact that $r_i k_i = 0$, the connective in the above equation can be either an inclusive or an exclusive OR. The carry logic requires a two input OR gate and a two input AND gate per digit. An n-bit adder using the ripple carry technique has a chain of 2n gates. In this chain the AND and OR gates alternate. In the worst case a carry will be required to propagate through the whole carry chain. The worst case occurs when all of the digits of R are equal to one and $c_1 = 1$. (A "one" in the least significant position of C is not due to the addition process per se. Carries of this type originate because of the need for end around carry correction in the diminished radix complement system, and as a result of the technique which is used to obtain two's complement representation from the one's complement representation in a parallel adder.) Unless special circuitry is employed to indicate when the propagated carries have been determined, it is necessary to allow a length of time equal to the time required to propagate the maximum length carry for each addition operation. This is the main disadvantage of the ripple carry adder. The main advantage is, of course, that the realization of this concept requires fewer components than any other parallel adder.
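A bit-serial model of the ripple carry recurrence is sketched below (Python; bit lists, indexed least significant first, are a notational convenience rather than a hardware description).

    def ripple_add(a_bits, b_bits, c1=0):
        # Ripple carry per Eq. (3.4): c_{i+1} = c_i r_i OR k_i,
        # with the sum digits s_i = r_i XOR c_i as in Eq. (3.1).
        s, c = [], c1
        for a, b in zip(a_bits, b_bits):
            r, k = a ^ b, a & b          # propagate and initiate terms
            s.append(r ^ c)
            c = (c & r) | k
        return s, c                      # sum bits and carry out c_{n+1}

    # 11 + 6 = 17 on a 4-bit adder: sum 0001 with carry out 1.
    print(ripple_add([1, 1, 0, 1], [0, 1, 1, 0]))   # ([1, 0, 0, 0], 1)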
3.1.2 Simultaneous Carry Logic

The ripple carry scheme without end of carry detection is one extreme. At the other end of the spectrum is the scheme called "simultaneous carry logic" or "look ahead." This type of carry has also been designated as "nonrecursive carry generation." The simultaneous logic for the carry at the ith position is given by

$$c_i = k_0 r_1 \cdots r_{i-1} \vee k_1 r_2 \cdots r_{i-1} \vee \cdots \vee k_{i-2} r_{i-1} \vee k_{i-1}, \qquad i = 2, \ldots, n+1 \tag{3.5}$$

where $c_1 = k_0$. The use of this type of logic for the determination of the carry digits was first considered by Weinberger and Smith [74]. The adder proposed by Weinberger and Smith did not employ simultaneous logic to the fullest possible extent because of economic considerations and limitations in the maximum possible number of gate inputs available. Notice that the logical connection between terms in Eq. (3.5) may be either inclusive or exclusive disjunction because, at most, one term has the value one. This follows from the fact that $r_j k_j = 0$ for all j from 1 to $i - 1$. Observe that each carry digit is an explicit function of all operand digits of lower order. The carry logic has a depth of two. The determination of the ith carry digit requires $i - 1$ AND gates, the outputs of which drive a single OR gate having i inputs. If it is assumed that each logical element has sufficient output drive capabilities, then the simultaneous adder provides the final sum using logic with a depth of six. A logical level of two is required to obtain R, the digitwise modulo 2 sum of the operands. A level of two is required to obtain the complete carry C. A logical level of two is required to obtain the digitwise modulo 2 sum of C and the partial sum R. If, as in many cases, the OR gate can be obtained by the junction of two wires, then the logical depth of this particular circuit can be reduced to three. In practice, it is usually not the case that the logical elements have the drive capabilities required by this circuit when n is large. In this case, additional circuitry must be used to provide the necessary drives and this must increase the depth of the circuit.
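The two-level structure of Eq. (3.5) can be mirrored directly in a sketch; the nested loops below correspond to the quadratic growth in AND gates (Python; $k_0$ models a forced low order carry, and bits are indexed least significant first).

    def lookahead_carries(a_bits, b_bits, k0=0):
        # All carries at once from Eq. (3.5): c_i = 1 iff some k_j
        # (j < i) is one and every r between positions j+1 and i-1
        # propagates.
        n = len(a_bits)
        r = [a ^ b for a, b in zip(a_bits, b_bits)]        # r_1 .. r_n
        k = [k0] + [a & b for a, b in zip(a_bits, b_bits)] # k_0 .. k_n
        carries = []                                       # c_1 .. c_{n+1}
        for i in range(1, n + 2):
            c_i = 0
            for j in range(i):           # term k_j r_{j+1} ... r_{i-1}
                term = k[j]
                for m in range(j + 1, i):
                    term &= r[m - 1]
                c_i |= term
            carries.append(c_i)
        return carries

    # Same operands as the ripple example; carries agree: [0, 0, 1, 1, 1].
    print(lookahead_carries([1, 1, 0, 1], [0, 1, 1, 0]))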
3.1.3 Carry Storage

A basic concept which may be employed in addition logic is "carry storage." If carry storage is employed, the addition of two operands can be obtained using only one half-adder at each digit position. Two n-bit registers are required, one for carry storage and the other for the storage of partial sums. Let $S^i$ designate the ith partial sum and let $C^i$ designate the ith partial carry. The first partial sum is equal to the digitwise modulo 2 sum of the two operands. The first partial carry is equal to the digitwise conjunction of the operands, shifted one place to the left, i.e., $C^1 = 2K$. Note $C^1$ consists of $n + 1$ bits. If "two's complement arithmetic" is employed, then $c_{n+1}^1$ may be ignored. If "one's complement arithmetic" is used, then $c_1 = c_{n+1}^1$. In either case, only n bits are required. $C^1$ is stored in the carry storage register and $S^1$ is stored in the partial sum register. Succeeding operations are of the form

$$S^{i+1} = S^i \oplus C^i, \qquad i = 1, 2, \ldots \tag{3.6}$$

$$C^{i+1} = 2(S^i C^i), \qquad i = 1, 2, \ldots \tag{3.7}$$

The addition process continues in this way. At each step the partial carry is assimilated to obtain a new partial sum and a new partial carry is generated until such time as $C^k$, the kth partial carry, is identically equal to zero. For the n-digit adder, n steps may be required in the worst case. Consider the ith position of each partial carry:

$$c_i^1 = k_{i-1}$$
$$c_i^2 = s_{i-1}^1 c_{i-1}^1 = r_{i-1} k_{i-2}$$
$$c_i^3 = s_{i-1}^2 c_{i-1}^2 = (r_{i-1} \oplus c_{i-1}^1)(r_{i-2} k_{i-3}) = r_{i-1} r_{i-2} k_{i-3} \tag{3.8}$$

Observe that $c_i$, the final carry in the ith position, is equal to

$$c_i = c_i^1 \vee c_i^2 \vee \cdots \vee c_i^i \tag{3.9}$$

where $c_i^k$ for all k greater than i are identically equal to zero. Equation (3.9) is identical to Eq. (3.5) given for the simultaneous carry logic. Thus it is known that at any digit position in the carry storage register there will occur, during the addition process, at most one digit having value one. Carry storage is discussed by Burks, Goldstine, and von Neumann [8] and was used in WHIRLWIND I [17] to speed up multiplication, which consists of a sequence of add and shift operations.
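The partial sum and partial carry rules of Eqs. (3.6) and (3.7) act on whole machine words, which makes the scheme easy to model (Python sketch; two's complement arithmetic is assumed, so the bit above position n is discarded, and the operand values are illustrative).

    def carry_storage_add(a, b, width=16):
        # Iterate S <- S xor C, C <- 2(S and C) until C = 0.
        mask = (1 << width) - 1
        s, c = a ^ b, ((a & b) << 1) & mask   # S^1 = R, C^1 = 2K
        steps = 1
        while c:
            s, c = s ^ c, ((s & c) << 1) & mask
            steps += 1
        return s, steps

    s, k = carry_storage_add(0x3DF7, 0xC95A)
    print(hex(s), k)    # 0x751, plus the number of assimilation steps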
The basic carry storage technique is easily extended to obtain an adder which has the property that one new operand can be introduced at each step, the step consisting of the assimilation of the partial carry representation and the determination of a new partial carry and a new partial sum. In particular, complete carry assimilation is not required before the introduction of a new operand. In multiplication the successive operands are the products of the multiplicand and successive multiplier digits. In order to achieve this, the basic adder for each digit must be a three input adder. The logic for the partial sum and the partial carry at the ith step, $i > 1$, is given by

$$S^i = S^{i-1} \oplus A^{i+1} \oplus C^{i-1}, \qquad i = 2, 3, \ldots \tag{3.10}$$

$$C^i = 2[(S^{i-1} \oplus A^{i+1})C^{i-1} \vee S^{i-1}A^{i+1}] \tag{3.11}$$

$$S^1 = A^1 \oplus A^2 \tag{3.12}$$

$$C^1 = 2(A^1 A^2) \tag{3.13}$$

This basic carry storage scheme has been modified somewhat in ILLIAC II so that carries initiated by the conjunction of $S^{i-1}$ and $A^{i+1}$ are never stored in the carry storage register but are assimilated immediately. (This modification is discussed in Section 4.)

3.2 Carry Statistics and End of Carry Detection
The statistics of the carry generation and propagation process are of particular interest since the determination of the end of carry propagation is one technique which may be used to decrease the average addition time. Note that if carry storage is employed, the end of carry propagation condition occurs when the carry register contains all zeros, This condition is readily detected by an n-input AND gate. An alternate scheme to be employed when carry storage is not available was proposed by Gilchrist, Pomerene, and Wong [25].In this scheme two ripple carry chains are employed. The conventional ripple carry chain is augmented by an auxiliary carry chain. An auxiliary carry di is initiated in position i when &l&-l = 1 where ui-' and bt1 are the ith - 1 positions of the two operands. The auxiliary carry di is propagated by the condition airi = 1. It is apparent that when carry propagation has been completed there will exist a "one" in every digit position of either the carry representation or the auxiliary carry representation. This condition is eaaily detected. If all digits of both operands are independent and the digit values 0 or 1 are equally probable, then the probability for carry initiation at any digit position is one-fourth. The average length of carry propaga148
NUMBER SYSTEMS AND ARITHMETIC
tion has an upper limit of two digits. This figure does not include the carry initiation. Thus the upper limit for the average length of the carry sequence is 3.0 digits. In a given addition several different carry sequences may exist. The addition process cannot be completed until the longest carry has been propagated. Therefore it is not the average carry length which is important for the end of carry detection, but, rather, the average maximum carry length. Burks et al. [8] have shown that the upper bound on the average maximum length of carry for the addition of two operands is given by log,n. The maximum average length of carry for n = 40, the operand length in the Princeton machine, was given as 4.62. The formula for calculating the maximum average length is given, although this formula is in error. I n particular (see page 10 at the end of the third paragraph in Burks et al. [a], the formula for P,(v) should read
P,(v)
=
P,-lW
+ E l - P,-"1/2w'
(3.14)
Gilchrist et al. [25] used simulation to determine that the maximum average length of carries, when both the conventional carry and the auxiliary carry are considered, is approximately 6.6 for a 40-bit operand. The actual length is close to 6.69. Reitwiesner [54] developed. the probabilistic equations for this type of carry completion circuit and provided results for 2 5 n 2 39. If carry storage is employed, then the end of carry is easily detected without the auxiliary carry. As we have indicated above, one of the important advantages of the carry storage scheme is that a new operand can be introduced at each step. The statistics of the carry after the introduction of the last operand were studied by Estrin, Gilchrist, and Pomerene [ l a ]and shown to be independent of the number of operands introduced. This means that it is possible to construct a high speed multiplication circuit in which a partial product and a partial carry representation are obtained in n or less addition steps. Complete msimilation of the carries obtains the final product. On the average less than log,n additional steps are required for the complete assimilation of the carries. 3.3 Improved Ripple Carry Circuitry
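The carry length statistics are easily explored by simulation, much as Gilchrist et al. did. The sketch below (Python; one of several possible counting conventions is assumed, here initiation plus propagation) estimates the average maximum carry length for random 40-bit operands; the result should lie near the $\log_2 n$ bound discussed above.

    import random

    def max_carry_length(a, b, n):
        # Length of the longest carry sequence in an n-bit addition.
        longest = run = carry = 0
        for i in range(n):
            x, y = (a >> i) & 1, (b >> i) & 1
            if x & y:                    # initiated carry
                run = run + 1 if carry else 1
                carry = 1
            elif (x ^ y) and carry:      # propagated carry
                run += 1
            else:
                carry, run = 0, 0
            longest = max(longest, run)
        return longest

    random.seed(1)
    n, trials = 40, 20000
    avg = sum(max_carry_length(random.getrandbits(n),
                               random.getrandbits(n), n)
              for _ in range(trials)) / trials
    print(round(avg, 2))                 # compare with log2(40), about 5.3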
3.3 Improved Ripple Carry Circuitry

The standard realization of the ripple carry circuit is a chain of alternating AND and OR gates. Improved performance can be obtained using different circuit configurations. Improvement of this type is distinguished from improvement obtained by changes in the logical organization of the carry structure. Both examples which follow are basically ripple carry logic.
3.3.1 Exclusive OR Carry Logic

Exclusive OR carry logic is not new, since this concept has been used in arithmetic units using relays. The equations for the exclusive OR carry are the same as those given for the ripple carry. However, the hardware implementation is different. Three switches are required per stage. A switch connects $c_i$ to $c_{i+1}$. The switch is closed if and only if $r_i = a_i \oplus b_i = 1$. The second switch clamps $c_{i+1}$ to the "one" voltage level and is closed if and only if $k_i = a_i b_i = 1$. The third switch clamps $c_{i+1}$ to the "zero" level and is closed if and only if $\bar{a}_i \bar{b}_i = 1$. The carry chain has only one logical element per stage rather than the two required for the AND, OR realization. Kilburn et al. [30, 31] and also Salter [60] have proposed a circuit using saturated transistors to realize the required switches. Using a type SB-240 transistor, the carry logic for 20 stages operates on an 80 nsec cycle. Only 20 nsec of this time is required for the propagation of the maximum length carry. The remaining 60 nsec are required to switch the transistors in and out of saturation. It is expected that the carry propagation could be cut to about 10 nsec if the length of wire in the carry chain were to be reduced. This circuit is characterized by a propagation of carries with a velocity of about one-fourth the speed of light.

3.3.2 Threshold Carry Logic

Equation (3.4) defines the logic of the ripple carry. This equation can be interpreted appropriately for threshold logic as follows: $c_{i+1}$ is a "one" if and only if two or more of the variables $a_i$, $b_i$, $c_i$ have values equal to one. A tunnel diode is extremely fast and can be used as a threshold device providing a "one" output when two or more of the three inputs are in the one state. Sear [61] and Daly and Kruy [13] describe a ripple carry circuit using tunnel diodes as threshold devices. A propagation delay of 2.2 nsec per stage is claimed.

3.4 Logical Organization for Fast Addition
The adders considered in this section employ logical techniques to obtain fast addition different from those discussed previously.

3.4.1 Factorization of the Simultaneous Logic
Various logical configurations requiring more than two logical levels may be obtained if auxiliary carry functions are defined. This approach overcomes fan-in and fan-out limitations imposed by physical switching circuits at the cost of added logic depth. An order of auxiliary functions may be defined. Each order of auxiliary functions adds a logical depth of two units. The particular auxiliary function scheme used by Weinberger and Smith [74] is a typical example. Let

$$c_{i+a} = c_i r_i \cdots r_{i+a-1} \vee k_i r_{i+1} \cdots r_{i+a-1} \vee \cdots \vee k_{i+a-2} r_{i+a-1} \vee k_{i+a-1} \tag{3.15}$$

Two types of first level auxiliary function are defined:

Type (1)

$$y_{i+(j+1)b-1} = r_{i+jb} \cdots r_{i+(j+1)b-1} \tag{3.16}$$

defined for $j = 0, 1, \ldots$, such that

$$i \le i + (j+1)b - 1 \le i + a - 1, \qquad 0 \le (j+1)b - 1 \le a - 1$$

If $b \nmid a$, let $[a/b] = f$; then

$$y_{i+a-1} = r_{i+fb} \cdots r_{i+a-1} \tag{3.17}$$

Type (2)

$$x_{i+(j+1)b-1} = k_{i+jb} r_{i+jb+1} \cdots r_{i+(j+1)b-1} \vee \cdots \vee k_{i+(j+1)b-1} \tag{3.18}$$

for j as defined for type (1). If $b \nmid a$, then

$$x_{i+a-1} = k_{i+fb} r_{i+fb+1} \cdots r_{i+a-1} \vee \cdots \vee k_{i+a-1} \tag{3.19}$$

Thus

$$c_{i+a} = c_i y_{i+b-1} \cdots y_{i+a-1} \vee x_{i+b-1} y_{i+2b-1} \cdots y_{i+a-1} \vee \cdots \vee x_{i+a-1} \tag{3.20}$$

The form of the functions of types (1) and (2) may be changed and applied to Eq. (3.20) to obtain a new equation for $c_{i+a}$ with a logical depth of six. The forms are changed by the following substitutions: $y \to r$, $x \to k$, $z \to y$, and $w \to x$. This recursive process may be continued to obtain any required logical depth.
3.4.2 Carry Halving

The carry halving technique is due to Nadler [50] and is one of two techniques basic to the pyramid adder. The operation of the pyramid adder may be considered as a sequence of steps in either time or space. In each step a partial carry and a partial sum are generated. The partial carry and partial sums at each step are, in general, different from those given for the carry storage adder because, in the carry halving process, partial carries are assimilated only at every other carry position which could be nonzero. The carry halving process permits carry assimilation to occur over an increasingly large number of digits because prior to the jth step, $j > 1$, all carries of either type caused by initiated carries at digit positions $2^{j-1}(2i-1)+2, \ldots, i2^j$ for $i = 1, 2, \ldots$, have been assimilated. The carries at positions $2^{j-1}(2i-1)+1$, $i = 1, 2, \ldots$, are assimilated at step j for $j = 1, 2, \ldots$. The carry $c_x$ is the disjunction of the previous partial carries in digit position x:

$$c_x = c_x^{j-1} \vee \cdots \vee c_x^1 \tag{3.21}$$

$$c_x \in \{0, 1\} \tag{3.22}$$

Only one of $c_x^{j-1}, \ldots, c_x^1$ equals one. Thus at the jth step the assimilation of the carry $c_x$ occurs over groups of $2^{j-1}$ partial sum digits. The jth step produces final sum values for digit positions $2^{j-1}+1, \ldots, 2^j$. Consider the addition of two operands. In step one the partial sum is determined for all even digit positions by assimilating the carries initiated at the odd positions:
$$s_i^1 = a_i \oplus b_i \oplus a_{i-1}b_{i-1}, \qquad s_i^1 \in S^1 \tag{3.23}$$

for all $i > 0$ such that $2 \mid i$. Propagated carries which result from the assimilation process at the even digit positions are also determined in step one, as follows:

$$c_{i+1}^1 = a_i b_i \vee (a_i \oplus b_i)a_{i-1}b_{i-1} = k_i \vee r_i k_{i-1}, \qquad c_{i+1}^1 \in C^1 \tag{3.24}$$

for all $i > 0$ such that $2 \mid i$. In the second step the carries at positions $4i - 1$, $i = 1, 2, \ldots$, are assimilated. As a result of step one, the carries at all even positions have been assimilated. Thus, each carry at position $4i - 1$ can be assimilated over two digit positions. This is effected by using combinational logic which adds "one" to the low order digit of the sequence. Also during step two, the partial carry contributions to carry positions $4i + 1$ are determined: for $i = 1, 2, \ldots$,

$$s_{4i-1}^2 = a_{4i-1} \oplus b_{4i-1} \oplus c_{4i-1} \tag{3.25}$$

$$s_{4i}^2 = s_{4i}^1 \oplus c_{4i-1}(a_{4i-1} \oplus b_{4i-1}) \tag{3.26}$$

$$c_{4i+1}^2 = s_{4i}^1 c_{4i-1}(a_{4i-1} \oplus b_{4i-1}) \tag{3.27}$$

Carries assimilated during the jth step at positions $2^{j-1}(2i-1)+1$, $i = 1, 2, \ldots$, may generate carries at positions $i2^j + 1$, and

$$c_{i2^j+1}^{j+1} = (c_x^{j-1} \vee \cdots \vee c_x^1)(a_x \oplus b_x)(s_{x+1}^j)(s_{x+2}^j)(s_{x+3}^j) \cdots (s_{i2^j}^j), \qquad x = 2^{j-1}(2i-1)+1 \tag{3.28}$$
Figure 1 shows the range and order of the sequence of carry generation and assimilation.
Fig. 1. Carry assimilation and generation for carry halving. (The figure indicates, step by step, the positions at which carries are assimilated and generated, the range of each assimilation, and the correct sum digits available.)
In the above discussion we have assumed that $c_1 = 0$. The logic used in step one can be designed so that $c_1$ is assimilated with $c_2$ at step one to obtain the correct sum digits in the two low order positions. At the end of the jth step the $2^j$ lowest order sum digits are correct. Therefore the complete carry assimilation process is realized in no more than q steps, where q is the smallest integer such that $2^{q-1} < n \le 2^q$. So $\log_2 n \le q < 1 + \log_2 n$. The following example demonstrates the carry halving process.

Example: Carry halving addition for $A + B$ with

$$A = 0011110111110111, \qquad B = 1100100101011010$$

Four steps of assimilation and generation yield

$$S = 0000011101010001, \qquad c_{17} = 1$$
3.4.3 Conditional or Carry Select Adder

The conditional sum adder, proposed in 1961 by Sklansky [63], is similar to the pyramid adder in performance but the basic principle is quite different. In this adder, as in the pyramid adder, no more than $1 + \log_2 n$ steps are required and the number of correct sum digits available at each step grows exponentially as a power of two. Basically, the conditional sum adder is a carry select adder. The correct sum cannot be determined in digit position i until the existence or absence of a carry in position i is determined. However, it is possible to determine the sums from position i to position $2i - 1$ conditionally upon $c_i$ during the period that $c_i$ is being determined, where $i = 2^m$, $m = 0, 1, \ldots$. When $c_i$ is determined, the proper sum is then selected. The adder is organized so the carries $c_1$, $c_2$, $c_4$, $c_8$, etc., appear explicitly in unconditional form. All other carries which appear in the logic are conditional. The notation $c(j \mid q)$ denotes a conditional carry; $c(j \mid q)$ is read, "the carry in the jth position is a one if the carry in the qth position is a one," $c(j \mid q) \in \{0, 1\}$.

In order to clarify Sklansky's technique, let us consider a specific example: $c_8$ is obtained at the fourth level of logic. In terms of third level logic,

$$c_8 = c(8 \mid 4)c_4 \vee c(8 \mid \bar{4})\bar{c}_4 \tag{3.29}$$

In shortened notation, Eq. (3.29) becomes

$$8 = (8 \mid 4)4 \vee (8 \mid \bar{4})\bar{4} \tag{3.30}$$

Equation (3.29) or (3.30) is read, "The carry at position 8 is a one IF the carry at position 8 is a one given a carry at position 4 AND the carry at position 4 is a one, OR the carry at position 8 is a one even if the carry at position 4 is a zero AND the carry at position 4 is a zero." The pertinent conditional logic at the third level, in terms of level two logic, is:

$$8 \mid 4 = (8 \mid 6)(6 \mid 4) \vee (8 \mid \bar{6})(\bar{6} \mid 4)$$
$$8 \mid \bar{4} = (8 \mid 6)(6 \mid \bar{4}) \vee (8 \mid \bar{6})(\bar{6} \mid \bar{4})$$

The conditional logic at the second level in terms of first level logic is:

$$8 \mid 6 = (8 \mid 7)(7 \mid 6) \vee (8 \mid \bar{7})(\bar{7} \mid 6) = v_7 v_6 \vee k_7 \bar{v}_6$$
$$6 \mid 4 = (6 \mid 5)(5 \mid 4) \vee (6 \mid \bar{5})(\bar{5} \mid 4) = v_5 v_4 \vee k_5 \bar{v}_4$$
$$8 \mid \bar{6} = (8 \mid 7)(7 \mid \bar{6}) \vee (8 \mid \bar{7})(\bar{7} \mid \bar{6}) = v_7 k_6 \vee k_7 \bar{k}_6$$
$$6 \mid \bar{4} = (6 \mid 5)(5 \mid \bar{4}) \vee (6 \mid \bar{5})(\bar{5} \mid \bar{4}) = v_5 k_4 \vee k_5 \bar{k}_4$$

Note that the conditional terms required in the last set of equations can readily be obtained from the operand digits. This particular realization of the carry in terms of conditional carries is due to Sklansky. The logic is highly redundant and can be simplified.
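The carry select principle underlying both this adder and the scheme described next can be sketched as follows (Python; a group width of four bits is assumed for illustration). Each group is summed twice, once for each assumed low order carry, and the arriving carry selects between the two conditional results.

    def carry_select_add(a, b, width=16, group=4):
        # Each subsection is added with and without a forced carry;
        # the true low order carry then selects the correct subsum.
        mask = (1 << group) - 1
        result, carry, shift = 0, 0, 0
        while shift < width:
            ga, gb = (a >> shift) & mask, (b >> shift) & mask
            sum0 = ga + gb              # no carry forced in
            sum1 = ga + gb + 1          # carry forced in
            chosen = sum1 if carry else sum0
            result |= (chosen & mask) << shift
            carry = chosen >> group     # carry into the next group
            shift += group
        return result, carry

    print(carry_select_add(0x3DF7, 0xC95A))   # (0x751, 1)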
In 1962 a different realization of the carry select principle was proposed by O. J. Bedrij [6]. In this adder the addend and augend are divided into subsections which are added twice to produce two subsection sums. The two additions differ in that for one, a carry digit is forced into the low order position in each section and, for the other, no carry digit is forced at the low order positions. Special carry generation circuitry is employed to determine the correct low order carry in each subsection. The determination of the correct low order carry digit in each subsection permits the selection of the correct sum. In this case, all adder sections operate simultaneously to produce their respective sum and carry digits. It is desirable that the subsum generation path and the carry logic which determines the low order carry should have approximately the same logical depth, since no speed advantage is realized if the subsums are produced before the low order carries are available to select the true sum. Because of this, short ripple type carries can be used within subsections. The figures quoted by Bedrij in comparing two particular adders to the ripple carry adder are quite impressive. A speed factor of approximately 20 is obtained over the ripple carry adder for an increase in hardware by a factor of two.

3.4.4 Carry Skip

The concept of carry skip is apparently due to Babbage. This technique has been discussed by Burtsev [10], Morgan and Jarvis [48], and Lehman and Burla [35, 36]. In its basic form the carry skip circuit is a simple logical circuit which, when added to the ripple carry logic, permits a carry to bypass a sequence of circuit positions for which all $r_i$ are equal to one. This technique is also sometimes called the anticipatory carry. A portion of the standard ripple carry chain may be bypassed if $k_i r_{i+1} \cdots r_{i+n} = 1$. This condition can readily be detected by an AND gate with $n + 1$ inputs. The output of the AND gate drives the OR gate at the $i + n + 1$ position in the carry chain. Thus, this particular carry need not propagate through the n AND, OR gates in the ripple carry logic but, rather, is permitted to bypass these gates, thus speeding up the entire carry process. In the circuit described by Burtsev, and Morgan and Jarvis, the adder is divided into skip sections each containing an equal number of consecutive digits. Lehman and Burla [35] consider the question of what is the optimal number of sections of equal size. They show for an adder of length n divided into k sections of s bits that the optimum condition is obtained when s is equal to $\sqrt{n/2}$. This result is obtained on the basis of a worst case analysis. The worst case is the situation where a carry is initiated in the least significant digit and is propagated to, but not beyond, the most significant digit. This is the worst case because the carry must propagate through both the low order and the high order sections. Lehman and Burla also consider the question of whether the adder should be divided into nonequal sections. The answer to this question is in the affirmative, since the number of carry skip gates can be held constant while the length of the sections may be changed so that the number of digits in the low order and high order groups is reduced. The optimum configuration is obtained from the equal section size configuration by redistributing the size of the sections so that the section size increases by one for each group up to the middle section or pair of sections, and then decreases by one up to the high order section. This type of structural change always achieves a reduction in the worst case propagation time as compared to the equal section configuration. Finally, Lehman and Burla [35] consider the nesting of skip gates. No definitive answer exists for this. However, it is of interest to note that if the skip technique is carried to the ultimate limit, the resulting circuit is identical to the simultaneous carry circuit.
The technique of generating carries modulo 2' is a special type of skip circuit. I n this case every group of 8 bits is regarded as a modulo 28 number representation and the maximum length of the carry chain is reduced by a factor of l/8. An adder organization termed stored skip carry is considered by Metze and Robertson [47], and Heijn [28].I n this type of adder the basic arithmetic is binary but carries are modulo 2' (normally 8 > 2). Thus, carry storage can be used economically since the number of storage registers can be reduced by 118. Application of this type of adder has been discussed by Lehman [33]. 3.5 Conclusions
It is most likely that an optimum adder design will employ certain combinations of the techniques discussed in this section. For example, in ILLUCII carry storage is employed in conjunction with modulo four arithmetic. Within the arithmetic unit the sum representation is redundant due to the separation of the partial sums and partial carries. Carry and partial sum representations are maintained inside the adder until such time as it becomes necessary to determine the correct sign of the sum. For division, a small number of leading carries are completely assimilated using a simultaneous carry logic. 156
NUMBER SYSTEMS AND ARITHMETIC
A comparison of the effectiveness and the complexity of the different logical circuits for addition is difficult because such comparisons are very much dependent upon the exact circuit technology employed. Comparisons have been given by Sklansky [62], MacSorley [all, Lehman and Burla [35], and Lehman [32].Lehman's results are excellent in that upper and lower bounds of both complexity and speed are given. 4. Redundant Number Systems
Redundancy is a very important concept of modern machine arithmetic. In this section we shall discuss various proposals advanced to solve the carry propagation problem by means of redundant number representation. 4.1 Separate Carry Representation
Robertson [55, 57, 701 and Metze [46] have considered in detail various arithmetic schemes involving redundant number representations consisting of separate pseudo sum and carry representations. The pseudo sum is the same as the partial sum. The carry representation consists of the unassimilated carries. Full carry assimilation yields the oorrect sum. Among the schemes are: (1) Redundant representation consisting of a pseudo sum and carries restricted to the arithmetic unit. Full carry assimilation is executed only when necessary. This scheme is employed in the ILLIAC II [70]. (2) Redundant representation employed in the storage of operands as well as the arithmetic. (3) An arithmetic unit capable of both addition and subtraction with redundant representation consisting of a pseudo sum and a single representation for both carries and borrows (coincident carry, borrow storage). For redundant number addition, the basic carry storage adder is modified to conform to the following equations: si = #i-1 0 ~ " 10 (@-I v 2Si-lA"tl 1 (4.1) @ = 2[(Si-1 0 A"l)(Ci-l
v 2fj"-'A'+l)].
(4.2) The difference between this adder and the carry storage adder discussed previously is that carries due to the digitwise conjunction of the new operand and the partial sum are assimilated immediately into the new partial sum. Thus only the propagated carries are stored in the carry register. On the basis of a single digit the modified addition logic is: =
c; = (8;::
0 a;+,,
@
0 u;:;)(c;:;
v&'&1) 3-1 3 1
v 8;:;u;:;).
(4.3) (4.4)
157
HARVEY L. GARNER
This logic can be realized using only a single carry storage bit per stage because c3. i s5i
=
0
(4.8) for j = l , 2 , . . . , ; i = l , 2 , . . . . This fact is proved by the observation that 8; is the modulo two sum of two terms, and c i is the conjunction of the same two terms. A detailed study of the logic for a subtractor using separate borrow storage reveals that b; st = 1 where bi is the j t h digit of the borrow at the ith step,and 5; is as before the pseudo sum digit. Since bit?; = 1 and c; si = 0, carries and borrow can be stored in the same register. If the j t h digit of the carry-borrow storage is one, then 5; determines whether the digit is a borrow or a carry. The redundant notation within the arithmetic unit has far-reaching consequences. In particular, the sign digit no longer indicates under all circumstances the sign of the true sum. Because of this, the term indicator digit is used in place of asigned digit for redundant representations. Radix complement coding is generally preferred. Diminished radix complement coding or absolute value plus sign representation suffers because many operations in these codes require true sign information not provided by the indicator digit. A detailed study of a one's complement arithmetic unit using redundant representation was conducted by Metze [as]. The general result indicates that the use of one's complement representation leads to complications which can be avoided by using the two's complement representation. Consider for the purpose of simple explanation the radix point associated with the redundant representation to be a t the extreme left: k digits are added to the left of the radix point as an extension of the pseudo sum to serve as indicators. It has been shown that the total assimilation of carries can change the indicator digits by one. Similarly, for coincident carry-borrow storage the indicator digits may be changed by plus or minus one as a result of total assimilation of carries. Let the range of the indicator digits in the nonredundant representation be the set of integers from to 2k-2 - 1. Standard radix complement notation modulo 2k is used to represent the positive and negative indicator integers. Indicator digit addition is executed modulo 2k. In general, the k high order digits of an n bit number representation serve as indicator digits and fast carry assimilation is provided. Carries are assimilated only when necessary over the n - k low order bits. Indicator digit sums, I , obtained from one or both operands in redundant form, have the following ranges for a carry storage system: (1) Sum in range if I = (2k - 2k-a,. . . ,2k - 1,0,, . ,2k-2-2} (no overflow). 158
.
NUMBER SYSTEMS AND ARITHMETIC
( 2 ) Sum out of range if 1 = (2k-2,. , 2' - 2'4 - 2 ) (overflow). (3) Sum range depends on assimilation if I = (2k-2 - 1, 2k - 2k-2 - 11-
..
The three sets of indicator values are disjoint. If coincidence carryborrow is employed, the sets given above must be modified to account for the fact that assimilation may subtract one from the indicator digit sum. Thus 2k-2 and 2k - 2k-2 must be removed from the out-of-range and the in-range sets, respectively, and added to the sum range set dependent on assimilation. I n general, results in range two may be accepted as in-range and computation may proceed until a definite out-of-range condition exists. Absolute overflow detection for addition and subtraction of two operands in redundant representation without assimilation is possible, except for indication digits in range three, with a carry storage adder if k, the number of indicator digits, is greater than or equal to three. If the adder is of the carry-borrow coincident storage type, then k 2 4 is required. If the sign of the overflow is needed, then a n additional indicator digit is required in either case. If numbers are not stored in redundant form, then only one operand is in redundant form. For the carry storage adder, k = 3 provides both overflow indication and the sign of the overflow. I n general, the storage of a binary redundant representation consisting of 2n bits is not considered practical. However, it may be practical to treat the arithmetic unit as a binary coded radix 2" adder where m I n . Then the carry representation is reduced to nlrn bits. This amount of redundancy may be entirely acceptable. 4.2 Redundant Signed Digit Number System
Signed digit representations have been used extensively in conjunction with the process of multiplier recoding. A signed digit representation for a base r number system requires a set of r - 1 digit values representing a sequence of integers including zero. I n general, the set is chosen so that symmetry or approximate symmetry around zero is obtained. Avizienis [a] has proposed an arithmetic system, termed totally parallel arithmetic, which uses a redundant signed digit representation. In the basic addition operation of two signed digit representations, only initiated carries or borrows are assimilated. The need for further propagation or carry storage is eliminated because redundancy exists in each digit position. For radix r arithmetic each digit position is allowed to assume q values where q is greater than r. 159
HARVEY L. GARNER
Avizienis has shown that q has a lower bound such that, for smaller q, the carry propagation chains are not broken and an upper limit which is necessary to insure a unique representation for zero. The bounds are such that r + 2 I q < 2r - 1. An odd value of q is selected and complement coding is employed so that the q values are in the set A ; 1 ,..., - 1 , 0 , + 1 , . . . , + u }
A ={-a, -a+
For decimal representation r = 10 and 12 < q c 19. This means that the redundancy associated with the usual binary coded representation of a decimal number is sufficient for the purpose of signed digit arithmetic. Note also that the signed digit arithmetic bounds for q yield the restriation that r > 2. A transfer digit is a carry or borrow. Transfer digits generated in signed digit arithmetic operations have values of plus or minus one. Consider the conventional radix r addition for the minimal-redundancy symmetric signed digit number system. Let 2
,...,0,...,
then si, a,, and b,, the sum digit and the two operand digits in the ith position, are elements of B for all i if the following rules are followed: ti =
1
if ui-,
>w
ti =
0
if
< ui-, < w
-w
ti = -1 if ui-l < -w where w = reven/2= (rdd - 1)/2. Let U$ = a,
(4.6)
+ bi + ti
(4.7)
- rt+,.
(4.8)
then si = u,
Note that either --reven/2or r,,,,/2 is a redundant value, since one or the other could be deleted from the set B without changing the completeness of the representation but this would destroy the symmetry. In general, symmetric sets of values are preferred because an operand may be negated by changing the sign at each digit position. A carry chain may exist if addition is defined as by Eqs. (4.7) and (4.8). However, if ai and bi are elements of the redundant set B, the carry chains are broken if addition is defined by
+ bi
(4.9)
- rti+, + ti.
(4.10)
~i=
a,
and tc is defined by Eq. (4.6) si = ui
160
NUMBER SYSTEMS AND ARITHMETIC
But si is not an element of set B when (ui, ti) = (w,1) or (-w, -1). This situation can be handled if set B is modified to include at least one redundant value at both the extreme positive and negative values. Set A has this property. Hence for addition as defined by Eqs. (4.6)) (4.9), and (4.10)) ai,bi, and siare elements of set A. I n the signed digit representation the sign of a number is indicated by the sign of the most significant nonzero digit. The representation for zero is unique and is the representation which haa all digits equal to zero. Additive overflow can be either positive or negative and both conditions can be detected by an examination of the two most significant digits of the sum. However, due to the fact that the number system is redundant, the characteristics of overflow detection are similar to those found for systems with separate carry representations. Consider an n + 1 digit signed number system representing numbers between + I and -1. An overflow is always detected for those numbers with a magnitude greater than 1
+ (A)
(r-1
- r-")*
(4.11)
Overflow is never indicated for numbers with a magnitude less than 1
+ -1r - (L) (r-1 - r-") r-1
(4.12)
For the numbers between these two ranges overflows may or may not be indicated, depending on the particular representation which occurs. The maximum digit value is represented by (a) in Eq. (4.11) and Eq. (4.12). For example, for each redundant digit in decimal arithmetic, the values allowed are the integers from - 6 to +6. Thus r = 10 and a = 6. Neglecting the effect of the terms r-", it is observed that the ovedow detection scheme indicates an overflow for all results greater than 32/30. No overflow for results less than 31/30 is indicated, and overflow may or may not be indicated for results lying in the range between 31/30 and 32/30. Multiplication and division are carried out essentially in a straightforward manner except that multiplication requires an overflow check. In division the redundant quotient digit representation of the SRT type is used rather than a conventional nonrestoring division process. (SRT division is discussed in Section 6.) Preliminary investigation into the complexity of a signed digit adder has yielded rather favorable results. In particular, an investigation for decimal arithmetic has shown that the signed digit number adder to be about two times more complex than the excess three adder. Redundant number representations with limited carry propagation are particularly suited for arithmetic operations which generate results 161
HARVEY L. GARNER
in a sequence starting with the most significant digit. Avizienis [5] has discussed this type of arithmetic for the redundant signed digit representation. Addition and subtraction offer no particular complications since transfers do not propagate. The existence of a transfer at the ith digit position is dependent only on the operands at the i - 1 digit position and these may be examined prior to the addition at the ith digit position. Division is not unusual since most division methods produce the quotient digits in a sequence starting with the high order digits, Products must be checked for overflow, but otherwise multiplication processes are conventional, except for the order in which the multiplier digits are selected. Some additional complications appear because the multiplier digits have nonbinary values. A redundant signed digit arithmetic unit is, at present, being designed by Avizienis to be used with Estrin’s [15, 161 fixed plus variable computer, 4.3 Extended Digit Number System
The extended digit number system has been studied by Chamberlin
[Ill.The basic concept employed is the use of redundancy in each digit position in order to break carry propagation chains. However, the values of each digit are restricted to positive integers. The following equations define addition on a digitwise basis: pj
= (a,
si
=
Ci+l
(Pi
+ b, - c i + l T )
+
E
Ci) E (0,1, * *
(0,1,
*
. . ,T
. ,r + 1)
=O
ifai+b,
=I
if r < a i + b i < 2 r
=2
if a,
+ b6 2 2r.
- I}
(4.13)
(4.14)
(4.15)
Extended digit systems exist only for r > 2 and at least two redundant digit values are required. These requirements are identical to those obtained for redundant signed digit number systems. Similar requirements obtain for basic subtraction of numbers expressed in extended digit form. The diminished radix complement of a digit x,where 0 < x 5 T + 1, is defined as f = r - 2 - 1. This definition introduces negative digit values. This difficulty can be removed by using standard signed digit recoding techniques, but this adds the requirement of borrow propagation. Hence, c, is replaced by ti E { -1, 0, 1, 2}. This plus the restriction that all digits should have only positive values limits extended digit complement coded arithmetic to r > 3. The extended digit number system does have a unique representation for zero and has the further advantage that no input conversion is required as is the case for signed digit representation. Extended digit 162
NUMBER SYSTEMS AND ARITHMETIC
arithmetic for both direct subtraction or complement coded subtraction is dependent upon the reduction of extended digit representations to conventional nonredundant representation in order to determine sign. I n other words, unassimilated carries which are stored in the sum digit may change the sign digit'. Chamberlin considers only a binary sign digit. There is no reason for not using a multivalued sign digit. This digit then serves as an indicator (as discussed in Section 4.1). If an indicator digit is employed, then both positive and negative overflow can be determined for most indicator digit values. Conditional indicator values would require conversion to conventional form to determine overflow. A similar conversion would be required to determine sign. The advantage obtained follows from the fact that many basic arithmetic processes consisting of additions and subtraction can be controlled by indicator status rather than sign status. 5. Multiplication
The most significant recent advances in machine multiplication are direct consequences of the concept of signed digit multiplier coding. The signed digit code provides a means for reducing the number of add type operations required for multiplication and increasing the average shift length. 5.1 Multiplier Coding The use of the signed digit code to improve the multiplication process is not new and has been used extensively on desk calculators. Recent requirements for high speed multiplication have forced the machine designer to use this technique. The studies of Tocher [?'I], Lehman [34], and Smith and Weinberger [64] are of particular interest. The additional logic required for multiplier coding in a parallel arithmetic unit is trivial. The following identity is basic to multiplier coding: 2kfn31
- 2 k = 2k+n
+ 2k+n-1 + . . . + 2k.
(5.1)
The identity indicates the possibility of replacing a sequence of add and shift operations in multiplication by a subtraction and an addition. Coding may be performed sequentially starting from either the high order or the low order digit positions. Only coding starting a t the low order digit position is considered in this paper. A sequence of n digits may be coded in one step as a function of n 1 digits. The one bit per step transformation of the binary number A to the signed digit coded number B is defined by a( + bi for i = 1, . . , n using the rules given in Table I.
+ .
163
HARVEY L. GARNER
TABLEI R m s FOR CONVER~IONBINARYTO CANONIUAL SIUNEDDIUITCODE
0 0 1 -1
1 -1 0 0
Examplee; (1)
110111010
=
1ooi oo io 1o (2) 0010011100
=
ooi o iooioo where 7 A - 1. The rules given in Table I produce a canonical signed digit coding with the property that every pair of digits with unit magnitude is separated by at least one zero. The canonical coding also has the minimal possible number of nonzero digits. Reitwiesner [53]has proved the existence of a canonical signed digit representation for each binary representation. The statistics of the canonical signed digit code and the binary code are important t o the machine designer. The probability of a “one digit” is one-half, and the expected number of zeros between “one digits” is one for the binary code. The probability of a “one digit” with either sign is one-third and the expected number of zero0 between digits of unit magnitude is two for the canonical signed digit code. Let ps(v) denote the probability that a pair of unit magnitude digits is separated by v zeros. For the binary code pa(v)= 2-”-l, v = 0, 1, . . , and for the canonical signed digit codeps(0) = 0 andp,(v) = 2-”, v = 1 2, . . .
.
5.2 Multiplier Logic
The multiplication of two n bit operands can be accomplished using a parallel adder with a right shifting accumulator. Multiplier digits are 164
NUMBER SYSTEMS AND ARITHMETIC
examined in sequence beginning with the low order position. If the multiplier digit is a one, the multiplicand is added to the contents of the accumulator and a right shift of one digit position is executed. If the multiplier digit is a zero, then zero is added to the accumulator and a right shift is executed. Thus n additions and n shifts are required. If the addition operation is omitted for zero valued multiplier digits, then, on the average, only n/2 additions will be required, since the probability of a one is one-half. If the multiplier is coded in the canonical sign digit form, then the number of additions or subtractions is, on the average, equal to n/3. Subtraction is now required but this is no problem since subtraction is available and requires the same time as addition. When the multiplier is expressed in the signed digit code, the partial products in the accumulator appear in two’s complement code. It is necessary to fill in the sign digit after the right shift. The shift does not change the sign. The sign digit is filled with the value existing prior to the shift. The statistical results suggest further refinements in the design of the multiplier. A sequence of v zeros between two digits of unit magni1 digit position shift. Special shift circuits must be tude permits a v added to the logic to obtain the available reduction in the total number of shifts required. If logic were provided to obtain all possible shift lengths in one step, then the average shift length for the uncoded multiplier is two digits and the average shift length for the canonical signed digit coded multiplier is three digits. An average shift length of 1 3 is obtained for the uncoded multiplier using logic which permits single as well as two digit shifts. If three digit shifts are also permitted the average shift length is If. Single and double length shifting for a multiplier in the canonical signed digit code obtains an average shift length of 14 digits. Suppose that the multiplier digits are coded and that all shifts are two digits in length. This is essentially base four arithmetic. The multiplicand and two times the multiplicand must be available. This multiplication requires, on the average, n/3 additions or substractions and exactly n/2 shifts. Base eight arithmetic can be used to reduce the number of shifts required to n/3. Each shift is exactly three digits in length and multiplicand multiples of 1, 2, 3, and 4 must be provided. Only multiplicand multiples of 1, 2, 3 , and 4 are required since signed digit code representation is used. The average number of add-like operations is less than n/3, on the average. The various schemes for multiplier logic are summarized in Table 11. Carry propagation has not been considered. It is possible to obtain further decreases in the time required for multiplication if the sequence of multiplicand-partial product additions can be executed before the 165
+
HARVEY L. GARNER
TABLEI1 DIFFERENT MULTIPLIER CONBIaURATIONS
COMPARISON OB
Additions Standard multiplier Skip zero add Skip zero add single and double shift Skip zero add single, double, and triple shift Skip zero add ell possible shifts
Average shift length
n
Coded multiplier Skip zero add single and double shift Skip zero add all possible shifts Double shift only Triple shift only
n/2 eve
1 1
n/2 ave
It
nl2 ave
14
n/2 ave
2
n/3 ave
19
n/3 ave n/3 eve < n/3 ave
3
2 3
carry assimilation has been completed. Carry storage and additional adders can be used since it is known that it is only necessary to allow the carries to propagate one position between additions. Notice that, as a result of the one digit position right shift of the accumulator, it is not necessary to propagate carries relative to the adder logic. In particular, the adder logic for single shift multiplication is
S,i
ci
(as defined below)
=
t(Si)*
-
i(P-1@ p
=
(Pi-' @ AGl)@-1 "Si-lAifl.
- 1
@
Ci-l)*
The addition involves only n digits, hence if Si
si = t(si)* = (S,,, S,,,
a,,-,,
(6.2)
= ( s ~s,,-~, ,
.. . ,
(5.3)
. .. ,a1), then
S2).
A similar situation exists for the double-shift-only logic or the tripleshift-only logic if modulo four or modulo eight carry logic is used, Multiplication schemes using multiple shift lengths require additional logic to shift the carries. This oomplexity must be considered against the complexity required for the generation of the multiplicand multiples needed when the multiplier logic has only a single multidigit shift. Other multiplier configurations are presented and evaluated by MacSorley [all. 166
NUMBER SYSTEMS AND ARITHMETIC
5.3 Multiplication for Complement Coded Operands
Multiplication is normally executed in a digital machine for operands expressed in either the magnitude plus sign code or in the two’s complement code. Multiplication for one’s complement coded operands has the characteristic disadvantage that end around carries may be propagated a t any step in the multiplication process. The assimilation of end around carries can be accomplished by using a double length arithmetic unit, or the carries may be accumulated and added to the low order product at the end of the multiplication cycle. Early algorithms for two’s complement multiplication [8,581 were also characterized by the need for multiplicative corrections either before or after the main multiplication cycle, Complement coded multiplication using Booth’s [7] method requires no corrections before or after the main multiplication cycle. It is not necessary t o know the sign of either of the operands and the product appears in the correct two’s complement form. Booth’s method is the same as ordinary multiplication except that the multiplier is coded using the signed digit code. A model provided by Garner [23]can be used in the study of complement coded multiplication. 5.4 Nonstandard Multiplication Logic
I n any cyclic multiplicative group every nonzero element can be expressed as a power of a generator of the group. Groups of prime order, p , are cyclic and every element of the group has the property: up =
(5.4)
1.
Furthermore, any element other than the identity is a generator of the group. Thus, for a, b, g E a: a = g*a (5.5)
b = gib where i, is called the index. Obviously, ab iab = l i a + i b l p
(5.6) =
giaband (5.7)
Thus multiplication is accomplished by the modulo p addition of indices. This type of multiplication has been studied for Mersenne primes by A. Fraenkel [18]. I n general, the characteristics of the conversion process required between standard binary and index notation cancel the inherent advantages of this scheme. Mitchell [as] has given an algorithm for approximating the base two logarithm of a binary number. The characteristic of the logarithm is equal to the base two logarithm of the weight of the most significant 167
HARVEY L. GARNER
digit. If digit positions are numbered 0, 1, 2, . . . to the left of the point and -1, -2, . . . to the right, then the digit position of the most significant digit equals the characteristic. The binary number excluding the most significant digit is the approximation for the fractional part of the logarithm. Example8 ; lOg~(OO101.) 10.01 10g,(01000.)
=
11.000
l ~ ~ ~ ( O l 1 1 1 .11.111. ) The algorithm yields results with no errors if n = 2’ for integer values of k. Straight line interpolation is obtained between these values of n. The absolute error in the logarithm is less than 0.086. The algorithm for obtaining the inverse logarithm is obvious. Multiplication and division operations using the logarithmic approximations result in errors as large as -11.1% and 12.5%, respectively. 6. Division
Ideally the division x/d = q should require no more additions or subtractions than the multiplication x = dq where d is coded in canonical signed digit code. In other words, division should be the exact inverse of multiplication. The machine division process has three parts: initialization or standardization, quotient generation, and remainder determination. In this section we consider only the problem of quotient generation, Appropriate initialization to obtain x,, as defined in Section 2 is assumed. 6.1 Nonrestoring and Restoring Division
In order to simplify the discussion it will be assumed that both the dividend and the divisor are positive numbers. For the binary case, quotient generation by the restoring division method is described by : where
x,+~ = 2x, - q,+&, j = 0, 1 , . . .,n - 1 2 4 = partial dividend at the start of the j t h step Xj+I = partial remainder after the j t h step x o = dividend x,, = remainder (final) d = divisor
Q 168
=
(ql,.
. .,q,,) = quotient.
(6.1)
NUMBER SYSTEMS AND ARITHMETIC
Here qj is selected so that d > xj+l 2 0. It follows that qj+l E (0, 1} if xo < d. If the condition xo < d is not satisfied, then a larger set of quotient values is required. This is not desirable. A t each step, x;+l = 2xj - d (6.2) is obtained. If x;+, 2 0, then qj+l = 1 and xj+l = xi+l.If x;+, < 0, then qj+l = 0 and restoration is required:
+d
xj+1 = ~ ; + 1
After n steps, Xn
=
xn2-k
=
+ . . . q,)d + . . . + 2-kqn)d.
2”x0 - (2”-lq1 x02n-k - (2-”-‘-’q1
(6.3) (6.4) (6.6)
Thus restoring division requires n subtractions, n one-digit position shifts, and m additions. If all possible quotients are equally probable, then m = n/2 since restoration is required whenever qj+l = 0. The division algorithm is dependent on the comparison of the sign of xi+1. Exact sign determination requires complete assimilation of all of the carries of the partial remainder. Nonrestoring division eliminates the restoration step. Equation (6.1) defines the recursive process but qj+l is selected such that I xj+l I < d. If xo < d, then qj+, E { - 1, l}. So at each step the divisor is either added or subtracted from the partial dividend. The divisor is subtracted when 2xj > 0 and added when 2xj < 0. If 2xj = 0, the process may be terminated if this condition can be detected. The resulting quotient is expressed in a signed digit code. This particular signed digit code is neither canonical nor minimal and is maximal with respect to thenumber of one digits. Nonrestoring binary division requires n additions or subtractions, n single-digit shifts, and precise sign determination. The type of chart shown in Fig. 2 is due to Robertson [56]. On this chart the y axis represents the values of x ~ +The ~ . x axis represents the values for 2xj. If d > xo, then -d < xj+l < d and -2d < 2xj < 2d and qj+l E {-1, l}. The line 2 5 - d = xj+l is designated qj+l = 1, since qj+l = 1, if 0 < 2xj < 2d and the remainder is The line 2xj + d = xj+lis designated qj+l = - 1, since this is the correct quotient value for -2d < 2xj < 0, and the remainder is x ~ +Two ~ . examples of the relationship between x ~ +2xj, ~ , and qj+l are shown by the dotted lines in the second and fourth quadrants. Together, the two examples completely characterize the division of Qd by d. If xo = i d , then 2x0 = &€; q1 = 1 and x1 = - i d . For the second step 22, = -&€, qp = -l,andx, = +d.Forthethirdstep2xa = #d,q, = l,andx, = i d . This continuesforj = 4 , . . . ,n - 1. 169
HARVEY L. GARNER
FIU.2. Binary nonrestoring division.
The chart also shows the need for accurate sign determination in order to specify the correct quotient digit. The basic division process can be easily generalized for fixed base number systems with base r . For restoring division j = 0 , 1, . . . , n - 1 xj+l = rxj - qj+&, (6.6) < d. If x o < d, then where qi+l is selected such that 0 S qj+l E (0, 1 , .
. . , r - l}.
For base T nonrestoring division, Eq. (6.7) defines the recursive process and qj+l is selected such that -d < I xj+l I < d. If zo < d, then: qj+l € { - r 1,. . . , - l , l , . . . , r - l}. One might suspect that the set of quotient digit values for base r nonrestoring division is redundant. This is indeed the case. Even digit values can be removed is 2 % r . If 2 % r , then the quotient set qj+l € { - r 1, -r 3 , . . . , - 2 , 0 , 2 , . * , T - 1) is sufficient. This can be seen by studying and extending the chart given in Fig. 3. Because of the redundancy in the set of quotient values, there is always two quotient choices except for -d < rxo < d, rxo > (-r l)d, and rxo > ( r - 1)d.
+
+
+
.
+
6.2 Generalized Nonrestoring Division
Let the division process be defined by Eq. (6.7) and consider the quotient set qj+1 € { - - P 1 , . . , , - l , O , + 1 , . * . , r - 1). The effects of the degree of redundancy in the quotient digit set has been studied in detail by Robertson [56].
+
170
NUMBER SYSTEMS AND ARITHMETIC
Smaller sets of quotient digits may be used if the range rxj is limited. Let I rxj I max = krd where 0 < k I 1. Then -krd I rxj I krd and -kd I xj+l 2 kd. I n particular, if x o I kd then k 2 x,/d = q. The condition q 5 k does not restrict the division process to problems for which the quotient is less than or equal to k. It does dictate that scaling will be used. Given 1 > q = xo/d > k, if x o is shifted one digit position to the right, then q’ = x(r)-l/d < k. The machine obtains q’ and converts to q since q’r = q. In general, the division techniques discussed in this section are most easily implemented in systems using floating point representation. I n such a system the representation of d is always standardized and xo is shifted to scale q. Reference to Fig. 3 will show
FIU.3. Nonrestoring division bam 4.
that k 2 4. If k < 4, then there exists values of rxi for which no value of qj+l exists which will keep kd 4 xj+l < kd. It is desirable to select only those values of k for which the horizontal line representing the upper bound of xj+l and the vertical line representing the upper bound of rxj intersect the line xj+l = rxj - fd. Then kd = rkd - f d so
Since k 2 +,it follows that r-1
f>-.
Adivision process is specified iff and r are given, The minimal quotient is the set of integers { -f, . . . , - 1, 0, 1, . . , ,f }. Notice that the range of rxj covered by a given qj+l overlaps the range covered by qj+l 1 and qj+l - 1 unless k = 4. A given qj+l may be selected as the quotient digit
+
171
HARVEY L. GARNER
+
if qj+,d - kd < xj 5 qj++l kd. Let H be the magnitude of the overlap in the values of rxj between two succeesive values of qj+l:
Substituting k from Eq. (6.7) yields
H
=
a r(- 12
- 1).
(6.10)
Truncated dividend and divisor representations are used to speed up the quotient determination process. The truncated dividend consists of the t high order digits. Complete carry aesimilation is executed only over the t high order digits of the truncated representation. The truncated representation serves as a multi base indicator digit, Also, since the divisor is normally in standard form, it is advantageous to compare the partial dividend against predetermined constants rather than the divisor proper. The above schemes are used only in the process of the quotient determination. The partial remainder calculation must use the nontruncated dividend and divisor representations. Robertson [56] has developed a formula for edimating the precision required for quotient determination. Let
a<xj
Aa
(6.11)
b g d i b + A b
(6.12)
where a and b are the truncated values of xj and d, and A a and Ab are the maximum errors due to truncation. Robertson has shown that the uncertainty in x , + ~due to the truncation of xj and d is: AX^+^ = d ( a +b b Ab
+
=
a[$+?(
b 1
+b-lb-lAbdb)].
(6.13)
Correct quotient determination requires that r Axj+, be less than the magnitude of the overlap of successive quotient digits in xi: r Axj+l
< a(f
r-1
- 1)
(6.14)
Thus:
Aa 172
(6.16)
NUMBER SYSTEMS AND ARITHMETIC
Assuming a fractional interpretation, the left side of Eq. (6.16) is maximized if b = dmin= r - l ; also 2f - 1 amax= -. (6.16) 2r This assumes the standardization of the divisor,
If r Ab
) < -r--2fl1.
(6.17)
2f - 1. <-
(6.18)
Ab
rAa+2f--1( 2
l+rAb
< 1, then:
2f - 1 Ab 2
rAa+----
r - 1
For the example r = 4, n = 2 ; a < 1/66, indicating that seven binary digits are required for the comparison. For r = 10, n = 7; a < 11297, indicating that three decimal digits are sufficient for the truncated partial dividend and divisor representations, 6.3 SRT Division
Consider binary division for f = 1; i.e., k = 1. The Robertson diagram is shown in Fig. 4. One set of rules for quotient determination is the following : qj+l = 1 if 2 5 2 0 pj+l
=
qj+l
=
0
-d
< 2xj < d
(6.19)
if 2xj < 0.
-1
It is desirable to choose qj+l
if
=
0 if possible. If qj+l
=
0 only a left shift
Ix'+'
FIQ.4. SRT division.
173
HARVEY L. GARNER
is required at the j t h step in the division process. This can be accomplished by the following rules for quotient determination : qj+l
=
1
if 2xj 2 d
qj+l
=
0
if
qj+l
=
-d 5 2xj
if 2 5
-1
(6.20)
< 4.
In general, a full comparison of 2xj is not desirable. Fast division methods require a fast comparison procedure. The divisor is standardized, so for a fractional interpretation dmln= #. Consider the following set of rules for quotient determination: qj+l
=
1
if 2 5 2 #
qj+l
=
0
if
qj+l
=
-1
=
K
-4 2 2 5 < # if -22. 3 < -# = - K .
(6.21)
This set of rules defines SRT division, which was proposed independently by D. W. Sweeney, see [41], Robertson [56], and Tocher [7l]at about the same time. SRT division employs a comparison constant K = # in the quotient determination. The rules for SRT division have the effect of standardizing the partial remainder by shifting over leading zeros if the partial remainder is positive and leading ones if the partial remainder is negative as expressed in two’s complement code. Example :
j
a
=
0.1101
At step a, 2xj
=
0.000110
2xj
Pr+ 1
0.000110 0.001100 0.011000 0.110000 1.1 11000 1.110000 1.100000 1 .oooooo 1 * 101000 1.010000 0.001000
0 0 0 1 0 0 0
-1 0
-1 0
Operation Shift Shift Shift Shift, subtract d Shift Shift Shift Shift, add d Shift Shift, add d Shift
Since 4 < d < 1, one would expect the eficiency of the SRT method to be a function of the magnitude of the standardized divisor. Freiman 174
NUMBER SYSTEMS AND ARITHMETIC
[I91 has shown that the SRT method produces minimal but noncanonical signed digit coded quotient representations if Q I Id I I 4. The average length of shift for this range of divisors is 3. If d = $, the quotient code is canonical. The overall average length of shift for the SRT method is 2.60. The average shift length is one greater than the average number of consecutive quotient digits having the value zero. A binary division algorithm has been given by Reitwiesner [54], which yields quotients in canonical minimal form. However, this method requires a full length comparison of the divisor and the partial dividend at every step. It may be desirable to modify the SRT method so that complete carry assimilation is not required prior to the quotient determination, Robertson [55] has proposed the use of a comparison constant K = 4. An approximate value of 2xi is obtained by truncating the unassimilated representations of 2xi and assimilating, with a fast carry circuit, the truncated representation o f t bits. The sign bit is included in the t bits. The truncated and untruncated partial dividends are related by 2xj 5 2x;
+241.
(6.22)
Substituting into Eq. (6.22) gives =
1
if 2xj’ 2 3 - 2 4 1
qj+l =
0
if
qj+l qj+l
=
-1
if
=
K
+ 2-&l < 2xj < 3 - 2-@1 -2xj’ < -& + 2-&1. -3
(6.23)
The smallest t which gives a good set of rules for quotient determination is t = 3. Thus K = 4, and at each step in the division process, the carries are assimilated in the three high order digit positions to obtain the indicator digit xi’. 6.4 Modified SRT Division
The basic SRT division method can be improved. This has been discussed by Wilson and Ledley [75], Freiman [19],Metze [45], and MacSorley [all. The basic difficulty with the SRT method is that sometimes the shift is one digit too long and at other times one digit too short. That is, a shorter or longer shift will yield a longer sequence of zero quotient digits. Wilson and Ledley give ten rules involving the five most significant bits of both the divisor and the partial dividend. The sign is excluded since the Wilson and Ledley paper uses the magnitude plus sign notation, and complete assimilation is assumed so the sign is never in doubt. Let 6 be the length of the shift required to standardize the 175
HARVEY L. GARNER
partial dividend. If d 2 Q, then for certain combinations of values of a? and 2xi, a shift of 6 + 1 is needed rather than 6. On the other hand, if 8 5 d < Q, then a shift of 6 - 1rather than 6 will sometimes be needed. A standardizing shift of 8 digits can be used to obtain the same partial remainder as the method using 6 1, and 6 - 1 shifts is d, and 2d are available, Obviously, the subtraction of +d from the standardized partial dividend yields the same remainder as the subtraction of d from the standardized remainder shifted one position to the left. Likewise, the subtraction of 2d from the standardized partial dividend leaves the same remainder as the subtraction of d from the standardized dividend shifted one digit position to the right. MacSorley discusses a divisor circuit using two adders. If d > Q, then the partial dividend is obtained by a 6 digit shift which normalizes the remainder. The two adders are used to obtain
+
v,
&?I
(6.24)
- a.
(6.26)
r;+l = 26xj r,+l = 26x,
The adder containing the remainder requiring the largest 6 for standardization is selected for the next step. Similarly, if 5 d < 2, then the multiples ld and 2d are used. Metze has used Freiman’s analysis of the SRT method to obtain a class of modified SRT divisions. The division algorithms given by Metze have the same performance characteristics as the Wilson-Ledley algorithm but the procedures involved are quite different. SRT division with K = 9 yields minimal quotients for Q 5 I d I 5 9. On a Robertson digram (Fig. 4), ad 5; K 5 #d for this range. The trick is to keep the boundary defined by #d and gd fixed while changing the range of d and K. We have
+
K
= 6d,ower
(6.26)
K
= adupper.
(6.27)
Let &ow,r = Q, then K , = Q and dupper= f%. This process may be continued in both directions until comparison constants for < d < 1 have been obtained. The first step in this division process is the determination of the divisor range. Associated with each range is a specified comparison constant. The division process then proceeds using the standard SRT process with the special constant. Minimal signed digit coded quotients are obtained over the range of the standarized divisor because of the modified comparison constants. The precision of the quotient determining process is a function of the number of regions used. Metze gives one algorithm with four divisor 176
+
NUMBER SYSTEMS AND ARITHMETIC
regions requiring divisor determination precision of 2 4 and the quotient determining comparison with a precision of 2P. A second example is given with five regions. In this case, both divisor range determination and the quotient determining comparisons require a precision of 2-4. 7. Residue Number Systems
The concept of residue classes is very ancient as evidenced by the existence of the Chinese remainder theorem. Recently residue number systems have been studied extensively to determine whether such systems can be used effectively for digital computation. The initial study of a machine number system based on residues for an integer interpretation is due to Valach [72].Other studies of interest are found in Aiken and Semon [ I ] ,Garner [21, 22, 241, Svoboda and Valach [66],Svoboda [65],and Szabo [68].Reference [66]is of particular interest since Svoboda considers a rational fraction number system interpretation using separate residue encoded representations for the numerator and the denominator. 7.1 Basic Characteristics
A residue representation for the number A consists of the set of = 1, . . . , n ; where I A I is the least positive residue of A modulo mi. By definition:
I A I n?l for j
[A/mj]is the largest integer part of the quotient q = A/mj. The least positive residue has the property 0 5 I A I mi < mi. The residue number system is a linear, weighted mixed base number system. As with all mixed base number systems, a shift is not a meaningful operation. Residue number systems may be either redundant or nonredundant and complete as determined by a choice of the digit bases. A number system is a residue number system if and only if the 0 mod M . The cardinality of the digit weights are such that mjpj residue number system is equal to the product of the base cardinalities. However, the cardinality of the interpretation set is equal to the least common multiple of the base cardinalities. Integer interpretation with a scale factor of unity is the natural interpretation of the residue number system. Any other interpretation requires scale correction for multiplicative operations. As indicated above, such correction cannot be accomplished by a simple shift. Complement coding, signed digit coding, or magnitude plus sign coding may be used to obtain both positive and negative number representation. 177
HARVEY L. GARNER
The main advantage of the residue number system is that all carries are congruent zero to the modulus of the system and therefore do not appear. Addition can be executed with complete independence of digit positions on account of the absence of carries. Multiplication need not be executed by repetitive addition but can be obtained in a one-step process. I n the j t h digit position the sum is defined by I I a I m j I Y I m! mj and the product is I I 2 I m, I Y I m, yj* A machine arithmetic structure with an integer interpretation 1s normally considered less desirable than a structure with a fractional interpretation. An integer interpretation requires a check for multiplicative overflow, A fractional interpretation requires a check for multiplicative underflow. A conventional computer can be used in the integer interpretation if low order products are retained in place of the usual high order rounded products. A conventional machine stripped of overflow detection and used in an integer interpretation is a reasonably accurate model of the innate system properties of residue numbers. Consider the limited class of problems with the following characteristics: (1) integer interpretation, (2) only addition, subtraction, and multiplication are required, (3) the sign and magnitude of the result are known to within a multiple of the modulus. For this class of problems, overflow detection and sign detection are not required and the results obtained have no round-off error [69]. Thus a technique with limited application exists for executing computation without round-off error. Many important algorithms such as matrix inversion can be given in a form suitable for integer computation. I n a fractional interpretation round-off occurs because low order products are dropped in the rounding process. The integer computational mode described above permits no round-off error, but magnitude and sign information are lost. It was shown in Section 2.2 that the residue number system is a member of a large class of number systems which includes a mixed base number system with conventional carry properties. The conventional mixed base number system requires carry propagation but possesses partition properties which permit a one-step sign detection scheme. On the other hand, addition in the residue number system has no carries and is executed in one step. The partition properties of the residue number system are such that sign detection must be obtained by some recursive process. Sign detection schemes for the residue number system involve a transformation from the residue number system to the mixed base number system. Many different techniques exist for accomplishing this transformation. However, all techniques are recursive and the information required for the sign determination is not obtained until the last step. The detailed logic for sign detection has all of the characteristics of the carry propagation process of the mixed 178
+
I
I
NUMBER SYSTEMS AND ARITHMETIC
base number system. It appears that the process of addition and sign detection is essentially recursive. At the extremes represented by the residue and conventional mixed base number systems, the recursive property if associated with either sign detection or addition. Number systems such that both addition and sign detection are recursive exist, but appear to require more complicated logic than either the residue number system or the mixed base number system. For general computation the major disadvantages of the residue number system are associated with sign detection, detection of additive and multiplicative overflow, division, and scaling. Algorithms for each of the above operations are known. In most cases the complexity of the algorithms leaves much to be desired and the possibility of a reduction in complexity is doubtful. Detailed discussions for the solutions to these problems is found in Garner [24],Keir [29],and Szabo [67,681. 7.2 Applications
The use of the residue number system in special and general purpose computers has received considerable attention. At least one small general purpose, stored program computer with a residue arithmetic unit has been constructed [51]. Cheney [I21 considers the design of a digital correlator using residue arithmetic. Analog signals are converted to binary code and then to residue coding. The correlator obtains
n:-l
mi, and N , the number of The number system modulus, M = samples, are chosen equal. The reason for this is that the division of a residue coded number by mi is a reasonable process. The correlator is apparently no more complex than a correlator using standard arithmetic, and the speed of computation is increased by a factor of ten. This factor is due to the fact that multiplication and addition are executed in one time step in basic residue arithmetic. The correlation requires N multiplications. The speed factor would be even higher if conversions between binary and residue code were not required. Standard binary coding does not require a binary to residue conversion. Thus the incorporation of fast multiply techniques in the design of a binary arithmetic unit would obtain a substantial speed improvement. On the other hand, the existence of a direct analog to residue code converter would substantially increase the performance of the residue unit. Guffin [26] has proposed a special purpose machine, using a drum 179
HARVEY L. GARNER
storage and residue arithmetic to solve systems of linear equations. The Gauss-Seidel method is modified slightly so that the requirement for division is removed. A quick examination of the method will show that two elements in the same column but different rows of the matrix can be made identical using multiplication rather than division. This change does not alter the basic convergence properties of the algorithm. The proposed special purpose machine obtained one complete iteration of a linear system with 128 unknowns 30 times faster than an IBM 704; 600-kilocyclelogic is used. This is one-half the clock rate of the IBM 704. The speed-up is due to two factors: (1) The special purpose organization in which all of the calculations
for one iteration are accomplished during one rotation of the drum. (2) The fast multiply characteristic of residue arithmetic. Merrill[43] has proposed a modification to the conventional computer so that either conventional arithmetic or residue arithmetic can be performed. This is accomplished by breaking the carry chain at several points so that the number representation is partitioned. The inclusion of end around carry logic permits a partition consisting of k consecutive digits to execute addition modulo 2' - 1. A nonredundant residue number system requires that the residue digit bases must be relatively prime by pairs and the partitions must be selected subject to this condition. Only one of the partitions need not have end around carry correction. The time required for residue addition is, at most, equal to the add time of the largest partition. Residue multiplication is executed as a sequence of additions and shifts with three essential changes in the logic. A t each step in the multiplication process, the conditional addition of the multiplicand to the partial product is determined for each section by the appropriated multiplier digit in that section. Carries, as in addition, do not propagate out of partitions. A left shift of the multiplicand is such that for each left shift the high order digit in any partition is destroyed. These changes in logic are sufficient to yield a single length product partitioned in the same way as the two operands, and the binary number in a given partition of base mj is the modulo mj product of the numbers in the corresponding partition of the operands. Merrill considers a particular partitioning with four partitions consisting of 7, 7 , 6 , and 5 bits. Residue addition is estimated to be three times faster than ordinary addition and multiplication is estimated to require three residue add times. Ordinary multiplication is listed as requiring twelve ordinary add times. The time required for full carry propagation is allowed for in each conventional multiplication step requiring addition. On the average only one-half of the multiplication 180
NUMBER SYSTEMS AND ARITHMETIC
steps require addition. On the basis of the above data, Merrill considers the calculation required to obtain r iterations of the modified Gauss-Seidel algorithm for a linear system of equations with n unknowns for two computers. Computer A can execute both residue and conventional arithmetic. Computer B is capable of executing only conventional arithmetic. The ratio of the computation time for the two computers is
The operations required to convert the coefficients of the linear system from binary to residue code have been included in the count of the operations required by computer A. It is seen that t A / t B < 1 if n > 2. For reasonably large n:
It has been proposed [68] to use residue encoding to reduce the storage required for the evaluation of f ( x ) , a polynomial by table lookup. One standard technique requires that a range value be stored for each domain value. Assume that a residue number system exists which includes both the domain and the range with bases m,, . . , . m,. f( I x I ,J m, must be stored for each distinct class I x I m,. I n general, there exists mi such classes. If the argument is in residue form then n accesses to the residue encoded table obtain the range value expressed in an n digit residue code. This technique obtains a substantial reduction in the number of bits required for the storage table. The reduction of storage is due to the fact that for the standard table x t)f(x) and for the residue table I x I m, t)I f( I x I ,,J I mj. If the average ratio of bits for one residue digit to the bits for the conventional binary representation is k, then k < 1 and the ratio of the bits required for the two systems is:
I
I
x;=l
n
Ic C mi bits (residue code) -~ bits (conventional)
iimi
i-1
(7.5) '
The price paid for the saving in storage is n accesses and perhaps conversions to and from the residue number system. MacLean and Aspinall [40] have proposed a decimal adder using a residue number representation for each decimal digit with bases two and five. 181
HARVEY L. GARNER
A serious study [38] has been conducted to determine if a special purpose computer for missile guidance should use residue or conventional arithmetic. The final decision was in favor of conventional arithmetic because conventional techniques more than adequately met the requirements and, hence, there was no justification for using the more complicated residue techniques. It is reasonable to expect that the development of improved computational techniques involving residue concepts will continue. The machine designer should add the basic concepts of residue coding and arithmetic to his bag of tricks. However, residue arithmetic is not the panacea for all computer design problems. It seems quite improbable that residue arithmetic will ever replace conventional arithmetic logic in the general purpose computer. Augmentation of conventional arithmetic with residue arithmetic appears reasonable and this concept should receive further study. 8. Digit by Digit Computation
It has been the custom to design computers with the basic arithmetic operations of add, subtract, multiply, and divide. However, due to the various advances which have been made in storage devices, microprogramming has become a practical possibility, and many present-day small computers use some type of microprogramming control. The methods described in this section obtain, for a particular class of functions, more rapid computation than can be obtained using the conventional arithmetic structures controlled by subroutines. Microprogramming enhances the possible implementation of these techniques but is not an absolute requirement; the processes can be obtained by extensions of conventional control logic. Finally, the requirements of many special purpose computers, and in particular computers used for coordinate transformations, necessitate the modification of the basic arithmetic structure to obtain efficient logic for the generation of trigonometric functions.

8.1 Pseudo Division and Multiplication
Meggitt [42] has devised algorithms, termed pseudo division and pseudo multiplication, involving modified division and multiplication operations and stored constants, for the computation of log(1 + y/x), tan^{-1}(y/x), y/x, xe^y, tan y, and xy on a digit-by-digit basis. The last three functions are essentially the inverses of the first three, and this is reflected in the algorithms by interchanges of pseudo multiplication and pseudo division operations. Figure 5 is the flow diagram for the modified divider.
FIG. 5. Flow diagram for the modified divider. Initially (A) represents y and (B) represents x. The pseudoquotient is obtained in Q. This routine is used for (1) division, (2) Part 1 of the log[1 + (y/x)] calculation, (3) Part 1 of the tan^{-1}(y/x) calculation, and (4) the square root calculation.
Notice that if the divide branch is followed, then the flow diagram for conventional decimal restoring division is obtained. The term pseudo division is due to the facility for modifying the divisor by the addition of (M), the content of a special modification register. Modification occurs after each permissible subtraction of the divisor from the partial dividend. No modification occurs for the restoration step. The different algorithms are characterized by the various operands or constants used for the modification constant (M). The algorithm for log(1 + y/x) is a typical example of the pseudo multiplication and division processes. The basic method is due to Briggs, and requires the determination of digits q_j such that

y + x = x \prod_{j=0}^{n} (1 + 10^{-j})^{q_j}.    (8.1)

Given q_j, j = 0, ..., n, then log(1 + y/x) is obtained from

\log(1 + y/x) = \sum_{j=0}^{n} q_j \log(1 + 10^{-j}).    (8.2)

The values for log(1 + 10^{-j}) are stored. The computation required to obtain the sequence of q_j is basically an iterative process which is obtained by a pseudo division. The digits q_0, ..., q_n are generated sequentially. Assume that digits q_0, ..., q_{j-1} have been determined. Let a be the trial value of q_j, and let
y_a^j = y - x\left[\prod_{i=0}^{j-1} (1 + 10^{-i})^{q_i} (1 + 10^{-j})^{a} - 1\right].    (8.3)
Then q_j = a, 0 <= a < 10, where a is chosen to yield the smallest positive y_a^j. As in conventional machine division, a may be determined by an examination of y_a^j for the sequence a = 0, 1, ..., 9. It is thus desirable to express y_{a+1}^j in terms of y_a^j, as follows:
y_{a+1}^j = y_a^j - 10^{-j} x_a^j    (8.4)

x_{a+1}^j = x_a^j + 10^{-j} x_a^j.    (8.5)

As in conventional division the balance y_{a+1}^j is reduced by the computation, and the scaling required to maintain accuracy is obtained by the substitution

z_a^j = y_a^j \, 10^j.    (8.6)
The iterative process for the determination of q_j is then defined by

z_{a+1}^j = z_a^j - x_a^j    (8.7)

x_{a+1}^j = x_a^j + 10^{-j} x_a^j.    (8.8)
Clearly,

y_0^{j+1} = y_{q_j}^j, \qquad x_0^{j+1} = x_{q_j}^j,    (8.9)

and therefore

z_0^{j+1} = 10 \, z_{q_j}^j.    (8.10)

The process is completely defined by the addition of the initial conditions:

z_0^0 = y    (8.11)

x_0^0 = x.    (8.12)
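The following Python sketch follows the pseudo division literally, in floating point rather than in decimal registers; the range restriction 0 <= y <= x is an assumption made so that every pseudoquotient digit is a single decimal digit.

import math

def log1p_ratio(y, x, n=10):
    """Pseudo division for log(1 + y/x), per Eqs. (8.2)-(8.12), assuming
    0 <= y <= x.  The stored constants log(1 + 10**-j) play the role of
    Meggitt's table of logarithms."""
    table = [math.log(1.0 + 10.0 ** -j) for j in range(n + 1)]
    z = y                       # z_0^0 = y                        (8.11)
    xa = x                      # x_0^0 = x                        (8.12)
    result = 0.0
    for j in range(n + 1):
        q = 0                   # trial digit a = 0, 1, ..., 9
        while z - xa >= 0.0:    # largest a that keeps the balance positive
            z -= xa                    # z_{a+1}^j = z_a^j - x_a^j         (8.7)
            xa += 10.0 ** -j * xa      # x_{a+1}^j = x_a^j + 10^-j x_a^j   (8.5)
            q += 1
        result += q * table[j]         # accumulate q_j log(1 + 10^-j)     (8.2)
        z *= 10.0                      # z_0^{j+1} = 10 z_{q_j}^j          (8.10)
    return result

assert abs(log1p_ratio(1.0, 2.0) - math.log(1.5)) < 1e-8

After stage j the remaining balance satisfies y^j < 10^{-j} x^j, so the truncation error after n decimal stages is below 10^{-n}, in line with the precision results quoted below.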
The difference between the pseudo division and ordinary division is due primarily to Eqs. (8.5) and (8.9). If these are modified so that

x_{a+1}^j = x_a^j = x    (8.13)

x_0^{j+1} = x,    (8.14)

then Eq. (8.7) defines ordinary restoring division for y/x. Meggitt [42] has studied the requirements of register length, computation time, and computational precision obtained using pseudo multiplication or division processes. These studies were conducted for decimal arithmetic. The techniques, however, are by no means limited to decimal arithmetic. The studies indicate that the input registers in the arithmetic unit, normally consisting of n decimal digits, must be extended to n + 2 decimal digits. A requirement for a 2n + 1 decimal digit accumulator exists for some algorithms. This is not a serious problem since accumulator registers are normally double length. The functions log(1 + y/x), tan^{-1}(y/x), and xe^y can be computed in essentially the time required to execute three decimal multiplications. The computation for tan y requires a time equivalent to four multiplications. The square root algorithm is the conventional algorithm but is included in Meggitt's paper because it is another example of pseudo division. Meggitt suggests that sin theta and cos theta should be obtained using the square root:
\sin \theta = \frac{\tan \theta}{\sqrt{1 + \tan^2 \theta}}    (8.15)

\cos \theta = \frac{1}{\sqrt{1 + \tan^2 \theta}}.    (8.16)
The precision obtained by the various algorithms is impressive. The errors are essentially due to round-off and, since round-off errors tend
to compensate, it is usually the case that the last digit computed has no error. The maximum possible error can be bounded, and these results are given in Meggitt's paper. All too often the power of a given computer is measured exclusively in terms of its performance on the basic arithmetic operations. Meggitt has shown how relatively small modifications in the basic arithmetic logic can provide significant increases in the computation rate, not because the subroutine is executed faster, but because the subroutine is no longer required.

8.2 The CORDIC Trigonometric Computing Technique
CORDIC is the acronym for a special purpose digital computer used for rotation and vectoring. The basic technique is due to Volder [73]. In the rotation mode, solutions are obtained for

y' = k(y \cos \theta + x \sin \theta)    (8.17)

x' = k(x \cos \theta - y \sin \theta).    (8.18)
Vectoring obtains the solutions for

v = k \sqrt{x^2 + y^2}    (8.19)

\theta = \tan^{-1}(y/x).    (8.20)
The computational process for vectoring is essentially the inverse of the process used for rotation. The CORDIC technique is not an incremental technique. Incremental techniques are basic to the operation of digital differential analyzers. The basis for the CORDIC technique is found in the well-known identities:

\sin(\theta + \Delta\theta) = \sin \theta \cos \Delta\theta + \cos \theta \sin \Delta\theta    (8.21)

\cos(\theta + \Delta\theta) = \cos \theta \cos \Delta\theta - \sin \theta \sin \Delta\theta.    (8.22)
These identities also form the basis for the well-known incremental algorithm for the computation of the sine and cosine. Usually Delta-theta is small and the approximations cos Δθ = 1 and sin Δθ = Δθ are employed. It is desirable in those cases where Δθ is constant to choose Δθ such that sin Δθ = r^{-m} for base r arithmetic. Multiplication by sin Δθ is then obtained by a simple right shift of m digits. The iteration formula for small Δθ is:

u_{j+1} = u_j + \Delta\theta \, v_j = \sin(\theta + \Delta\theta), \qquad u_0 = \sin \theta_0    (8.23)

v_{j+1} = v_j - \Delta\theta \, u_j = \cos(\theta + \Delta\theta), \qquad v_0 = \cos \theta_0.    (8.24)
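For contrast with what follows, here is a minimal sketch of the incremental recursion (8.23)-(8.24), with Δθ = 2^{-m} so that each multiplication is a right shift in binary hardware. The function name and argument choices are illustrative.

import math

def incremental_sin_cos(theta0, m, steps):
    """Tabulate sin and cos at theta0 + j*dtheta per Eqs. (8.23)-(8.24),
    with dtheta = 2**-m (a right shift of m digits in binary arithmetic)."""
    dtheta = 2.0 ** -m
    u, v = math.sin(theta0), math.cos(theta0)   # u_0 = sin theta_0, v_0 = cos theta_0
    samples = []
    for _ in range(steps):
        u, v = u + dtheta * v, v - dtheta * u   # approximate rotation by dtheta
        samples.append((u, v))
    return samples

# Each step multiplies u*u + v*v by exactly (1 + dtheta**2), so the uncorrected
# recursion slowly inflates the amplitude; m must be chosen with the run length
# in mind.  This drift is one motivation for the exact CORDIC iteration below.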
We now continue the development of the CORDIC technique. Equations (8.21) and (8.22) are divided by cos Δθ to obtain

\frac{\sin(\theta + \Delta\theta)}{\cos \Delta\theta} = \sin \theta + \cos \theta \tan \Delta\theta    (8.25)

\frac{\cos(\theta + \Delta\theta)}{\cos \Delta\theta} = \cos \theta - \sin \theta \tan \Delta\theta.    (8.26)
This suggests the iteration:
t_{j+1} = \frac{u_{j+1}}{\cos \Delta\theta_j} = u_j + v_j \tan \Delta\theta_j, \qquad j = 0, 1, \ldots    (8.27)

and

w_{j+1} = \frac{v_{j+1}}{\cos \Delta\theta_j} = v_j - u_j \tan \Delta\theta_j.    (8.28)
With u_0 = \sin \theta_0 and v_0 = \cos \theta_0, after j + 1 steps

u_{j+1} = \sin\left(\theta_0 + \sum_{i=1}^{j+1} \Delta\theta_i\right)    (8.29)

v_{j+1} = \cos\left(\theta_0 + \sum_{i=1}^{j+1} \Delta\theta_i\right),    (8.30)

and the total rotation is

\theta = \sum_{i=1}^{n} \Delta\theta_i.    (8.31)

The choice of Δθ_1 = ±90° is required to obtain values

180° \ge \sum_{i=1}^{n} \Delta\theta_i > -180°.
A minor complication follows since tan 90° is infinite. This complication is avoided by redefining t_1 and w_1:

t_1 = u_1 = u_0 \cos \Delta\theta_1 + v_0 \sin \Delta\theta_1 = \pm v_0    (8.32), (8.33)

w_1 = v_1 = v_0 \cos \Delta\theta_1 - u_0 \sin \Delta\theta_1 = \mp u_0,    (8.34)
and t_{j+1} and w_{j+1} are defined by Eqs. (8.27) and (8.28) for j = 1, ..., n. After n iterations

t_n = k \sin\left(\theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right)    (8.35)

w_n = k \cos\left(\theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right)    (8.36)

where

k = \left[\prod_{i=2}^{n} \cos \Delta\theta_i\right]^{-1}.    (8.37)
Thus the nth iteration yields sin(θ_0 + θ_n) and cos(θ_0 + θ_n), except for a corrective factor, if

\theta_n = \sum_{i=1}^{n} \Delta\theta_i.    (8.38)
The corrective factor k is a constant if n is a constant. Thus the sequence represented by Eq. (8.38) must always contain n terms to avoid the necessity of having more than one correction constant. A different statement of this restriction is that for all θ the computation must consist of exactly n iterations. The sign of Δθ_i has no effect on the corrective constant since the cosine is an even function. The multiplication by tan Δθ_i for i > 1 is obtained by a right shift of i - 2 digits if (for binary arithmetic) Δθ_i is such that tan Δθ_i = 2^{-i+2}. If Δθ_1 = ±90° and Δθ_i = ± tan^{-1} 2^{-i+2} for i = 2, ..., n, then the series represented by (8.38) can represent any angle between +180° and -180° with an error less than Δθ_n. For this set of Δθ_i,

k = \left[\prod_{i=2}^{n} \cos(\tan^{-1} 2^{-i+2})\right]^{-1} \approx 1.647.    (8.39)
The coding of the angle θ provides an additional complication. For the CORDIC operation it would be possible to use a representation A_θ for θ consisting of n bits a_i having the property

a_i = 0 if Δθ_i >= 0;  a_i = 1 if Δθ_i < 0,  for i = 1, ..., n.    (8.40)
fori = 1 , . . , n . However, arithmetic operations such as addition between two angle representations in this code are not possible since the weight of the ith digit is doi = tan-12-Gl. The conversion from A, to a consistently bmed code can be accomplished by addition if stored values are provided for each doi. Conversion may occur concurrently with the iteration process. A 8 is specified initially and is stored in the angle register. Let 8 = 8,. At each step 8j+l = Oj Adj, j = 1, . . . , n. The sign of A8, is chosen opposite to that of Bj so that the magnitude of the number in the angle register is reduced at each iteration step. The algorithm for the generation of sin(8, 8) and the cos(8, 8) becomes a rotation algorithm if the initial conditions are chosen as
+
+
u,
=
r sin 8,
=
y
+
(8.41)
(8.42)
After n iterations:

t_n = kr \sin\left(\theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right) = ky'    (8.43)

w_n = kr \cos\left(\theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right) = kx'.    (8.44)
Vectoring is the inverse of resolving. In this case the x and y coordinates are given, and r and θ are to be determined:

u_0 = y = r \sin \theta    (8.45)

v_0 = x = r \cos \theta.    (8.46)

After n iterations,

t_n = kr \sin\left(\theta + \theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right)    (8.47)

w_n = kr \cos\left(\theta + \theta_0 + \sum_{i=1}^{n} \Delta\theta_i\right)    (8.48)

and

t_n = 0, \qquad w_n = kr \qquad \text{if} \qquad \theta + \theta_0 + \sum_{i=1}^{n} \Delta\theta_i = 0.    (8.49)
The argument

\theta + \theta_0 + \sum_{i=1}^{n} \Delta\theta_i    (8.50)
can be forced to zero if the proper sign is chosen for θ_0 and each Δθ_i. Here r and θ are unknown, but r sin θ = y is given. It is observed that the choice for θ_0 which transforms all angles to the right half plane is:

θ_0 = -90° if θ is in quadrants I and II;
θ_0 = +90° if θ is in quadrants III and IV.
The initial angle θ is in quadrants I and II if y >= 0 and in quadrants III or IV if y < 0. The rotation by ±90° requires an interchange of x and y with an appropriate change in sign: if y >= 0 then -x -> y and y -> x; if y < 0 then x -> y and -y -> x. As a result of the 90° rotation all x >= 0. Let θ + θ_0 = θ_1; then

\frac{\sin(\theta_1 + \Delta\theta_2)}{\cos \Delta\theta_2} = r \sin \theta_1 + r \cos \theta_1 \tan \Delta\theta_2 = u_1 + v_1 \tan \Delta\theta_2 = t_2.    (8.51)
Now v_1 = x >= 0; hence, the sign of Δθ_2 should be chosen opposite to the sign of u_1 or t_1 in order to minimize the magnitude of t_2. This choice of sign also insures w_2 >= 0, since

w_2 = v_1 - u_1 \tan \Delta\theta_2.    (8.52)
The choice of the sign of Δθ_i opposite to t_i at each step in the iteration process suffices to force t_n to zero and w_n to kr. The CORDIC technique is particularly well suited to serial logic because the required shifts can be obtained from appropriate taps on the shift registers used to store w_i and t_i. A third shift register is required to store θ. Three adders are required: one for the θ operations and two adders to determine t_i and w_i. There is, of course, no reason why this technique cannot be used with parallel logic. If multiple adders are used, the time required for vectoring or resolving is equivalent to one multiplication time. If only one adder is used, the time required is equivalent to three multiplication times. The basic CORDIC technique may be extended to include other trigonometric functions. The logic required may also be used to convert between two consistently based number systems. The CORDIC techniques were developed for a special purpose computer but are certainly not restricted to this class of computers. An evaluation of the CORDIC sine and cosine algorithm can be obtained by a detailed comparison of this algorithm with the standard Chebyshev polynomial approximations for a specific accuracy. The Chebyshev approximations have the same form as the Taylor approximations except that the coefficients are different [27]. It appears that the CORDIC techniques often are more efficient than conventional algorithms and should be given consideration even when subroutine control is used.
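A floating-point sketch of the two CORDIC modes follows. It adopts the conventions reconstructed above (an exact ±90° first rotation, tan Δθ_i = 2^{2-i} thereafter, and a constant corrective factor k = 1/prod cos Δθ_i ≈ 1.647); a hardware realization would of course use fixed-point registers and shifts rather than floating multiplies, and the iteration count chosen here is an assumption.

import math

N = 24   # fixed iteration count, so the corrective factor is a single constant
# Angle set: dtheta_1 = 90 deg, then tan(dtheta_i) = 2**(2-i) for i = 2..N,
# so multiplication by tan(dtheta_i) is a right shift (Eq. (8.38) ff.).
DTHETA = [math.pi / 2] + [math.atan(2.0 ** (2 - i)) for i in range(2, N + 1)]
K = 1.0
for a in DTHETA[1:]:
    K /= math.cos(a)          # k of Eq. (8.37), about 1.64676; its reciprocal
                              # is the familiar CORDIC gain 0.60725...

def cordic_rotate(x, y, theta):
    """Rotation mode, Eqs. (8.17)-(8.18): returns k(x cos t - y sin t),
    k(y cos t + x sin t) for |theta| <= pi."""
    z = theta                                  # residual angle, driven to zero
    d = 1.0 if z >= 0 else -1.0                # step 1: exact +/-90 deg rotation,
    x, y, z = -d * y, d * x, z - d * DTHETA[0]     # Eqs. (8.32)-(8.34); no scaling
    for i in range(2, N + 1):
        d = 1.0 if z >= 0 else -1.0
        t = 2.0 ** (2 - i)                     # tan(dtheta_i): a shift in hardware
        x, y, z = x - d * t * y, y + d * t * x, z - d * DTHETA[i - 1]
    return x, y                                # carries the factor K; divide by K
                                               # if unscaled coordinates are needed

def cordic_vector(x, y):
    """Vectoring mode, Eqs. (8.19)-(8.20): returns k*sqrt(x*x + y*y) and
    theta = atan2(y, x)."""
    d = -1.0 if y >= 0 else 1.0                # theta_0 = -/+90 deg rotates the
    x, y, z = -d * y, d * x, d * DTHETA[0]     # vector into the right half plane
    for i in range(2, N + 1):
        d = -1.0 if y >= 0 else 1.0            # sign chosen to drive y to zero
        t = 2.0 ** (2 - i)
        x, y, z = x - d * t * y, y + d * t * x, z + d * DTHETA[i - 1]
    return x, -z                               # x = k*r; theta is minus the
                                               # accumulated rotation

xr, yr = cordic_rotate(1.0, 0.0, math.pi / 3)
assert abs(xr - K * math.cos(math.pi / 3)) < 1e-5
assert abs(yr - K * math.sin(math.pi / 3)) < 1e-5
m, ang = cordic_vector(3.0, 4.0)
assert abs(m - 5.0 * K) < 1e-5 and abs(ang - math.atan2(4.0, 3.0)) < 1e-5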
NOMENCLATURE (IN ORDER OF APPEARANCE)

a = b   By definition a equals b
a ∈ A   a is an element of A
a ≡ b (mod m)   a is congruent to b modulo m; there exists an integer t such that a = b + mt
A = {a, b}   A is the set containing a and b
⊕   Digitwise modulo two addition
(x, y)   Greatest common divisor of x and y
φ(M)   Euler's function
M   The set {0, 1, ..., M - 1}
A/B   The factor group A/B
α: A → B   The map α from A to B
n(A)   The cardinality of set A (the number of elements contained in set A)
a ↔ b   a ∈ A corresponds to b ∈ B
b | a   b divides a; a = bq
b ∤ a   b does not divide a
b = |a|_m   b is the least positive residue of a modulo m; a = b + mt such that 0 <= b < m
[a/b]   The largest integer part of a/b; a = [a/b] b + r, where a, b, r >= 0 and r < b
|a|   The absolute magnitude of a
REFERENCES

1. Aiken, H., and Semon, W., Advanced digital computer logical design. Wright Air Develop. Center Tech. Rept. No. WADC-TR-59-472, Harvard Univ., Cambridge, Massachusetts, 1959.
2. Arnold, R. F., Linear number systems. Tech. Note No. 04879-8-T, Univ. of Michigan, Ann Arbor, Michigan, 1962. Also to appear in the Journal of the Society of Industrial and Applied Mathematics.
3. Ashenhurst, R. L., and Metropolis, N., Unnormalized floating point arithmetic. J. Assoc. Computing Machinery 6, 415-429 (1959).
4. Avizienis, A., Signed-digit number representations for fast parallel arithmetic. IRE Trans. Electron. Computers 10, 389-400 (1961).
5. Avizienis, A., On a flexible implementation of digital computer arithmetic. Intern. Federation Inform. Process. Socs. Conf., Munich, 1960.
6. Bedrij, O. J., Carry-select adder. IRE Trans. Electron. Computers 11, 340-346 (1962).
7. Booth, A. D., A signed binary multiplication technique. Quart. J. Mech. Appl. Math. 4, Pt. 2, 236-240 (1951).
8. Burks, A. W., Goldstine, H. H., and von Neumann, J., Preliminary Discussion of the Logical Design of an Electronic Computing Instrument. Inst. Advanced Study, Princeton Univ., Princeton, New Jersey, 1946.
9. Burla, N., Some logical problems in the design of a high-speed adder for parallel binary digital computers. M.S. Thesis, The Technion, Israel Inst. Technol., Haifa, Israel, 1960.
10. Burtsev, V. S., Accelerating Multiplication and Division Operations in High-Speed Digital Computers. Inst. Exact Mech. and Comp. Tech., Moscow, 1958. (In English.)
11. Chamberlin, G. P., The extended-digit number system for high-speed computer arithmetic. M.S. Thesis, Moore School Elec. Eng., Univ. of Pennsylvania, Philadelphia, Pennsylvania, 1962.
12. Cheney, P. W., A digital correlator based on the residue number system. IRE Trans. Electron. Computers 10, 63-70 (1961).
13. Daly, W. G., and Kruy, J. F., A high-speed arithmetic unit using tunnel diodes. IEEE Trans. Electron. Computers 12, 503-511 (1963).
14. Estrin, G., Gilchrist, B., and Pomerene, J., A note on high-speed digital multiplication. IRE Trans. Electron. Computers 5, 140 (1956).
15. Estrin, G., Organization of computer systems: the fixed plus variable structure computer. Proc. Western Joint Computer Conf., pp. 33-40 (1960).
16. Estrin, G., Bussell, B., Turn, R., and Bibb, J., Parallel processing in a restructurable computer system. IEEE Trans. Electron. Computers 12, 747-755 (1963).
17. Everett, R. R., and Swain, F. E., Whirlwind I Computer Block Diagrams, Vol. I. Report R-127-1, Digital Computer Lab., Mass. Inst. Technol., Cambridge, Massachusetts, 1947.
18. Fraenkel, A. S., The use of index calculus and Mersenne primes for the design of a high-speed digital multiplier. J. Assoc. Computing Machinery 8, 87-96 (1961).
19. Freiman, C. V., Statistical analysis of certain binary division algorithms. Proc. IRE 49, 91-103 (1961).
20. Garner, H. L., Finite non-redundant number system weights. Tech. Note No. 04879-6-T, Univ. of Michigan, Ann Arbor, Michigan, 1962.
21. Garner, H. L., Error checking and the structure of binary addition. Ph.D. Thesis, Univ. of Michigan, Ann Arbor, Michigan, 1958.
22. Garner, H. L., The residue number system. IRE Trans. Electron. Computers 8, 140-147 (1959).
23. Garner, H. L., A ring model for the study of multiplication for complement codes. IRE Trans. Electron. Computers 8, 26-30 (1959).
24. Garner, H. L., Arnold, R. F., Benson, B. C., Brookus, C. G., Gonzalez, K., and Rozenberg, D. P., Residue number system for computers. A. F. Systems Command, Aeron. Systems Div. Tech. Rept. No. ASD-TR-61-483, Univ. of Michigan, Ann Arbor, Michigan, 1961.
25. Gilchrist, B., Pomerene, J., and Wong, S. Y., Fast carry logic for digital computers. IRE Trans. Electron. Computers 4, 133-136 (1955).
26. Guffin, R. M., A computer for solving simultaneous equations using the residue number system. IRE Trans. Electron. Computers 11, 164-173 (1962).
27. Hamming, R. W., Numerical Methods for Scientists and Engineers. McGraw-Hill, New York, 1962.
28. Heijn, H. J., Representation of switching functions and their applications to computers. Philips Res. Rept. 15, Chapters 6 and 7 (1960).
29. Keir, Y. A., Cheney, P. W., and Tannenbaum, M., Division and overflow detection in residue number systems. IRE Trans. Electron. Computers 11, 501-507 (1962).
30. Kilburn, T., Edwards, D. B. G., and Aspinall, D., Parallel addition in digital computers, a new fast carry circuit. Proc. IEE 106, Pt. B, 464-466 (1959).
31. Kilburn, T., Edwards, D. B. G., and Aspinall, D., A parallel arithmetic unit using a saturated-transistor fast-carry circuit. Proc. IEE 107, Pt. B, 637-684 (1960).
32. Lehman, M., A comparative study of propagation speed-up circuits in binary arithmetic units. Intern. Federation Inform. Process. Socs. Conf., Munich, 1960.
33. Lehman, M., The minimization of assimilations in binary carry-storage arithmetic units. IEEE Trans. Electron. Computers 12, 409-410 (1963).
34. Lehman, M., High speed multiplication. IRE Trans. Electron. Computers 6, 204-206 (1957).
35. Lehman, M., and Burla, N., Skip techniques for high-speed carry-propagation in binary arithmetic units. IRE Trans. Electron. Computers 10, 691-698 (1961).
36. Lehman, M., and Burla, N., A note on the simultaneous carry generation system for high-speed adders. IRE Trans. Electron. Computers 9, 610 (1960).
37. LeVeque, W. J., A note on complete residue systems. Am. Math. Monthly 70, 844-846 (1963).
38. Lockheed Aircraft Corp., Lockheed Missiles and Space Co., Research on Automatic Computer Electronics, Vol. III: System design research. A. F. Systems Command, Aeron. Systems Division Tech. Rept. No. RTD-TDR-63-4173, Palo Alto, California, 1963.
39. Lucal, H. M., Arithmetic operations for digital computers using a modified reflected binary code. IRE Trans. Electron. Computers 8, 449-458 (1959).
40. MacLean, M. A., and Aspinall, D., A decimal adder using a stored addition table. Proc. IEE (London) 106B, 129-135 and 144-146 (1958).
41. MacSorley, O. L., High-speed arithmetic in binary computers. Proc. IRE 49, 67-91 (1961).
42. Meggitt, J. E., Pseudo division and pseudo multiplication processes. IBM J. Res. Develop. 6, 210-226 (1962).
43. Merrill, R. D., Jr., Improving digital computer performance using residue number theory. IEEE Trans. Electron. Computers 13, 93-101 (1964).
44. Metropolis, N., and Ashenhurst, R. L., Significant digit computer arithmetic. IRE Trans. Electron. Computers 7, 265-267 (1958).
45. Metze, G., A class of binary divisions yielding minimally represented quotients. IRE Trans. Electron. Computers 11, 761-764 (1962).
46. Metze, G., A study of parallel one's complement arithmetic units with separate carry or borrow storage. Tech. Rept. No. 81, Digital Computer Lab., Univ. of Illinois, Urbana, Illinois, 1957. Also Ph.D. Thesis, Univ. of Illinois, Urbana, Illinois.
47. Metze, G., and Robertson, J. E., Elimination of carry propagation in digital computers. Proc. Intern. Conf. Inform. Processing, Paris, 1959. UNESCO/NS/ICIP/G.2.10, pp. 389-396 (1959).
48. Mitchell, J. N., Jr., Computer multiplication and division using binary logarithms. IRE Trans. Electron. Computers 11, 512-517 (1962).
49. Morgan, C. P., and Jarvis, D. B., Transistor logic using current switching and routing techniques and its application to a fast-carry propagation adder. Proc. IEE 106, Pt. B, 467-468 (1959).
50. Nadler, M., A high speed electronic arithmetic unit for automatic computing machines. Acta Technica (Czech. Acad. Sci.) 6, 464-478 (1956).
51. Radio Corporation of America, Aerospace Systems Division, Modular arithmetic computer research. A. F. Systems Command, Aeron. Systems Division Tech. Rept. No. AL-TDR-64-86, Dayton, Ohio, 1964.
52. Rao, T. R. N., The general properties of finite weighted number systems. Ph.D. Thesis, Univ. of Michigan, Ann Arbor, Michigan, 1964.
53. Reitwiesner, G. W., Binary arithmetic. In Advan. Computers 1, 232-308 (1960).
54. Reitwiesner, G. W., The determination of carry propagation length for binary addition. IRE Trans. Electron. Computers 9, 35-38 (1960).
55. Robertson, J. E., Redundant number systems for digital computer arithmetic. Eng. Summer Conf. Notes on Topics in the Design of Digital Computing Machines, Univ. of Michigan, Ann Arbor, Michigan, 1959.
56. Robertson, J. E., A new class of digital division methods. IRE Trans. Electron. Computers 7, 218-222 (1958).
57. Robertson, J. E., Theory of computer arithmetic employed in the design of the new computer at the University of Illinois. Eng. Summer Conf. Notes on the Theory of Computing Machine Design, Univ. of Michigan, Ann Arbor, Michigan, 1960. Also File No. 319, Digital Computer Lab., Univ. of Illinois, Urbana, Illinois, 1960.
58. Robertson, J. E., Two's complement multiplication in binary parallel digital computers. IRE Trans. Electron. Computers 4, 118-119 (1955).
59. Rozenberg, D. P., Algebraic properties of residue number systems. Ph.D. Thesis, Department of Electrical Engineering, Univ. of Michigan, Ann Arbor, Michigan, 1961.
60. Salter, F., High-speed transistorized adder for a digital computer. IRE Trans. Electron. Computers 9, 461-464 (1960).
61. Sear, E., Kilomegacycle tunnel-diode logic circuits. Intern. Solid-State Circuits Conf., Philadelphia, 1962.
62. Sklansky, J., An evaluation of several two-summand binary adders. IRE Trans. Electron. Computers 9, 213-226 (1960).
63. Sklansky, J., Conditional-sum addition logic. IRE Trans. Electron. Computers 9, 226-231 (1960).
64. Smith, J. L., and Weinberger, A., Shortcut multiplication for binary digital computers. Natl. Bur. Std. (U.S.) Circ. 591, Sect. 1, 13-22.
65. Svoboda, A., The numerical system of residual classes in mathematical machines. Intern. Conf. Inform. Processing, Paris, 1959. UNESCO/NS/ICIP/G.2.10, pp. 419-422 (1959).
66. Svoboda, A., and Valach, M., Rational numerical system of residual classes. In Stroje na Zpracování Informací, Sborník V, pp. 9-37. Nakl. ČSAV, Praha, 1957 (in English).
67. Szabo, N. S., Sign detection in nonredundant residue systems. IRE Trans. Electron. Computers 11, 494-500 (1962).
68. Szabo, N. S., and Tanaka, R. I., Report on residue (modular) arithmetic survey. Tech. Rept. No. AF 33(657)-8777 (for Aerospace Systems Division, Dayton, Ohio), Lockheed Aircraft Corp., Lockheed Missiles and Space Co., Palo Alto, California, 1963.
69. Takahasi, H., and Ishibashi, Y., A new method for "exact calculation" by a digital computer. Inform. Processing Japan 1, 28-42 (1961).
70. Taub, A. H., Gillies, D. B., Meagher, R. E., Muller, D. E., McKay, R. W., Nash, J. P., and Robertson, J. E., On the design of a very high-speed computer. Rept. No. 80, Digital Computer Lab., Univ. of Illinois, Urbana, Illinois, 1957.
71. Tocher, K. D., Techniques of multiplication and division for automatic binary computers. Quart. J. Mech. Appl. Math. 11, 364-384 (1958).
72. Valach, M., Vznik kódu a číselné soustavy zbytkových tříd (Origin of the code and number system of residual classes). In Stroje na Zpracování Informací, Sborník III. Nakl. ČSAV, Praha, 1956. Translation: Liaison Office, Technical Information Center MCLTD, Wright-Patterson Air Force Base, Ohio, F-TS-10126/V.
73. Volder, J. E., The CORDIC trigonometric computing technique. IRE Trans. Electron. Computers 8, 330-334 (1959).
74. Weinberger, A., and Smith, J. L., A one microsecond adder using megacycle circuitry. IRE Trans. Electron. Computers 5, 67-73 (1956).
75. Wilson, J. B., and Ledley, R. S., An algorithm for rapid binary division. IRE Trans. Electron. Computers 10, 662-670 (1961).
Considerations on Man versus Machines for Space Probing

P. L. BARGELLINI*

The Moore School of Electrical Engineering, University of Pennsylvania, Philadelphia, Pennsylvania

* This article was written when the author was with Aerospace Corporation, Systems Research and Planning Division, Los Angeles, California, while on leave of absence from the University of Pennsylvania.
1. Introduction
2. Human and Machine Intelligence
3. Problem Definition in Engineering Terms
4. Summary of Information Handled by Man and Machines
5. Information Capacity of the Human Channel; Acoustic and Visual Stimuli
6. Somesthetic Communication
7. Data Processing by Machines
8. Comparison of the Bit Rate in Manned and Mechanized Systems
9. Considerations on the Communication Links
10. Possible Solutions and Recommendations
11. Conclusion
Acknowledgments
Bibliography
1. Introduction
In the hostile environment of space, man needs very elaborate life-supporting equipment which includes: an oxygen supply for breathing, food for sustainment, means to remove or to process the waste products of metabolism, temperature and humidity controls, protection against radiation, etc. Precautions must be taken to guarantee that, for the entire duration of a mission, conditions adverse to the sustainment of life or leading to possible physical damage, or which might impair the astronaut's performance, be avoided. Especially during the re-entry period into the atmosphere in a nondestructive return to earth, a fundamental characteristic of manned missions, the above conditions become highly critical and difficult to achieve. As the weight of the astronaut and of the life-supporting equipment requires considerable extra power and weight in the booster rocket, it is
of the utmost importance to determine whether a human observer could really accomplish more in the course of a space mission than certain unattended instruments which would not require the life-supporting equipment, nor would they weigh as much as the man plus life-supporting equipment combination. Clearly, unless the mission objectives as well as the details of the unattended instruments are defined and analyzed, the above questions cannot be settled. The purpose of this study is to investigate some of these questions and to offer tentative suggestions and recommendations on the basis of the evaluation of the special conditions arising in manned space missions. The main objective of these missions is to set up manned or unmanned orbital vehicles capable of gathering information about ground, sea, air, and space activities of another power in times of peace or war. Any conclusion reached in this manner would not be directly applicable to different types of space missions, such as lunar or planetary explorations, etc., although some of the established criteria could be extended with modifications to those missions. In spite of certain statements to the contrary, it seems rather unwise to consider man's principal function in space to be that of performing maintenance and repair roles. If these were the principal reasons to have man in space, one might well say that the original engineering planning of the mission was poor. Reliability should be assured first by proper design with sufficiently reliable components, streamlining, and simplification to the utmost, and by the introduction of adequate redundancy. Certainly the presence of man and his unique capabilities should be exploited fully to permit equipment simplification where practicable. After having eliminated the dull chores of maintenance and repairs, the capability of man as a space observer should be the subject of close scrutiny. In many cases, what is important is not direct visual observation but rather indirect observation of the outputs of instruments activated by sensors of various kinds. It is generally this type of observation which would help to reach decisions of fundamental importance not only in space, but ultimately at ground bases. Thus, on earth or in space, man is bound to rely heavily on physical instruments. Man, obviously limited when restricted to the use of his senses alone, acquires a totally different order of capability when he appropriately uses certain instruments. In a full symbiosis of man and instruments, the question is where to place the man in the link: on board, or on the ground with communication links providing remote eyes and hands as required in each case, and with what types of instruments? In this part of the analysis, besides the question of human eyes versus instruments, the most important question concerns the definition of the
communication problems and, in particular, the evaluation of the delays encountered in each system. In what follows, the argument will be reviewed in its philosophical nature first; recommendations on the basis of possible engineering compromises as required in certain space missions will be offered later.

2. Human and Machine Intelligence
It has been pointed out that in the history of human civilization it is possible to distinguish three major phases related to three specific processes: (a) the process of transforming and handling matter, (b) the process of transforming and handling energy, and (c) the process of transforming and handling information. The man versus machine argument is an old one that flared up again after the invention and widespread use of large, high-speed digital computers. Many studies on this fascinating subject exist; however, although they invariably start in a seemingly objective manner, they soon show a tendency to degenerate into various kinds of pseudophilosophical discussion. There seems to be little ground for controversy when the marked inferiority of man is pointed out whenever he is called to confront machines in either of the first two of the three above-mentioned processes. Indeed, man's ego is not hurt when, for instance, a locomotive is watched pulling a heavy train over a grade or, similarly, when an electric power generating station is visited. As a matter of fact, modern man is rather thankful not to be one of the thousand slaves who might be called to replace a locomotive, a crane, or some other machine. Setting at about 75 watts the average power that a man can generate with his own muscular force, the same amount of power can be identified with that of a machine-slave or robot. In this manner a quantitative feeling is obtained about the extremely large number of slaves which each modern man has at his disposal; e.g., one finds that:

1 vacuum cleaner = 1 robot
1 power mower = 40 robots
1 motorcycle = 100 robots
1 automobile (European) = 400 robots
1 automobile (American) = 2000 robots
All this immense benefit comes without, of course, the headaches that the handling of human slaves would certainly generate. Clearly there were very few robots before the first industrial revolution; thus, a very interesting result is obtained by comparing the growth of the world's population with the growth of robots throughout time. Prior to the invention of the steam engine and, later, of other forms of heat machine,
the only existing robots could be identified in certain crude forms of water wheels, windmills, and especially sailing vessels. In this manner Table I is obtained.

TABLE I
GROWTH OF ROBOTS

Year A.D.            1600    1700    1800    1900    1960
Population (×10⁹)     0.4     0.6     0.9     1.6     3.0
Robots (×10⁹)        10⁻⁶    10⁻⁴    10⁻¹      1     16.0
Perhaps the most thought-provoking result from Table I is that the world's population now should be about 6 times as great as it is if slaves were to be used in lieu of robots. In regard to the third phase of the human civilization mentioned before, i.e., the transformation and handling of information, although a similar approach to the problem is possible, the subject is soon found to be highly controversial for a number of reasons. Several measures are available to document the tremendous expansion of the communications process. Since the major interest in connection with this study is centered around electrical communication and information processing by electrical means, some measure of the progress may be obtained by considering the increase throughout the years of the number of telephones in the world. This analysis is outlined in Table II, in which the world's population figures are given again.

TABLE II
GROWTH OF TELEPHONES

Year A.D.            1900    1930    1960
Population (×10⁹)     1.6     2.0     3.0
Telephones (×10⁶)      1      30      90
From Tables I and II it appears that while the world's population has increased in recent years around 1% a year, robots and telephones have undergone a fourfold increase. Telephones are not, of course, the only means of communication in the world; other types of electrical and also nonelectrical communication, e.g., the press, could be taken as examples to illustrate the information explosion. The tremendous expansion of the press and the diffusion of printed material, and the appearance of mass communication systems such as aural and visual broadcasting, are well-known events. Yet none of these
events is looked at with worry or suspicion; as a matter of fact they are generally regarded as capable of bringing ever increasing benefits to mankind. However, as soon as the attention is shifted to the fairly recent appearance and simply fantastic development of the electronic computer, excitement and marvel are generated, accompanied by preoccupation with possible dire repercussions in the immediate future (automation) and even worse fears for the more distant future. High-speed digital computers came into existence during the latter part of and shortly after World War II; after the first electromechanical machine of 1944, all-electronic computers were conceived and built, capable of performing, with great accuracy and at high speed, huge numbers of arithmetical operations. The best way to appreciate the computer explosion is to consider Table III, which gives an idea of the growth of the number of machines and of the financial size of the computer market.

TABLE III
GROWTH OF COMPUTERS

Year A.D.                   1940      1950        1960
Number of computers           0        10         3000
Financial volume growth ($)   0     25 × 10⁶    0.5 × 10¹⁰
Table III illustrates the fact that a new industry has been created, running into billions of dollars, as a result of the invention of the high-speed digital computer; this industry did not exist twenty years ago. Although computers can be regarded as processors of information, it is difficult to establish a single measure of their information capacity; in effect, speed of operation and memory size are the two fundamental quantities on the basis of which a figure of computer information capacity may be obtained. Yet there is no agreement on this subject, on account of the many possible different definitions and interpretations of such a measure, and also because of its dependence upon factors other than speed of arithmetical operations and memory capacity. It is well known that both speed of operation and memory capacity have increased by several orders of magnitude from the early computers of the late forties to present-day computers. On the other hand, tremendous advances have been accomplished in the volume density of components: first, by moving from the electromechanical computer to the all-electronic version with vacuum tubes; more recently, going over to solid state circuits with transistors and diodes; and, finally, integrated circuits all the way into molecular electronics.
A feeling of elation is experienced by man when he is relieved from the drudgery of time-consuming straight arithmetical computations. For instance, the addition of ten thousand decimal numbers, each with ten figures, which takes place in less than 1 second in a modern computer, would require about 4 days of work of 8 hours each by a single computing clerk, aside from the fact that, while the computer's answer can be made arbitrarily precise by design of the machine, the computation by man may always contain an error, and additional time and effort must be spent to locate it. Huge amounts of bookkeeping operations, such as those required by the complex banking and financial world, are today routinely carried out by computers; furthermore, scientific problems requiring vast amounts of computation can now be attacked because modern computers are available. The same problems had previously been formulated but not solved because computers were unavailable. All these facts are well known and seemingly indicative of a better state of things for mankind. Sooner or later, however, the feelings of elation become mixed with other feelings of a different nature. First, it is clear that present-day computers applied to many industrial fields are responsible now, and will be responsible to a greater extent in the future, for the displacement of masses of workers. This is the problem of automation with its deep social and political implications; yet, it is not the ultimate problem with which one is confronted when a decision has to be made between man and machine for the performance of a given task. The situation becomes embarrassing to man because the challenge is not directed toward his ability to perform certain routine tasks; the attack now, or in the near future, seems to be directed against his supreme asset of possessing an intelligence which until recent times had not been challenged. The evolution of the computer seems definitely to endanger the so far unique position of man in the world. A problem of this size and depth cannot be resolved within the obvious limits of a single paper, yet the presentation of certain considerations may be worthwhile in order to understand what the problem is with reference to specific situations. Two schools of thought exist today which can be identified with extremely opposed points of view. One school insists on the superiority (absolute, spiritual, etc.) of man because of an assumed exclusive capability of his, among all living creatures and, a fortiori, among all nonliving things taken either singly or in large aggregates, to carry out thought processes. The other school claims that, on the contrary, it is possible to conceive machines with various degrees of intelligent behavior, i.e., automata capable of carrying out processes attributed until recent times exclusively to man. The fact that very strong limitations still exist today to the
construction of such machines is not a fundamental objection, but simply an indication of a state-of-the-art condition bound to advance with time. The nervous system, for instance, has been described as a vast assembly of nerve cells interconnected by fibres along which electrical impulses travel from sensor areas to cells or groups of cells which react to the electrical impulses or to certain chemical reactions produced by them. Eventually the electrical impulses along the nerve cells, or their physicochemical counterparts, reach the decision areas of the brain where the recognition of the stimulus pattern occurs. As a consequence of these causes and actions, reactions take place resulting in a flow of commands from the brain to the motor parts of the body in accordance with what is deemed to be at any specific time the "optimum fit" response of the individual to the environment. Since some of these actions appear to follow an on-off pattern typical of the decision-making and memory elements of a digital computer, an analogy between the brain and the computer was attempted on the hypothetical grounds of mathematical models common to both. It was on the basis of such tenuous reasoning that computers were called "giant brains" shortly after their spectacular debut; the cliché caught up with the popular imagination, misled as it was by an ill-informed and distorting press. It has always been clear to the expert that present-day computers are not in the least giant brains, neither in the sheer size of their memories nor in their actions upon instruction by man along a fixed pattern or, at best, a limited number of possible different patterns. It is also clear today that the brain is not that simple a machine, that it is not exclusively binary or digital in its operation; and, as a result of recent studies, it has been hinted that the nervous system might well operate along mathematical processes still undiscovered by man. By the time the brain regained its stature, so to speak, different and far more complex models of it were proposed and analyzed. Then new machines were invented, different and more complex than digital computers, with the specific intent of performing those functions typical, if not of the human brain, at least of the nervous system of lower animals. The successes obtained so far in this field are considerable; therefore, the man versus machine argument came up again for a more serious and deeper analysis. In the light of the recent advances, it seems no longer possible to consider these discussions as a repetition of the machinist versus vitalist clashes of the seventeenth century. Truly, history repeats itself; but, in addition, an upward evolutionary trend accompanies its cyclic repetitions; Leibnitz's speculations about reasoning machines are surely closer to reality today than in his time.
Up to this point, research on machine or artificial intelligence has been presented as if it were generated exclusively by simulation of human information processing activities. Along this line of thinking, computers are simpler tools whose value and potential should be assessed in terms of the human cognitive ability. Hence, there exists the need for a more precise understanding of this ability and of all the activities related to it (physical, physiological, and psychic). It should be pointed out, however, that a different approach has been vigorously followed by other researchers interested in obtaining machines capable of intelligent behavior for their own sake, i.e., independently of their possible similarity to the human nervous system and its hypothetical cognitive activities. The end results and goals are clearly the same for the two schools, but the means used throughout the intermediate steps are quite different. The former approach is often designated with words such as bionics, neural net models, perceptrons, etc.; the latter approach is designated instead with words such as complex cognitive models, machine learning, artificial intelligence, etc. The question of real importance to engineers is then: where do we stand today? It is known and accepted by both the vitalist and the machinist schools that existing computing machines by far outperform man in speed of computation and capability of producing error-free results. It is equally known and generally accepted that machines can outperform man at certain simple games, especially if their human opponent is not clever at remembering the specific rules of the game and the various resulting possibilities. Certainly a human being feels a bit awkward after having been beaten by a machine in a nim or tic-tac-toe game; at chess the outcomes are still debatable, but the possibility exists of man being beaten by a machine. On the other hand, man feels reassured to find out that, at least for the time being, there are many cases of information processes in which he can outperform present-day computers even of the most advanced type. This situation is encountered in pattern recognition, language translation, learning processes, and decision processes such as medical diagnoses, legal judgments, etc. There is little argument between the two schools about the present state of the art and certain specific results. There is no agreement, however, on the interpretation of what is expected to come next; here the twisting of the lines of reasoning becomes very involved. The respective "reasons" will not and cannot be repeated here; suffice it to say that a typical argument given by the supporters of man is: "Machines can think but they cannot think creatively." Most probably this type of reasoning leads nowhere; it is perhaps reminiscent of certain Middle Ages disputes aimed at the definition of the sex of the angels.
The accent then is shifted to the definition of "thought," probably an equally elusive concept. It has been said that the human being is the only computer produced by amateurs; aside from theological questions, and in spite of its facetiousness, the statement has a rather precise meaning. A basic question could be: would such a "computer," after years of instruction and maturity, and by coupling his efforts with those of similar "computers" having passed as he did through ages of evolution, generate a "computer" or perhaps a "supercomputer" different from those that amateurs have been generating since the appearance of life (human or animal) on earth? Would it be possible to give definite form to the Greek myth of an all-knowing Athena being born from Zeus's brain? Furthermore, would such an entity be capable of reproducing itself with or without the possibility of added evolution? Also, once the gift of evolution were bestowed upon the "machine," would it be conceivable that its rate of evolution might become greater than that of its creator? The argument now is no longer centered about thought and intelligence; it is slanted rather on living versus dead matter. Of course, people show a ready contempt for the latter, devoid as it is of the "mystery" of life, but perhaps many people should be reminded that a spiritualist of modern times, Fr. Teilhard de Chardin, has contributed with the deepest thoughts to the ennoblement of "dead" matter. Today the question is perhaps clearer than ever: man, having succeeded in harnessing matter and energy and having also succeeded in directing and controlling large amounts of information, is afraid in all cases of one thing, i.e., of the eventuality of not being capable of remaining forever in full command of the situation. Whether mankind might be wiped out in a total atomic holocaust or might fall prey to supercomputers remains a question of consequences differing only by shades of detail but ultimate in their nature. One is unavoidably confronted with philosophical arguments that do not seem at first to help in the solution of engineering problems. However, besides the intense fascination emanating from philosophical questions, engineers can learn much by reflecting on them. For the specific purposes of space missions, it seems safe to assert today that, given the present still very incomplete understanding of the nervous system of man, and considering that the existing realizations of neuron-type networks and systems, although exciting, remain very crude, it is out of the question to regard their introduction in any part of the mission, i.e., either in space or on the ground, even as a partial substitute for man. However interesting and thought-provoking certain accomplishments of bionics may be, it is clear that a long time must pass before man's senses and brain will be replaced, even partially, by neural machines.
Similar conditions exist with machine intelligence of the nonbionic type, although the progress in the field of cognitive models theory, i.e., intelligence obtained independently of attempts to simulate the processes in the brain, is unquestionably more striking. The reasons for the advances here are probably due to the earlier beginning of this type of approach and its mathematical abstractness, which requires little or no work across boundaries of different scientific disciplines. From the early work on general problem-solving computer programs, game programs, and question-answering machines to more recent research on elementary perceivers-memorizers, geometric theorem-proving computers, and heuristic programs for baseball playing, investment portfolios, etc., a clear pattern of continuous advance emerges. But the hardware employed (type of computer), the program preparation phase, and the overall time to reach decisions are disproportionate (i.e., too big, long, expensive, etc.) in relation to the results so far achieved. In conclusion, although forecasts of this kind are always difficult and dangerous, after consideration of the most optimistic statements and hopes of workers in this field, the point of view should be reconfirmed that, for space missions, computer intelligence either of the bionic type or of the cognitive model type is inapplicable today, and will not become a competitor to human intelligence before something like five to ten years from now or more, depending on the desired degree of sophistication in the function to be performed.

3. Problem Definition in Engineering Terms
In the study of complex systems, one encounters manifold combinations of machines and men without well-defined boundaries. At a given stage of human evolution, and for the corresponding state of the art of machines, all of which have been invented by man at least until now, it is important to recognize the fundamental fact that there cannot be unique solutions in the symbiosis of man and machines. On the contrary, there are always different solutions characterized by various degrees of participation by man in the function of the machine, and vice versa. It may be worthwhile to give a few examples. In the field of transportation, man's role in steering a vehicle or a convoy, while essential on today's roads, is unnecessary when the steering is automated by means of rails; tomorrow even on roads the function of steering may be taken away from man if a suitable guidance system is incorporated in the road and the vehicles. Experimental versions of driverless cars have been demonstrated, and to bring systems of this kind into common practice and usage is mostly a question of economics.
In the same field, man's participation in the matching of the torque versus speed characteristics of an engine to the motions of a vehicle, as dictated by road conditions, traffic patterns, and other constraints, is markedly different according to whether an automatic transmission or a manually operated gearbox is installed in the vehicle. Clearly, while many solutions exist at a given time, certain solutions considered acceptable one day will be replaced by different ones at a future time as a consequence of further engineering advances. In the field of air transportation, automatic piloting and navigation, including take-offs and landings, offer another outstanding example of the evolutionary trends. In the field of communications and information handling, there are many examples of different combinations of man and machines; computer programming and usage are probably an outstanding example in which several highly qualified men feed information into a machine or receive information from it. In the simpler case of a high-speed telegraph communication channel, one may find several operators reading out coded tapes and transferring the information via ordinary typewriters on sheets of paper like telegram forms, etc.; yet the entire system can also be automated by the use of teletype machines and so on. Another example is encountered in the telephone exchange with all its possible degrees of automatization, and it is appropriate indeed to recall that the first digital computer was electromechanical and resembled very markedly an automatic telephone exchange. Engineers are of course very proud of their achievements, and mankind in general has derived considerable advantage from the advances of their art. By and large, one should expect that information processing of all kinds will tend toward greater and greater amounts of mechanization, whereby information will be acquired with greater reliability than ever before. Space missions with their complex requirements will be no exception to the above mentioned trend. As the art advances, man, the ultimate user of information, will eliminate himself as much as possible from any intermediate function which can in principle be mechanized. Yet a distinction should be made between a trend and the existing state of the art. Recommendations in favor of or against the use of man as operator in any kind of mission, and consequently inclusive of all space missions, should be based on the following steps:

(a) Résumé of notions concerning information handling by man or machine (present state-of-the-art conditions)
(b) Definition of the steady-state information capacity of the human channel
(c) Definition of the inputs available to that channel, with some reflections on the human channel transient input characteristics
(d) Review of information handling by machine (present state of the art)
(e) Comparison of the results from points (b) and (d) above
(f) Aspects of reconnaissance space missions
(g) Human and instrumental capabilities in ground reconnaissance from space
(h) Considerations on the communication links
(i) Outline of possible solutions and recommendations

The emphasis on present state-of-the-art conditions should be noted; consequently, any recommendation eventually received should be understood to depend on the above mentioned conditions. Future systems that will be different as a result of the invention of more sophisticated machinery cannot be discussed.

4. Summary of Information Handled by Man and Machines
Certain statements will be listed in what follows, based upon what is known and generally accepted today. The following advantages can be mentioned in favor of man as an information processor:

(a) Availability of a variety of input-output devices
(b) Memory capacity higher by several orders of magnitude than those of the most advanced machines
(c) Logical structure and sophistication superior to those of any existing (or conceivable in the near future) computer
(d) Self-programming characteristics
(e) Adaptability characteristics
(f) Self-repair characteristics, physical and nonphysical, conscious or unconscious, with consequent reliability unknown in machines

On the contrary, man is handicapped in comparison to the machine for the following reasons:

(a) Need of motivation to achieve maximum efficiency
(b) Tendency to physical fatigue and psychical distraction
(c) Unreliability in the mechanism of memory accession
(d) Slowness of processing and decision-making (although compensated by the large memory and adaptability characteristics)
(e) Inability to make available for the purposes of recording, checking, and error correcting any of the intermediate steps (because of man's fundamental ignorance or unconsciousness about what is going on inside himself)
(f) Lack of exactitude in the output format, with possible misinterpretation whenever additional information processing is required

It is natural to think of combining the above characteristics in an optimum man-and-machine symbiosis with the intent of overcoming
the disadvantages and exploiting the advantages of each in the best manner. This process is already taking place in several instances and with various degrees of sophistication; the question remains of defining what the optimum should be in a specific situation. Although some thought has been given to this area, much remains to be covered; assuredly man is so self-conscious and self-centered that he seldom thinks of changing himself for the benefit of machines. Most machine design is done for the purpose of serving man and not vice versa; yet, in spite of the fact that ultimately the information should be made available to man, the possibility exists of investigating whether it might pay off to adapt man to machines, at least in some measure. This technique is practiced, for instance, in the training of man to perform specific functions with the help of machines: in the field of information processing, the most outstanding example is possibly the introduction of artificial languages intended for computer use; in other fields the training of more intuitive reactions occurs, e.g., when learning to drive an automobile or to fly an airplane. Our input sensors can be better educated: high-speed reading techniques are an example of these possibilities; greater accuracy and speed can be gained in operating certain forms of computers, such as slide rules or similar devices, by sheer training. Thought has been given also to the possibility of using computers to improve man's thinking; this possibility is founded on the recognized fact that, if the unused neurons of the nervous system could be brought into play, man's intelligence could be multiplied by a factor of the order of a few tens. Since it is known that only about 4% of the neurons in the brain are normally used, the other neurons could be asked to take a more active role with the aid of computers. It has, of course, been suggested that symbiosis methods be used not only between man and machine, but also between certain animals (dolphins) and machines, as well as between man and animals. The problem is always one of communication, and dreams regarded as wild today may become reality tomorrow; what is needed is a better understanding of nature (the brain) in order to design not just extremely fast adding machines, as present-day computers really are. When the fundamental processes which control the acceptance and manipulation of information as they occur in the brain are better understood, it may be possible to reduce them to machine form capable of superior performance of a totally different order of magnitude from the present ones. In the meantime, of course, present machines are instrumental in helping to solve some of the problems that arise in the postulation of very complex models of the brain. This is about all one can say today. One feels the beginning of a new era, and the forms of the present man-machine combinations are by necessity very crude and simple.
5. Information Capacity of the Human Channel; Acoustic and Visual Stimuli
The uncertainty (entropy) function

$$H = -\sum_{i=1}^{s} p_i \log p_i$$
was proposed by Claude Shannon as a logical measure of the average information output per symbol from a source characterized by s distinct and mutually independent symbols with individual probabilities p_i. The definition can be extended to sources whose symbols are subject to conditional probabilities as well as to individual probabilities. When the logarithm is taken to the base 2, the unit of the information measure is the bit per symbol; when symbols are chosen with a certain rhythm, the above quantity may be modified to express an information rate, i.e., average information content per unit time (e.g., bits/second). As a communication system involves a source and a link (channel) connecting the source to an ultimate destination, the capability of the system to handle information depends not only on the source output but also on constraints in the channel which may modify the original uncertainty (information) function or rate. In order to clarify the concept of capacity for a human channel, three definitions of channel capacity encountered in information theory will be briefly reviewed. The first definition applies to the case of a code channel with fixed constraints:
$$C = \lim_{T \to \infty} \frac{\log N(T)}{T} \ \text{bits/sec}$$
where N(T) is the number of possible signals of length T allowed by the use of a code characterized by fixed constraints (e.g., Morse code). This definition is sometimes referred to as giving the capacity of a noiseless channel. The second definition applies to noisy channels in which statistical regularity is maintained by the source of information and by the noise; channel capacity can then be expressed as the maximum value of a difference between two terms describing statistical averages related, respectively, to the a priori and a posteriori uncertainties, or formally:
$$C = \max \left[ H(x) - H(x|y) \right] \ \text{bits/sec}$$

where H(x) is the source information output rate and H(x|y) is the equivocation rate. The difference by itself, prior to its maximization, is the transinformation, a quantity always greater than zero except when the equivocation, which is a measure of the information loss due
to the noise in the channel, equals the source information itself; then no information transfer occurs between the source and the destination, because the noise entirely wipes out the source output information. Channel capacity can be reached, at least theoretically, when the source information is coded in such a way as to counteract the noise in the channel in an optimum manner; then, if the source output is not increased beyond channel capacity, information is transferred faultlessly from source to destination in spite of the noise in the channel. However, if the source output information is increased beyond channel capacity, the excess information is lost because of the appearance of an amount of equivocation which exactly equals the excess information fed to the channel by the source. The third definition of channel capacity relates to the transmission of electrical signals of band width B over a noisy channel with additive Gaussian noise of average power N, provided the signals of average power S are optimally coded:
$$C = B \log_2 \left( 1 + S/N \right) \ \text{bits/sec}$$
Channel capacity is therefore either a measure of the information transferred in a noiseless situation, where the fixed constraints of the code channel limit the information flow, or a measure of the information transferred in a noisy situation after coding has been introduced to that degree of sophistication, and no less, which would compensate for the noise, provided channel capacity is not exceeded. In practice, neither of these conditions obtains: the definition based on a noiseless channel has limited value because all physical channels are noisy, and no practical coding method is known which allows complete counteraction of the effects of the noise. Finally, even were ideal coding feasible (i.e., acceptable in terms of the very complex equipment required), it would fall short of its goal as soon as the statistics of the noise changed from the originally assumed model. Therefore, the practical engineering meaning of channel capacity is that of a transinformation rate accompanied by a measure of residual uncertainty in the received message; this uncertainty may be expressed in terms of error probability in the case of digital transmissions, or in terms of some convenient measure of loss of fidelity in the case of analog transmissions. The human channel is no exception to these rules; as a physical channel it is noisy and, although evolution may have provided highly complex codes for certain forms of human communication (e.g., speech), the possibility of errors always remains. Hence, the second definition of channel capacity is applicable, provided the definition is interpreted in the sense explained above, which involves a transinformation rate with a small, but not vanishing, probability of error.
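These definitions are easy to exercise numerically. The short Python sketch below is purely illustrative (it is not from the original text, and the sample source and channel figures are invented); it evaluates the entropy of a simple source and the capacity of a band-limited Gaussian channel:

```python
import math

def entropy_bits(probs):
    """H = -sum(p_i log2 p_i), the average information per symbol in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gaussian_capacity(band_width_cps, s_over_n):
    """Third definition: C = B log2(1 + S/N), in bits/sec."""
    return band_width_cps * math.log2(1 + s_over_n)

print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits/symbol (equiprobable)
print(entropy_bits([0.5, 0.25, 0.125, 0.125]))  # 1.75: unequal probabilities carry less
print(gaussian_capacity(4000, 10.0))            # ~13,800 bits/sec: a 4-kc channel at S/N = 10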
The five senses of man (hearing, sight, smell, taste, and touch) can be regarded as specific mechanisms which make him receptive to certain stimuli. All five senses are important to the life of the individual and the conservation of the species, but the first two obviously play a predominant role in human communication, although in other animals the situation may be different. It is interesting to note that only for aural communication is man provided with an active transmitter as well as a receiver; in visual communication, on the contrary, man has no active transmitter at his disposal. All he can do is modulate, by certain actions of his, such as motions, gestures, writing, painting, etc., the energy of an external light source. True, a similar action can be exerted on sound sources, as in playing music, but the voice production mechanism of man is a unique transmitter, and it is interesting to point out the absence of a corresponding visual transmitter. Some measure of the human channel capacity has been obtained by monitoring speech communication and reading processes. With plain language equally well known to the speaker and to the listener, speech communication normally occurs at rates between 70 and 260 words per minute. The measure of the information capacity can be obtained either directly in terms of the characteristics of the sound waves and their information-carrying elements (formants), or more readily, although indirectly, on the basis of the word or letter information content of the language used. The two approaches lead to significantly similar results: assuming that the average English word contains five letters, and attributing to each letter an average information content of around two bits, one finds with the above mentioned word rates that the lower and upper bounds for the human channel capacity obtained in this manner are, respectively, 10 and 30 bits/sec. In considering writing upon dictation, one obtains bounds which are widely different because of the physical channel cascaded with the human channel. By physical channel is meant here the type of writing instrument, the material on which writing is done, and, last but not least, the code used. Clearly, writing by hand on hard stone with hammer and chisel does not lead to high rates; however, as soon as pencil and paper or a good typewriter are used, the rates increase considerably. As far as the code used for writing is concerned, going from ordinary writing to shorthand, or from an ordinary typewriter to a stenotype machine, makes it possible to increase the rate even further. Using ordinary script, rates not higher than 4-6 bits/sec (about 30 words/min) are possible; but with shorthand it is possible to achieve rates around 20 bits/sec (100 words/min) with training, or even higher rates after special training. In reading, the situation is similar, and the effective information rate depends primarily on training, as well as on whether or not the reader is
requested to pronounce the read material aloud. Experimentally, bounds between 100 and 1000 words/min were found, corresponding to information rates of 20 and 200 bits/sec. The latter figure must be taken with great caution because of the difficulty of measuring objectively the information truly assimilated in high-speed reading experiments; thus the lower bound seems to have greater significance. Additional experiments on aural perception have been performed by having professional piano players play musical notes taken from random sequences. In these experiments, channel capacities of slightly over 20 bits/sec were found. Investigations in the field of visual communication have been performed with the intent of measuring "Gestalt," or perception of form, and of relating the Gestalt concept to binary selections. In tests, after a certain steady picture had been presented to several observers, a different picture was flashed for a controllably short time, after which the first picture reappeared. The pictures were presented on a cathode ray tube, and, by using TV standards, the flashed picture could be shown for a precisely determined time interval corresponding to one or several frame durations. The observers' impressions were recorded immediately after the experiments. The results indicated that, with exposure times equal to or less than one frame duration (1/30 sec), the observers were aware of the flashing but could not tell what they had seen. Increasing the exposure time resulted in certain central details being recognized first, and eventually further details, by means of either direct observation or inference from previous observations. In this manner, a measure of speed of perception only is obtained. To relate speed of perception to channel capacity, pictures were used of objects or contours corresponding to certain picturable nouns of the Basic English vocabulary of 1000 words. Under the hypothetical assumption that the recognition of an object is equivalent to the assimilation of 10 bits of information (approximately log2 1000), the information rate was found to be between 20 and 40 bits/sec. The assumption just mentioned is highly questionable; in effect, the recognition time will vary considerably, depending on the number of objects depicted on a single slide as well as on the position of the objects and their nature. The training of military personnel for ship and airplane identification indicates that, especially after some time, the capability of a human observer to recognize certain contours and silhouettes becomes very sharp; exposures as short as 1/600 sec are sufficient for a trained observer to recognize given patterns. Therefore, setting the human visual channel capacity at around 50 bits/sec seems questionable. Another interesting possibility, which confirms the above viewpoint, is offered by subliminal perception.
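The word-rate arithmetic above is easily mechanized; a minimal Python sketch, assuming the chapter's round figures of five letters per word and about two bits of information per letter (rough working assumptions, not measurements):

```python
import math

LETTERS_PER_WORD = 5   # the text's average English word
BITS_PER_LETTER = 2    # approximate information content per letter (assumed above)

def wpm_to_bits_per_sec(words_per_min):
    return words_per_min * LETTERS_PER_WORD * BITS_PER_LETTER / 60.0

print(wpm_to_bits_per_sec(70))    # ~11.7: near the ~10 bits/sec lower bound for speech
print(wpm_to_bits_per_sec(100))   # ~16.7: shorthand at 100 words/min, roughly 20 bits/sec
print(wpm_to_bits_per_sec(1000))  # ~167: fast reading, roughly the 200 bits/sec upper bound

# Object recognition as a binary selection from a 1000-word vocabulary:
print(math.log2(1000))            # ~9.97, i.e., about 10 bits per recognized object
```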
Possibly a more significant approach can be obtained in terms of the minimum number of elementary areas, their brightness, and their time sequence in a TV transmission. Here the original information is usually converted from a two-dimensional spatial representation to single-valued time functions; in black and white picture transmission, the space- or time-dependent function is light intensity (the luminance signal); in color transmission, two additional functions must be used (the chrominance signals). The band width required depends essentially on the desired detail and on the time sequence of individual frames; if motion is to be preserved, one frame must follow another in a manner that exploits the phenomenon of retinal persistence of vision. With reference to the television standards currently used in the U.S., band width requirements run up to 4-6 Mc. Although all current TV systems make use of an analog signal, it is possible to quantize the luminance and the chrominance signals; little is known about the effects of quantizing chrominance information, but a considerable amount of data on quantized black and white TV is available. The results are interesting in the sense that the eye has been found to be quite tolerant of large amounts of quantization noise; for instance, black and white TV pictures with as few as 16 luminance levels are quite acceptable. In black and white TV, band width is given by:

$$B = \tfrac{1}{2} K A n^2 N_f$$

where K = utilization factor, A = aspect ratio, n = number of lines, and N_f = number of frames/sec. When reference is made to single pictures, the frame sequence factor N_f may be set equal to unity, and band width is determined by the detail requirements only. When the linear resolution is expressed as the ratio between the total picture linear size and the minimum linear size to be resolved, the number of elementary areas in a square picture is the above ratio squared, and the maximum frequency is at least one half the number of elementary areas (for unity aspect ratio and unity utilization factor). For N_a elementary areas and a half-tone reproduction (i.e., two values of luminance), the number of possible patterns is clearly:
$$P = 2^{N_a}$$

In this case, therefore, the information in bits acquired after a picture has been recognized is the number of elementary areas itself, if all the patterns are equally probable. For pictures with s possible levels of luminance in each area and N_a elementary areas, the number of possible patterns is:

$$P = s^{N_a}$$
Therefore the information acquired when a specific pattern is recognized (assuming again that all patterns are equally probable) is:

$$H = \log_2 P = N_a \log_2 s \ \text{bits}$$
As mentioned previously, experiments with quantized black and white pictures have indicated that s = 16 results in a situation quite acceptable to the human eye. Thus, Table IV is indicative of the conditions encountered in various forms of TV service (for unity aspect ratio).

TABLE IV. CONDITIONS ENCOUNTERED IN VARIOUS FORMS OF TV SERVICE

Definition   Number of lines   Number of elementary areas (aspect ratio = 1)   Information content per frame (16 levels)   Band width (with 30 frames/sec)
Very high    1000              1,000,000                                       4 x 10^6 bits                               15 x 10^6 cps
High         500               250,000                                         1 x 10^6 bits                               3.75 x 10^6 cps
Medium       180               32,400                                          0.1296 x 10^6 bits                          0.486 x 10^6 cps
Low          60                3,600                                           14.4 x 10^3 bits                            54 x 10^3 cps
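Table IV can be regenerated from the two relations used in this section, the information per frame H = N_a log2 s and the relation R = 2B log2 s between bit rate and band width. A short Python sketch (the constants are those of the table; nothing new is assumed):

```python
import math

LEVELS = 16                         # luminance levels judged acceptable
FRAMES_PER_SEC = 30
BITS_PER_AREA = math.log2(LEVELS)   # 4 bits per elementary area

for definition, lines in [("Very high", 1000), ("High", 500),
                          ("Medium", 180), ("Low", 60)]:
    areas = lines ** 2                           # square picture, aspect ratio 1
    bits_per_frame = areas * BITS_PER_AREA       # H = N_a log2 s
    bit_rate = bits_per_frame * FRAMES_PER_SEC
    band_width = bit_rate / (2 * BITS_PER_AREA)  # from R = 2B log2 s
    print(f"{definition:9s} {areas:9,d} areas  "
          f"{bits_per_frame:12,.0f} bits/frame  {band_width:12,.0f} cps")
```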
The above results apply only to the case of equi-probable patterns for all possible picture configurations. Since ordinary pictures exhibit a large amount of redundancy, the possibility of band width compression exists. Many studies are found on this subject and, although TV band width compression techniques have not been, and probably will not be, adopted in broadcasting for practical and economic reasons, the above mentioned possibility remains attractive for special applications. Very recent studies have confirmed the possibility of compressing TV signal band width by factors varying from 6 to about 17; these numbers give an indication of the amount of redundancy occurring in images. Yet, even taking redundancy into account, Table IV could be cited to support the common saying that a picture is worth a thousand words. In aural communication the effective amount of information can be described in terms of two parameters: band width and distinguishable levels of sound intensity. Both quantities are significant for all sorts of sound transmissions where the human ear is the ultimate receiver; therefore, the following statements are valid for speech communication or other types of sound transmission. In speech transmission, percentage articulation is decreased by reducing the band width, as well as by introducing quantization noise when the sound is converted from its
original analog form to a multilevel digital form. After a process of conversion back to analog form, the remaining distortion is designated as quantization noise because of its statistical nature. Numerous tests have indicated that, with a band width of the order of 4 kc, human speech is perfectly acceptable, i.e., intelligible and Gestalt-recognizable, when not less than 7 binary digits are used, i.e., 128 levels in the quantization process. In this manner, the input information rate is:
$$R = 2B \log_2 s = 8000 \times 7 = 56{,}000 \ \text{bits/sec}$$
Therefore, the ratio of the input information rate to the human channel capacity for speech,

$$\frac{56{,}000}{30} \approx 1.86 \times 10^3,$$
is a rather large quantity for which some justification must be found. What interpretation can one give to this ratio and its high value? A large amount of work has been done in this area, part of it directed toward band width compression schemes for speech communication. With fairly simple schemes, moderate band width compression ratios can be reached, while considerably more sophisticated approaches (vocoders) make much larger compression ratios possible. In all these schemes, however, a Gestalt loss is encountered, accompanied by a decrease in articulation, especially under unfavorable S/N conditions. Certain forms of nonlinear distortion, like that due to amplitude clipping, can be applied to speech without impairing intelligibility too seriously, yet a reduction of Gestalt content always occurs, i.e., the naturalness of speech is lost. All arguments seem therefore to lead to the conclusion that human speech communication is indeed a remarkable system; the natural transmitter (the vocal apparatus) and the natural receiver (the human ear) and their complex characteristics have been studied in great detail, but no complete, satisfactory theory is yet available, because very little is known about what happens between the ear and the brain. The most astounding feature is that the system is capable of performing at a rather modest information rate (say 30 bits/sec) using a band width much greater than the minimum required on the basis of this information rate. It has been hinted that speech is a spread-spectrum communication system; talkers and listeners alike use the audio spectrum between, say, 40 and 4000 cps to exchange information which is basically characterized by "control signals," operating on the vocal cords and the oral and nasal cavities, whose spectrum is much narrower than 4000 cps.
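A few lines of Python suffice to check the rate arithmetic of this section (only the figures quoted above are used):

```python
B = 4000             # speech band width, cps
BITS_PER_SAMPLE = 7  # 128 quantization levels

R = 2 * B * BITS_PER_SAMPLE   # R = 2B log2 s = 8000 x 7
print(R)                      # 56,000 bits/sec of raw quantized speech
print(R / 30)                 # ~1.87 x 10^3, the ratio to the ~30 bits/sec assimilated
```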
Whenever a listener is affected by more than one transmitter (talker), it is the energy of one specific transmitter which constitutes the signal, all the other transmitters occupying the same channel contributing only noise. Yet, even under such adverse conditions, communication is possible, although the intelligibility is reduced in proportion to the decrease in S/N. It is interesting to find out what the value of S/N should be for an ideal (Shannon) communication system. As a first approximation, this can be done from the expression:
$$C = B \log_2 (1 + S/N)$$
using the value of 30 bits/sec for C and 4000 cps for B; the value of S/N comes out 5.2 x 10^-3, or about -23 db. Experiments have shown that, although normally satisfactory human speech communication (with intelligibility not less than 85%) requires something around +10 db for S/N, communication is still possible at S/N = -5 db with intelligibility of randomly chosen words up to 50%, and with recognition of a speaker's talk in plain language at S/N of -10 db. The discrepancy between the above mentioned experimental values of S/N and the value computed from Shannon's equation can be tentatively explained in terms of the nonideal coding occurring in speech against random noise, generated either from a noise source or from the superposition of many speech waves, as well as in terms of additional secondary circumstances.
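The figure just quoted follows from inverting Shannon's formula; a quick Python check using the same inputs:

```python
import math

C = 30.0    # assimilated information rate, bits/sec
B = 4000.0  # audio band width, cps

snr = 2 ** (C / B) - 1       # solve C = B log2(1 + S/N) for S/N
print(snr)                   # ~5.2e-3
print(10 * math.log10(snr))  # ~-22.8, i.e., about -23 db
```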
In speech, as in other forms of sound communication, the significant quantity is the steady flow of information from transmitter to receiver, which is measured by the human channel capacity and amounts to only a few tens of bits/sec. Although the spectrum of the signal in the physical channel between the speaker's mouth and the listener's ear occupies 4000 cycles, the effective information spectrum covers a few tens of cycles only. In effect, the speech wave as produced by the vocal apparatus should be regarded as a burst-type, or controlled, carrier wave, modulated in frequency as well as in amplitude by the very low frequency motions of certain parts of the voice apparatus itself. In this manner the original information band width is spread over a much wider spectrum, enhancing reception by an adequate receiver (the ear and the brain) even under adverse signal-to-noise ratio conditions.

In visual perception the situation is different; as noted previously, there is no transmitter to begin with. All one can do is modulate, with gestures, by writing, painting, operating switches, etc., the energy coming from the outside world, and observe visually scenes in which the modulation is represented by natural as well as man-made effects. Until recently, light energy was available exclusively in the form of incoherent radiation, and it is usually this radiation which is utilized to transmit pictures from one point to another. The advent of lasers has opened new vistas and possibilities of incalculable effect. In visual perception, although the eye and its associated mechanism display a certain inertia to picture changes, it is clear that the amount of information gathered in the process of recognizing a given picture is quite high whenever the pictures to be recognized differ substantially from (i.e., are sufficiently uncorrelated with) those previously shown. The quantitative results indicated at the end of the previous section should be kept well in mind in order to understand and appreciate the capabilities of a human observer. In space applications, aural and especially visual channels will maintain their fundamental importance. After special training, superior performance will be obtained, and the use of convenient codes and displays should help to reach optimum conditions.

6. Somesthetic Communication
Human ears and eyes are highly developed sensory organs; thanks to their complex actions and relations to the brain, man gathers through them the greater part of the information about his environment. Ultimately, however, a point is reached where the two senses are saturated and no further increase in channel capacity can be expected from them; by then making use of the remaining sensory channels of a human being, it is possible to increase the total information rate. Apart from possible saturation and overload of the visual and aural channels, other conditions might exist which would limit, or even reduce to nil, the information rate through the sight and sound organs. Especially when the external physical channel itself is lacking, or when the individual's senses are adversely affected either temporarily or permanently (e.g., blindness and deafness), the availability of additional communication channels is very useful. Although the use of the senses of smell or taste appears to be limited to the evaluation by trained experts of certain beverages, foods, and odors, it is possible to conceive of communication among humans by these means; yet no serious attempt is known to have taken place in either direction. On the other hand, when tactile sensations are considered, aside from literary flights such as Aldous Huxley's "feelies" and some crude tactile communications (in some not so ancient horse-drawn buses, passengers would signify their intention of getting off at the next stop by pulling a rope running above their heads along the vehicle and attached to one of the driver's legs!), the possibility of using the tactile sense either directly or
indirectly via kinesthetic sensations is very real and could be used for practical purposes. (The kinesthetic sense involves the recognition of position, of active and passive movement, of resistance to movement, etc., arising from sensations in muscles, tendons, joints, and on the skin.) In this field a certain amount of work has been done and some progress is noticeable; conventionally, the two senses are designated by the single adjective "somesthetic"; consequently, whenever tactile or kinesthetic sensations are used for communication purposes, one refers to a somesthetic communication system. The two senses upon which somesthetic communication is based react to a wide range of stimuli: thermal, mechanical, electrical, and chemical, or some combination thereof. Communication can thus be established with the active participation of the subject or simply by his passive reaction; for instance, in Braille reading, information is acquired by active participation of the reader through the motions of the fingers across the special alphabet's dots, while the excitation of the skin of a subject, such as might occur in over-the-skin writing or in similar cases, constitutes an example of passive participation. Various methods using vibrations to supply information directly through the skin have been suggested and used; in the case of direct excitation the information rates have been found to be invariably low. Cutaneous communication using Morse code has been found possible, but when only one exciter is used the rate will not exceed a few words per minute; by using multiple exciters at different locations on the subject's body, the rate has been pushed up to around 40 words/min. Thermal excitations involve large time constants and consequently low information rates; electrical excitations are critical because of the relatively small gap between the minimum excitation necessary for communication and the excitation resulting in pain to the subject. In conclusion, one might say at this time that optimum methods of stimulating the somesthetic senses are still imperfectly defined, and add that very little is known about the "optimum" coding for this form of communication. In spite of these statements, it seems possible that an additional information capacity could ultimately be gained of the same order of magnitude (a few tens of bits/sec) as that naturally offered by aural communication. In space missions, where it is essential to maximize the function of a human operator, it may be advisable to make use of somesthetic techniques to relieve the visual and the aural senses of the operator. After all, such techniques might be regarded as a sophisticated extension of the time-honored manner of flying by the seat of one's pants.

The sense of olfaction offers additional communication channels to
man. Aside from the well-known sensitivity and alertness of electrical engineers to the smell of burnt insulation, serious attempts to investigate this channel have recently been noted. Again, it is conceivable to use this sense, or channel, for specific purposes in space missions.

7. Data Processing by Machines
Information handling will be discussed here exclusively from the point of view of electronic data processing; i.e., information is supposed to be fed to machines in some predesigned form or code, assimilated by them, and then processed, i.e., sorted out for possible delivery to different addresses, combined with other information, changed in form or code, stored, and eventually delivered. Any form of "intelligence" to be identified in any of these processes will be limited to built-in feedback actions and error detection and correction techniques. From early telegraph systems to modern electronic data processing machines the gap is certainly wide, but only on a quantitative scale. It is convenient to assess it in terms of transmission speed (rate of information transmission) and the probability of the errors which may take place during transmission. From the slow data rate of early line and radiotelegraph systems (20-30 words/min), the introduction of wide band radio and wire circuits made possible higher and higher speeds; thus a good H.F. telegraph circuit can handle traffic at speeds up to 200-300 words/min, while the character error probability remains 10^-3 or less under favorable propagation conditions. In coaxial cable circuits, as well as in microwave line-of-sight or troposcatter circuits, the availability of wider band widths and the reduction of propagation vagaries permit even higher rates (thousands of words per minute). Eventually, the capacity of the channel may be found to exceed the rate of the source; it is at this point that a distinction is usually drawn between single human sources and sinks of information (person-to-person communication) and human group communication (machine-to-machine communication). It is well known that electromechanical teletype machines are capable of carrying out an entirely satisfactory man-to-man type of communication, where the rate is essentially limited by the ultimate information rate of the source and destination. This is true with either telegraph messages or digitalized voice transmission; however, as soon as the need to establish communication between groups of men arises, the physical limitations of the electromechanical teletype become readily evident. The next advance was the all-electronic teletype machine, with a transmission speed of perhaps 10-20 times that of its electromechanical counterpart.
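The spread of rates just cited can be put on a common bits/sec footing. In the Python sketch below, the five-unit code and a five-character average word are assumptions made for illustration, and the word rates are representative values from the text:

```python
BITS_PER_CHAR = 5   # five-unit teleprinter code, ignoring message statistics
CHARS_PER_WORD = 5  # assumed average, as elsewhere in this chapter

def wpm_to_bits_per_sec(words_per_min):
    return words_per_min * CHARS_PER_WORD * BITS_PER_CHAR / 60.0

for system, wpm in [("early telegraph", 25),
                    ("good H.F. circuit", 250),
                    ("microwave circuit", 5000)]:
    print(f"{system:18s} {wpm:5d} words/min  ~{wpm_to_bits_per_sec(wpm):6.0f} bits/sec")
```

Note that a few thousand words per minute already corresponds to the couple of thousand bits/sec at which present EDP equipment, discussed below, operates.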
Today the need to establish direct machine-to-machine communication links between computers located at different points has created new interest in data transmission and set new goals of performance. In the meantime, the potential of digital transmission per se has been duly recognized (signal regeneration at intermediate repeaters, introduction of man-made redundancy for error detection and correction, capability of adaptation to secure forms of communication, sorting and combination of messages, storage, etc.). With regard to speed of transmission, present-day electronic data processing (EDP) equipment operates in the neighborhood of a couple of thousand bits/sec, and advanced systems are in the planning stage for speeds between 10,000 and 50,000 bits/sec. Thus, in terms of words per unit time, present systems operate at around 10,000 words/min, while future systems will operate between 100,000 and 500,000 words/min. In this connection it may be of interest to recall that in 1949 an ultrahigh-speed facsimile system had been effectively instrumented (ULTRAFAX by RCA). ULTRAFAX operated at 100 lines/inch and was capable of transmitting printed material at rates between 1,000,000 and 1,500,000 words/min; the system employed electronic analog techniques largely borrowed from TV practice, and used photographic film and FM modulation and detection of a rather conventional form. To develop the proper feeling for what can be accomplished with speeds of transmission in these ranges, it should be realized that it was possible with ULTRAFAX to transmit some 500 pages of printed material (i.e., a rather large book) in about one minute. Since the system was instrumented with no regard to the statistics of the message, the bit content per letter should be set equal to about five; then, for the above mentioned operating range of 1,000,000-1,500,000 words/min, the bit rate comes to around 416,000-625,000 bits/sec, still 10 times higher than the EDP systems of the not too distant future. ULTRAFAX was of course wasteful of band width, since standard FM techniques were used with no special effort to economize in this sense; the R.F. channel band width was around 4-6 Mc. The system has been mentioned here to emphasize the large potential content of TV-like pictures and the need for wide bands whenever such pictures must be transmitted over radio channels. The truly outstanding characteristic of modern electronic data processing is not the speed but rather the extremely low bit error probability obtainable. A bit error probability of 10^-3, regarded as acceptable for a conventional telegraph channel, is totally unacceptable in machine-to-machine operation; here one strives for at least something around 10^-6, or better 10^-7.
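Both the ULTRAFAX rate and the intolerance of machines to a telegraph-grade error rate can be verified in a few lines of Python; the 10^6-bit message length is an arbitrary illustration:

```python
# ULTRAFAX: 1.0-1.5 x 10^6 words/min at ~5 bits/letter and 5 letters/word.
for wpm in (1.0e6, 1.5e6):
    print(f"{wpm:g} words/min -> {wpm * 25 / 60:,.0f} bits/sec")

# Why 10^-3 will not do between machines: errors expected in a 10^6-bit
# message, and the probability that the whole message arrives error-free.
n = 10**6
for ber in (1e-3, 1e-6, 1e-7):
    print(f"BER {ber:g}: ~{ber * n:g} errors expected, "
          f"P(error-free) = {(1 - ber) ** n:.3f}")
```

At 10^-3 a million-bit message essentially never arrives clean; at 10^-7 it does so about nine times out of ten.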
An ultimate bound (perhaps around 10^-9) can be determined by the permissible time outages for channel maintenance rather than by the electrical noise of the circuit. With the still higher speeds of transmission deemed necessary in the future, the fundamental question remains that of determining how fast one can transmit in a given channel. An answer to this question formulated in terms of Shannon's channel capacity has theoretical significance only; an answer in engineering terms is dictated rather by the amount of instrumentation acceptable at the transmitting and receiving ends, in either the noiseless or the noisy case. In the noiseless case, channel capacity (i.e., speed or rate of transmission) is potentially infinite even when the original channel band width is finite. In effect, for both idealized (i.e., constant amplitude and time-delay characteristics over a finite band width) and "real" but noiseless channels, the rate of transmission may be arbitrarily increased, provided adequate equalization techniques are introduced in the frequency domain. In either case, intersymbol interference can be reduced and ultimately eliminated; the circumstance that the original band width has been thus expanded is irrelevant to anyone willing to accept the complexity of the added equipment. In practice, however, i.e., for "real" but still noiseless channels, the rate of transmission is adjusted sufficiently below the so-called Nyquist rate, 2B, where B is a somewhat arbitrarily defined band width. In the noisy case, which is truly the only real one, the limit is set by modulation-demodulation and/or coding-decoding equipment complexity and cost, on the one hand, and by the data rate with a small yet finite error probability in the received message, on the other. The situation is often further complicated by the presence of nonadditive noise (time-varying channels) and by multipath conditions, which contribute to the reduction of the transmission rate. Practical rates of transmitting information with a sufficiently small error probability are today, and will probably remain in the future, a fraction of the theoretical channel capacity. With transmissions in the microwave portion of the electromagnetic spectrum, the ultimate practical values of the data rate will not exceed a few tens of megabits per second for signal-to-noise ratios between 1 and 10. At lower signal-to-noise ratios the data rate goes down rapidly, and at higher signal-to-noise ratios it goes up only slowly, on account of the logarithmic function. Therefore, it must be concluded that higher rates will be possible only after techniques utilizing the optical part of the spectrum are fully developed, provided of course that channels are available in terms of the specific situation involving propagation factors, beam acquisition, tracking, etc.
8. Comparison of the Bit Rate in Manned and Mechanized Systems
It is clear from the previously stated facts that, with respect to speed of transmission and raw data processing, mechanized systems outperform manned systems by several orders of magnitude. The ratio of the bit rate for the two systems may fall anywhere between 10 and 10^6; therefore, any judgment based on transmission speeds alone seems to lead toward the elimination of man as a continuous intermediate active element of a link. The question remains open, however, with regard to the two following possibilities: (1) should a man be assigned the function of an overseer, mainly with the intent of reducing failures or intervening after their occurrence, or (2) should a man be used as a decision element capable of adjusting the transmission rate to the amount of unexpectedness and novelty in the collected information? With reference to the first possibility, it should be noted that when a mechanized system operates at its full capacity the overseer is an idle figure whose action after possible failures is no better than what could be achieved by automatic control devices. After the occurrence of a failure, it is hard to conceive of any form of human intervention which could not have been carried out by means of built-in redundant subsystems or parts becoming active either automatically or on ground command. With reference to the second possibility, it is obvious that, in order to act as a controller of the data flow, the human operator would somehow have to acquire knowledge of the data himself; clearly, then, it might be desirable to operate the system at a variable data rate. When some control action on the data rate by the human operator is desirable, two alternatives exist: direct reconnaissance by the human eye through magnifying devices, or indirect reconnaissance, whereby the human eye is called upon to identify differences and cues from photographic, radar, or infrared data that have gone through some processing before the human observation takes place. In the first case, a decision is reached in the brain of the observer with the help of instruments but with the inherent capacity of the human channel; in the second case, which involves processing by instruments before human observation, the decision eventually reached by the observer will be characterized by different values of acquisition time, resolving power, and time delay. In both cases, the final judgment will be in the hands of a man, or more probably a group of men, located at some remote command post; thus the local human observer would assume the function of gathering information and relaying it as effectively as possible to the ground.
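The second possibility, a decision element that adjusts the transmission rate to the novelty of the collected information, can at least be caricatured in code. The Python sketch below is purely illustrative; the frame format, the mean-absolute-difference novelty measure, and the threshold are all assumptions, not anything proposed in the text:

```python
def novelty_gate(frames, threshold):
    """Forward a frame only when it differs enough from the last one sent.

    A toy stand-in for the 'decision element' of possibility (2): the
    transmitted data rate adapts to the unexpectedness of the input.
    """
    sent = []
    reference = None
    for frame in frames:
        if reference is None:
            novelty = float("inf")  # nothing sent yet: the first frame is all news
        else:
            novelty = sum(abs(a - b) for a, b in zip(frame, reference)) / len(frame)
        if novelty > threshold:
            sent.append(frame)
            reference = frame
    return sent

# Ten nearly identical "sensor frames" containing one anomaly:
frames = [[10, 10, 10]] * 5 + [[10, 80, 10]] + [[10, 10, 10]] * 4
print(len(novelty_gate(frames, threshold=5.0)))
# 3: the first frame, the anomaly, and the return to normal all count as news.
```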
On the other hand, when the local observer is given the freedom to reject or to choose part of the collected information, the possibility arises of losing forever data which might have been helpful in reaching a final decision. Acquisition time, resolving power, and decision time delay are important fundamental quantities which need to be redefined and clarified in order to evaluate their separate and combined effects on the final goals of the mission; such clarifications and definitions are required before any attempt is made to reach a conclusion in regard to the possible superiority of one system over another. These fundamental quantities will be better assessed with reference to the definition of the specific functions of reconnaissance, in a general sense, which follows in the next section.

9. Considerations on the Communication Links
For communications involving space-to-space as well as space-to-ground links, the radio-frequency spectrum between 100 Mc and 30 kMc can be used advantageously on account of the characteristics of the channels, inclusive of the external noise, and of the available hardware (transmitters, antennas, receivers, etc.). Frequencies higher than 30 kMc, either in the millimeter wave region or in the infrared or visible light region, will possibly play an important role in space-to-space communications, but most probably a very limited one in space-to-ground communications, on account of the exceedingly high transmission losses in the atmosphere. Furthermore, serious hardware problems are still encountered at millimeter waves; hence it seems safe to concentrate attention on the microwave spectrum only. In terms of the figures discussed in the previous section, it appears that for any link between two stations the potential availability is of the order of three thousand separate (i.e., time- or frequency-multiplexed) one-way channels, with practical data rates up to 1 megabit/sec in each channel, provided adequate signal-to-noise ratios are established. There are clearly circumstances working against this hypothetical situation: first of all, since the microwave spectrum is not for the exclusive use of space communications, a serious interference problem is expected between space and ground systems involving communications, radar, navigation aids, broadcasting, etc. In practice, channel capacity can be obtained at microwave frequencies for the transmission of data at rates up to a few megabits/sec, provided line-of-sight conditions and adequate signal-to-noise ratios are established, either directly or by means of relay stations, from one terminal station to another. Needless to say, the above statements should not be interpreted to mean that more than ample channel
capacity is available for each space-to-ground or, even worse, space-to-space link. Here the state-of-the-art limitations are heavily felt; it is true that Telstar, Relay, and Syncom have been successful, but aside from the fact that they were all experiments and not operational systems, the maximum channel capacity never exceeded about 3 megabits/sec, with one intermediate relay station only and with both terminal stations on the ground, i.e., with high power transmitters, large antennas, and highly sophisticated receivers. Communications between space vehicles have been accomplished so far only to a very limited extent; possibly the only example of communications of this type occurred when the Russians had two astronauts on different space vehicles. From the scanty information available, it seems that voice contact was established over ranges of a few tens of miles. This is easily confirmed by a simple check computation based on reasonable hypothetical transmitter power, antenna and receiver characteristics, and type of modulation; a sketch of such a check appears at the end of this section. In all the above cases the information rates were rather small, i.e., a few tens of bits/sec for the beaconing or homing applications and also in voice communications. The problem of satellite-to-satellite communications, basically simple in terms of the fundamental physical parameters involved, is far from solved today. For conditions in which large amounts of data have to be reliably transmitted from one satellite to other satellite relay stations, and eventually to the ground, the problems of transmitter power, receiver noise, antenna mechanics, station acquisition and tracking, immunity to jamming, and many others remain formidable. With low- and medium-altitude orbiting satellites, the problem of maintaining radio contact from one satellite to another is particularly difficult, on account of the requirement for mechanically or electronically steerable antennas of sufficient gain. The requirements are of two kinds: the size of the antennas and the means of aiming them in the proper direction, which for the above mentioned case is continuously variable, and the transmitter power, limited by presently available power supplies. With synchronous satellites the antenna problems are somewhat eased, but the transmitter power limitations remain; as a matter of fact, they are more heavily felt, on account of the greater range involved. Furthermore, since surveillance has to be done by means of low orbiting vehicles, the problems of transmitting the information gathered by such vehicles to the synchronous satellite relays remain the same as mentioned above. Hence, it should be pointed out that relaying data from one space station to the ground via other stations is still a thing of the future.
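The "simple check computation" alluded to above might run along the following lines. Every parameter in this Python sketch is a hypothetical but plausible value chosen for illustration; none comes from the text or from the actual equipment:

```python
import math

f = 150e6       # carrier frequency, cps (VHF, assumed)
pt = 1.0        # transmitter power, watts (assumed)
gt = gr = 1.0   # near-isotropic antenna gains (assumed)
d = 80e3        # range: about 50 miles, in meters
b = 30e3        # receiver band width for FM voice, cps (assumed)
t_sys = 1000.0  # receiver system noise temperature, degrees K (assumed)
k = 1.38e-23    # Boltzmann's constant

lam = 3e8 / f
pr = pt * gt * gr * (lam / (4 * math.pi * d)) ** 2  # free-space (Friis) received power
noise = k * t_sys * b                               # thermal noise power in band b
print(10 * math.log10(pr / noise))                  # ~40 db: ample margin for voice
```

Even with these modest assumptions the signal-to-noise ratio comes out around 40 db over a few tens of miles, which is consistent with the reported voice contact.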
To evaluate band width requirements, it should be realized that with 12,000 TV scanning lines the number of elementary areas would be 144 x 10^6 per picture; even with as few as 16 levels of light intensity, the number of bits per picture is 576 x 10^6. Clearly, such a rate requires R.F. band widths not available today, either in space or on the ground.

10. Possible Solutions and Recommendations
In view of man's supreme ability at pattern recognition, the following statements may be in order:

(a) Man in space would be capable of reducing sources of random error in the pointing of infrared devices, etc.
(b) Man in space would decide, better than any presently known instrument, whether to use payload sensors, thus reaching optimum performance for specific conditions.
(c) Man in space would monitor the output of various receivers and sensors.
(d) Man in space would be able to conduct simultaneous scientific experiments and take measurements in parallel with the specific operational observations. Nonroutine sequences or coincidences of events could thus be noted for immediate or future action.
(e) Man in space would be particularly useful in reducing information, extracting those rare and most precious bits which could then be transmitted to ground bases in a format requiring band widths feasible within the present state of the art. In this phase of the operation, man in space appears to have little or no competition from instruments.
(f) Man in space could, to a certain degree, adapt himself and the mission, with its many complex devices, to new situations which may arise. These situations might involve checking and adjusting the instrumentation on board for optimum performance, priority-type decisions in case of unforeseen events, etc.

Aside from the extra payload and the precautions to be taken in order to decrease the astronaut's risks, the most serious problem is that of finding out much more precisely how long man can endure permanence in space. Man has already conquered space, first with inanimate probes, then with probes carrying higher and higher forms of life, up to the day on which man himself was hurled into space. Although there are and will be space missions that can be planned for unmanned operation, the trend for complex, highly sophisticated functions seems to be in the direction of manned missions, especially for cislunar flights. The extra weight of the astronaut, or astronauts, and of the life-supporting equipment will not change for a given time of permanence in space. Rocket capability, on the other hand, is on the increase;
consequently, the payload situation will be eased in the future. The question is, of course, whether the additional payload should be human beings plus life-supporting equipment, or instruments.

11. Conclusion
Although it would be presumptuous to attempt specific predictions about complex man-machine systems, it is clear that much additional work across the boundaries between such diverse fields as biophysics, communication theory, and computer engineering is required. It appears highly advisable to expand laboratory simulation work in which most of the environmental conditions expected in the course of a space mission could be duplicated and controlled. Clearly, definite areas of superiority of the machine or of man are identifiable. Some of these areas have been discussed; others need additional investigation. Yet what is urgently needed is not the sheer accumulation of data concerning separate entities, but rather the establishment of sound conceptual models of entire systems. In the meantime, i.e., in the imminence of other and more sophisticated manned space missions, it is unavoidable that the final answer to specific points may well be provided by testing "in vivo."

ACKNOWLEDGMENTS

The author is grateful to several members of the technical staff at Aerospace Corporation who helped him in various phases of his work. The support and guidance of Dr. J. B. Woodford, Jr. are especially acknowledged.
Data Collection and Reduction for Nuclear Particle Trace Detectors

HERBERT GELERNTER

IBM, Watson Research Center, Yorktown Heights, New York
1. Introduction
2. Bubble Chambers
3. The Data Reduction Problem for Bubble Chambers
   3.1 Scanning
   3.2 Measuring
   3.3 Geometric and Kinematic Reconstruction
   3.4 Physical Analysis
4. Advances in Automatic Data Analysis for Bubble Chambers
   4.1 The Spiral Reader
   4.2 The Scanning-Measuring Projector (SMP)
   4.3 The Mechanical Flying Spot Digitizer (FSD)
   4.4 The Precision Encoding and Pattern Recognition Device (PEPR)
   4.5 Beyond PEPR
   4.6 Bubble Chamber Event Libraries
5. Spark Chambers
6. The Data Problem for Spark Chambers
7. Filmless Operation of Spark Chambers
   7.1 Vidicon Spark Chamber Systems
   7.2 The Sonic Spark Chamber
   7.3 Wire Chambers
8. Some Other Particle Trace Detectors
   8.1 The Current Distribution Chamber
   8.2 The Microwave Discharge Chamber
   8.3 The Scintillation Chamber
   8.4 The Filament Chamber
9. On-Line Data Processing in Physics
Bibliography
1. Introduction
To assert that the fast digital computer has revolutionized the practice of high energy physics is undoubtedly to be guilty of overstatement, but not very much so! For if computer techniques have not been an indispensable ingredient in the new particle pie that feeds
current theory concerning the fundamental forces and entities of the physical universe, the use of computers has at least increased the rate of accretion of the necessary experimental data by several orders of magnitude. Indeed, it is difficult to conceive of how most of the newly identified "particle-resonance states" that comfortably fit the slots predicted by the presently popular theory of "the eight-fold way" might have been discovered in a world innocent of computers. The fact is, credit for the recent almost dizzying progress in particle physics must be shared among many ideas, things, and people. The importance and the impact of the introduction of computer techniques in the use of the nuclear bubble chamber and, more recently, the spark chamber, two star performers in the aforementioned class, are the subject of this review. It was in the exploitation of the former of these, the bubble chamber, that the realization painfully emerged among high energy physicists that the computer was as much a part of their experimental setup as the accelerator or the bubble chamber itself, and that the data-processing requirements of the experiment must be treated on an equal footing with the others. This may seem a simple truism to the sophisticated, computer-nurtured physics graduate student of today, and indeed it is! Nevertheless, there are many who will remember the resistance encountered in the not so distant past by any proposal that a given amount of money be invested in the computer processing of experimental data that may have cost many times that much to acquire. Spark chamber practitioners benefited immensely from the experience of the bubble chamber physicists. The spark chamber was born with a congenital direct data channel, as it were, and physicists have not been slow to take advantage of this good fortune. At the earliest technical symposia on spark chamber physics, it was pointed out that photographically acquired data would easily yield to existing automatic film scanning and event reconstruction and analysis techniques, and soon after it was shown that the intermediate photographic step could be dispensed with entirely. Instead, the directly digitized spark chamber data would be immediately analyzed by an on-line computer, or else stored on magnetic tape for later processing. And now, some five years after the completion of the first useful spark chamber experiment, to insist upon the use of film as the acquisition medium for a high data rate experiment marks one as something of a nonconformist. As we have intimated, the plan for this paper is to discuss the automatic collection and reduction of data first for bubble chambers, and then for spark chambers. Finally, we shall mention some other interesting, but not yet widely used, nuclear particle trace detection schemes. We beg the indulgence of the nuclear physicists who might
happen upon this paper, for among its intended readers will be some who will appreciate learning what a bubble chamber or a spark chamber is, and how it is used. For these, a certain amount of descriptive material has been included. The computer specialist, on the other hand, must bear with us while we labor certain points obvious to him, for not all physicists have yet accepted computers as more than a replacement for the preferred but fast-disappearing cheap labor of the graduate student.

2. Bubble Chambers
It has been rumored that Nobel laureate Donald Glaser conceived of the bubble chamber while observing the trail of an acrobatic tadpole in a pitcher of warm beer. This story is probably untrue for, had those circumstances actually occurred, Glaser would more likely have invented something like a fluid version of the diffusion cloud chamber instead of the bubble chamber, since the analogy is closer with the former than with the latter. Glaser's Nobel laurels are a consequence of the fact that he did invent the bubble chamber, in spite of the tadpoles in his beer.

Bubble chambers work because a superheated liquid tends to start boiling along the trace of ions left by a charged particle that has rapidly traversed the liquid. Just how this happens is not yet clearly understood, although physical intuition easily leads one to accept the assertion as plausible. Glaser's early theoretical analysis was based upon the assumption that bubble growth would be caused by the trapping of like-charged ions on the walls of tiny nucleating bubbles formed throughout the liquid as a result of statistical thermal fluctuations. These bubbles normally collapse and recondense (somewhat in the manner of virtual pair creation and annihilation due to vacuum fluctuations in electrodynamics). The mutual repulsion of the trapped charges, however, would prevent the collapse of the bubbles, causing them instead to expand beyond the critical size for spontaneous continued growth in the superheated liquid. While Glaser's theory had the virtue that it came up with the right answer to the question, "Will it work?", and thus set him to the task of getting it working, it failed to describe the phenomenon quantitatively. Attempts to understand the physics of bubble formation in terms of local heating of the liquid due to the energy loss of the ionizing particles were more successful, but not completely so.

Lacking a suitable theory, the track-forming properties of different liquids under different operating conditions must be determined experimentally. This has been done to the extent that, given the momentum of the particle (from measurement of the curvature of its track in a known magnetic field), one can
often identify particles of differing mass from bubble density measurements (i.e., from a count of the number of bubbles per unit track length).

A bubble chamber, then, is essentially a vessel containing a suitable liquid, means for establishing the superheated state of the liquid, an entrance port for the accelerator beam, and a transparent window to allow the light of a high-intensity short-duration flash in, and the image of the bubble-defined ionizing particle tracks out, that they may be photographed by at least two and more often three cameras to allow the precise stereo reconstruction of any nuclear events captured in the chamber. The most useful chamber medium by far is liquid hydrogen, which presents a pure target of free protons to the accelerator beam (molecular and atomic binding is negligible at energies of interest in high energy physics). Within the limits of the effectiveness of the beam particle selection system and the purity of the hydrogen, the experimenter can be certain of the identity of the interacting particles in a liquid hydrogen bubble chamber.

The ascendancy of the hydrogen bubble chamber as the prime experimental tool in high energy physics is due largely to the pioneering work of the Alvarez physics group at the Lawrence Radiation Laboratory of the University of California in Berkeley. After Luis Alvarez and J. G. Wood showed that a bubble chamber could be made to work under properly controlled operating conditions, even though the internal surfaces of the vessel were not perfectly smooth and clean (until then, it was thought that any dirt or irregularity in the chamber would become a nucleus for bubble formation as soon as the liquid was transformed to the superheated state, thus desensitizing the chamber before bubbles could form along the ionized particle trace), the design and construction of large bubble chambers, hitherto felt to be unrealizable, became a possibility. Under Alvarez's direction, the first liquid hydrogen bubble chamber large enough to show, with high probability, the tracks of most of the products resulting from an interaction in the Bev (billion electron volt) energy region was developed at Berkeley. The 72-inch long by 20-inch wide by 14-inch deep chamber was first operated early in 1959, and has been in almost continuous use since. The long and distinguished record of discovery in high energy physics creditable to the Alvarez chamber and its users is not likely to be equaled for some time to come, for it was almost four years before the next of the currently maturing crop of five or six large hydrogen bubble chambers went into operation.

Of the other substances that have been pressed into service as bubble chamber media, the most important are liquid deuterium, propane, the halogenated hydrocarbons, and xenon. Deuterium, of
course, offers the best available compromise when the desired target particle is the neutron. The heaviest liquids are characterized by very short radiation lengths (the average distance a photon will travel in the medium before materializing into an electron-positron pair). They are, therefore, most useful for the study of unstable particle decay events, where γ-radiation appears somewhere in the decay product chain. Short radiation length increases the probability that all of the photons connected with an event, which are not detectable unless they materialize, will in fact produce electron-positron pairs within the volume of the chamber. Also, in the heavier, and therefore denser, liquids, the geometric mean free interaction path is shorter than in the lighter ones. This engenders two significant benefits. First, the probability that an interaction or decay product will stop in the chamber is increased, so that range measurements may be used for the accurate determination of particle energies. Second, a smaller mean free path means that the number of events produced in the chamber for a given number of beam tracks and fixed chamber length will be greater. Consequently, less time and film will be required to photograph a given number of interactions.

Propane, the most commonly used liquid other than hydrogen, is of intermediate density. As indicated by its formula, C₃H₈, it is rich in "free" protons, so that it can also be used to study beam-proton interactions, although it is often difficult to separate these unambiguously from the beam-carbon collisions. The fact of its shorter radiation length and mean free path, and its simplicity of use when compared with the cryogenic techniques required for the liquid hydrogen chamber, accounts for its popularity. In addition, the carbon nucleus "contamination" in propane has upon occasion been turned to advantage, for carbon is an excellent polarizer of particle spin, and the asymmetry in the angular distribution of the interaction and decay products that have undergone a subsequent scattering by a carbon nucleus yields useful information concerning the nature of the forces being studied.

Bubble chambers may be, and almost always are, operated in a strong magnetic field. When a magnetic field is used, the momentum of a charged particle may be determined by measuring the curvature of its trajectory in the field, and, from the direction of curvature, one may discover the sign of the charge. For momentum measurements, short radiation length in the chamber medium is a liability, because multiple coulomb scattering introduces a statistical uncertainty in the curvature of a track which goes as the inverse square root of the radiation length, and inversely with the particle velocity. The magnetic field, therefore, is sometimes dispensed with when the heaviest liquids are used in relatively low energy experiments, for
the low precision with which the curvature can be measured in this case makes it useless for momentum determination (although the sign of the particle charge may still be inferred from the direction of curvature). Fortunately, the combination of heavy liquid and low energy is also that which increases the probability for a particle to stop in the chamber, so that an alternate method for the determination of particle energies is more likely to be available when the need is greatest. As a corollary to the preceding discussion, we may point out a further advantage of liquid hydrogen as a bubble chamber medium, namely, the fact that its long radiation length permits extremely precise momentum determination from curvature measurements.

Let us enumerate now those properties of the bubble chamber, some of which we have already described above and others not yet mentioned, that have resulted in its immense popularity among high energy experimental physicists.

(1) The geometry of a nuclear interaction that has occurred in a bubble chamber may, in general, be reconstructed from stereo photographs to a high degree of precision; since the detection medium is also the interaction target, the exact point in space where the interaction between beam and target particle occurs may be measured; the vertex of an unstable particle decay that has taken place in the chamber may be similarly reconstructed and measured.

(2) The fact that bubble chambers may be conveniently operated in high magnetic fields makes momentum and sign determination of charged particles easy (a quantitative sketch follows this list).

(3) Experimenters may choose bubble chamber liquids that offer pure proton or deuteron targets to the beam.

(4) After recording an event, the chamber may be "reset" for the next event in as little as 1 second, so that data may be accumulated very rapidly.

(5) The relatively high density of target nuclei in the chamber assures a relatively high probability that a beam particle will interact within the chamber.

(6) Bubble chambers may be built large enough to contain and display all the products of a nuclear interaction most of the time.

(7) Suitable bubble chamber liquids exist in sufficient variety to satisfy the special requirements of a large class of high energy physics experiments.
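Property (2) is worth a quantitative aside. For a singly charged particle, the momentum component transverse to the field follows from p = qBr, which in practical units reduces to p [MeV/c] ≈ 0.3 × B [kilogauss] × r [cm]. The following minimal sketch, in Python and with field and radius values chosen arbitrarily by us rather than taken from any particular chamber, shows the conversion and the sign determination.

```python
# A minimal sketch of momentum and charge-sign determination from track
# curvature.  The constant comes from p[MeV/c] = 0.2998 * B[kG] * r[cm]
# for unit charge; the example values below are illustrative only.

def momentum_mev(b_kilogauss: float, radius_cm: float) -> float:
    """Momentum component transverse to the field, in MeV/c."""
    return 0.2998 * b_kilogauss * radius_cm

def charge_sign(curvature_sense: str) -> int:
    # With the field toward the viewer, a negative particle curves
    # clockwise; the actual convention is fixed by the chamber geometry.
    return -1 if curvature_sense == "clockwise" else +1

if __name__ == "__main__":
    # A 15 kG field and a 150 cm radius give about 675 MeV/c.
    print(momentum_mev(15.0, 150.0), charge_sign("clockwise"))
```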
One need not wonder, then, at the fact that data reduction for bubble chambers has become a critical problem only a few years after Glaser's invention recorded its first track. Indeed, at the time of writing,
bubble chambers throughout the world are generating perhaps eight million stereo photographs each year, conservatively estimated. Three years hence, the volume will quite probably have doubled. The importance of the problem is such that it has attracted the attention of some of the most ingenious of contemporary high energy experimental physicists, with the result that some of the systems developed for its solution are remarkable for their ingenuity and sophistication.

3. The Data Reduction Problem for Bubble Chambers
Data gathered by the experimental physicist in the course of a bubble chamber experiment comprise a collection of photographs of the chamber in sets of stereo views (usually three views per set, but sometimes two or four), together with the list of fixed parameters for the run (i.e., beam type and momentum, chamber magnet current, etc.). Each stereo set, exposed a few milliseconds after the passage of a burst of perhaps fifteen to thirty beam particles through the sensitized chamber, will display the curved (because of the magnetic field) tracks of the beam particles, some of which will have interacted with target nuclei to produce (hopefully) nuclear events of interest to the physicist (Fig. 1). With the fully processed film in hand, one may identify the following broadly defined steps in the analysis of the data.

Scanning, in which the film is examined for the occurrence of "interesting" events. All interactions so identified are classified according to their topology (i.e., number of prongs in the primary vertex, whether there exist secondary vertices or decays associated with the primary interaction, and so on), and the frame number and location of the event are noted as input for the next step, measuring.

The culled film passes on to the measuring phase, in which, for each view of the stereo set, the coordinates of a number of points are precisely measured for each track associated with every event designated as interesting, and transferred to some intermediate storage medium (punched cards or magnetic tape, for example) together with the measurement of a set of reference markings for each view (the so-called chamber fiducials) and the indicative data for the event. The latter consist of the fixed parameters for the run and all other information necessary to identify the specific event and to chronicle its history in the analysis process.

The storage medium containing the digitized measurements and indicative data is then transferred to a computer for the next step, geometric and kinematic reconstruction. Output from this first computation stage is a complete kinematic and dynamic profile of each event, listing momenta of each track segment, space angles between all tracks originating at a common vertex
(including the inferred neutral tracks joining the point of production with the decay vertex), energy and momentum balance terms for each vertex, together with an error matrix, containing individual and correlated probable errors for every measured and computed quantity in the event profile. Each event has been tested against the set of possible physical interpretations that satisfy the reconstruction within
the limits of the error matrix, and the most likely hypotheses are listed in the order of goodness of fit.

Fig. 1. K⁻ interaction in hydrogen to produce a Σ⁺ and π⁻. The Σ⁺ then decays into a π⁺ and a neutron, which leaves no visible trace in the chamber. The reader is invited to identify this event in the scanner's instruction sheet (Fig. 2). (Courtesy of the University of California Lawrence Radiation Laboratory.)

These are then the primary data for the final stage of the process, the physical analysis of the ensemble of discrete events. Here, the collected output of the experiment is analyzed statistically in the light of the objectives of that particular physics experiment. It is only in this last phase that the experimental physicist truly assumes the role suggested by his title, for in every previous stage of the process, from the actual production of the film using an existing and operating bubble chamber to the completion of the reconstruction computations, the experimenter may, if he chooses, remain almost merely an interested onlooker while the relentless machinery of Big Physics takes over. (An exception to this state of affairs may occur when newly processed film is being scanned for the first time. In this case, the physicist often participates actively in the scanning stage, so that any especially interesting or unusual event may not escape his attention.)

The actual sequence of events in the data analysis for a real experiment will often depart at one point or another from the procedure above. The highly compressed description of necessity ignores the many feedback loops between stages (remeasuring, for example, when an event fails some test at a later stage), and slights the fact that all but the most straightforward of experiments require the introduction of special procedures to satisfy special requirements. Let us, therefore, re-examine each of the steps in bubble chamber data analysis in somewhat greater detail before describing some of the systems currently in use or under development for the solution of the problem of data reduction for bubble chambers.

3.1 Scanning
The scanning process is essentially one of pattern recognition, although, as we have suggested above, a little sophistication in physics goes a long way toward providing insurance against the loss of the rarely occurring but highly rewarding event that announces the arrival of a new particle, or else establishes the existence of a new interaction or mode of decay for an older one. Predictably, the search for interesting events is the only phase of the data analysis procedure that has not yet been successfully automated. It is likely, however, that at least a partial solution to this problem is in the immediate offing. (The planning of the experiment and the final physical analysis and interpretation of the data are of course excluded from the automation arena.) Fortuitous extraordinary events aside, the physicist planning the
experiment generally wishes to have the film examined for all identifiable occurrences of a particular class of interaction. He will therefore prepare a set of instructions for the scanners (usually physics graduate students or people of similar background, but the author knows of one divinity student who apparently derived much pleasure and some income as a scanner). The instructions include diagrams for every predictable topology at the specified beam energy for the sought-after event class, as well as event acceptance criteria (some events will always be rejected for one reason or another; events occurring in a part of the chamber where fluid turbulence is expected, for example, will be ignored if they are not of the extraordinary category), and hints and clues for establishing the identity of doubtful cases (Fig. 2).

Bubble chamber film is generally examined with the aid of special purpose scanning tables. These devices, constructed with high quality optical systems, make it possible for the scanner to project all of the stereo views, either separately or superimposed in any combination, upon a large white table top. Small chambers are projected to full size; larger ones are generally reduced somewhat. Scanning tables for large long chambers are usually supplied with movable film carriages. In home position, the full chamber is projected onto the table. The scanner may bring any part of the image to his station at the end of the table by moving the film carriage.

Every interesting event that has been discovered in scanning is classified according to event type, and information that will be useful to the computer programs to follow is added to the indicative data. In addition, instructions are prepared for the guidance of the measurer in the next step. This "working up" process ("sketching," in the parlance of the Alvarez physics group) requires somewhat more skill and experience on the part of the scanner than does the initial event recognition. It is therefore sometimes performed as a separate step by a more qualified person after the film has been culled by less experienced scanners.
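Although the search for events has resisted automation, the scanner's output is a simple clerical record. The sketch below, in Python, shows one plausible shape for what scanning adds to the indicative data; the field names and values are our own illustrative inventions, not those of any laboratory's actual bookkeeping system.

```python
from dataclasses import dataclass, field

# A hypothetical indicative-data record as it might leave the scanning
# step: where the event is on the film, how it was classified, and what
# the measurer should do with it.  All field names are illustrative.

@dataclass
class ScannedEvent:
    roll: int                   # film roll (fixed run parameters live elsewhere)
    frame: int                  # frame in which the event was found
    topology: str               # designation from the scan instructions, e.g. "V"
    primary_prongs: int         # prong count at the primary vertex
    secondary_vertices: int     # associated decays or secondary interactions
    accepted: bool              # passed the event acceptance criteria
    measurer_notes: list = field(default_factory=list)

event = ScannedEvent(roll=12, frame=4071, topology="V", primary_prongs=2,
                     secondary_vertices=1, accepted=True,
                     measurer_notes=["measure views 1 and 3"])
print(event)
```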
3.2 Measuring

In photographing bubble chambers, the demagnification factors in general use vary from perhaps 12 to 18. The high precision of the optical systems employed, the strict control and correction of systematic errors, and the inherent accuracy of the bubble chamber itself as a nuclear particle trace detector place extreme requirements on the accuracy and precision of the track-measuring system adopted for bubble chamber work. The need for such accuracy cannot be too strongly emphasized.
Fig. 2. Scanner's instructions. Excerpt from instructions prepared by Landis, Rosenfeld, and Solmitz for scanning K⁻p interactions for Σ particle production. Solid and broken lines signify heavily and lightly ionizing particles, respectively. Having recognized one of the specified configurations, the scanner adds its designation, indicated below, to the indicative data for the event. (Courtesy of the authors.)
Especially in the case where unseen neutral particles are involved in the interaction, small errors in the measurement of visible tracks can be vastly magnified when computing the kinematic reconstruction of the neutral ones, often making it impossible to distinguish the correct interaction from a number of energetically clustered possibilities. It is reasonable, therefore, if not absolutely necessary at the usual demagnifications employed, to
measure accurately down to the random error introduced by the uncontrollable distortion of the gelatin image resulting from the rigors of film processing. (Uniform shrinkage of the film is of course not a problem, since each frame carries its own measurement standard, the image of the reference marks, or fiducials, that are engraved on the glass window of every chamber.) This uncertainty, of the order of 2-4 microns on the film, determines the desired accuracy of any system intended for the measurement of bubble chamber film. Clearly, the track-measurement techniques traditionally used on cloud chamber photographs, arc templates for track curvature, and a ruler and protractor for linear and angular measurements, are ludicrously inadequate for bubble chamber physics. To meet the strict measurement requirements, special high-precision projection microscopes capable of ±2 μ accuracy were developed at a number of different high energy physics laboratories.

Typically, such an instrument operates in the following way. The image of the chamber is projected from behind onto a translucent screen, upon which a reference mark has been engraved at the optical axis of the system. A point on the film is measured by moving the film stage to bring the magnified image of the point into coincidence with the reference mark. The x and y coordinates of the stage are digitized by encoding devices attached to the stage drives, and the coordinates are automatically punched onto cards or paper tape.

Supplied with the film and a scanner's guide to the interesting events contained therein, the task of the measurer (be he/it human or machine) is the following. Consulting the guide, the measurer locates the next event to be immortalized on (let us say) punched cards. The indicative data accumulated by that event to date are entered onto the cards, together with additional data having to do with that particular instance of measurement. The measurer then proceeds to measure and transfer to the cards first the coordinates of the fiducials in each stereo view, and then those of a number of points (perhaps ten) on at least two stereo views of each track of interest.
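Because each frame carries its own standard in the fiducial images, uniform shrinkage drops out of the measurement. The sketch below indicates how measured stage coordinates might be normalized against the fiducials; assuming exactly two fiducials per view and a purely linear scale-and-rotation correction are simplifications of ours, and a production program would treat optical distortion far more carefully.

```python
# A minimal sketch of fiducial normalization.  A single complex factor
# encodes the rotation and scale (film shrinkage) that map the measured
# fiducial positions onto their known engraved positions.

def normalize(points, measured_fiducials, known_fiducials):
    """Map measured (x, y) stage coordinates into fiducial coordinates."""
    a, b = (complex(*f) for f in measured_fiducials)
    p, q = (complex(*f) for f in known_fiducials)
    s = (q - p) / (b - a)            # rotation and scale correction
    out = []
    for x, y in points:
        z = p + s * (complex(x, y) - a)
        out.append((z.real, z.imag))
    return out

# Illustrative numbers only: a frame measured with a slight rotation and
# 0.2 per cent shrinkage relative to fiducials engraved 100 units apart.
print(normalize([(41.0, 7.5)],
                [(10.0, 10.0), (110.0, 10.2)],
                [(0.0, 0.0), (100.0, 0.0)]))
```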
The first steps toward automation of the tedious and time-consuming measurement process were taken by the Alvarez physics group at Berkeley. Their projection microscopes were equipped with closed-loop servo-controlled slits, which automatically centered the cross-hairs on the track to be measured when the track was brought into range of the servo scan. Once centered, the servo would maintain that condition while the operator "drove" the cross-hairs along the track, punching coordinates at suitable intervals along the way. These instruments, dubbed Frankensteins, after J. F. Frank, the engineer largely responsible for their development, were supplied with "steering wheels" to adjust the angular orientation of the cross-hairs in order to optimize the servo response.

Projection microscopes, servoed or not, while adequate for the earlier years in the history of bubble chamber physics, were soon outdistanced by the immense outpouring of film to be measured as one after another new bubble chamber went into operation. We shall return to this subject later, in considering the factors that led to the all-out effort that has been and is being expended to achieve fully automatic bubble-chamber data reduction. In the meantime, we shall be content with our deck of cards or roll of paper tape containing the digitized indicative data and coordinate information for each event of interest, and pass on to the next step in the data analysis process.

3.3 Geometric and Kinematic Reconstruction
The punched cards generated by our measurer now become grist for the computer. After all track coordinates are normalized with respect to their corresponding fiducials, each track is reconstructed in space from its stereo views. Ideally, the trace of a charged particle, in a uniform magnetic field and in a medium to which the particle lost no energy and transferred no momentum, would be a uniform circular helix. (One may visualize the space curve by imagining a cylinder such that the axis is oriented in the direction of the magnetic field, and for which the cross section in a plane normal to the field is a circle. Then the path described by the particle will be a uniform helix lying on the surface of the cylinder such that the projection on the axis of a sequence of equally spaced points along the helix is a sequence of equally spaced points along the axis.) The radius of the helix is determined by the intensity of the magnetic field and the charge and momentum of the particle.

While some geometric synthesis programs assume the ideal conditions in practice, most fit the space curve with higher order polynomials to cope with the reality of non-uniform fields and bubble chamber liquids of non-zero density. One version of a program developed by the Alvarez physics group, for example, fits the projection normal to the magnetic field with a fourth order polynomial, and the axial projection with a polynomial of the third order. The reconstructed event is then analyzed by the kinematic subprograms, which compute the energy and momentum balance for the interaction, and then attempt a least squares fit of the event to each of a set of hypothetical interactions that might have produced the observed configuration of tracks. The determination of the set of possible interaction hypotheses is generally derived from data introduced during the "sketching" phase of the scanning step.
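The geometric fit lends itself to a compact illustration. The sketch below is not the PANG or FOG code, merely a least-squares rendering of the projection fits just described; parametrizing both projections by the transverse coordinate x, and the synthetic data, are our own simplifications.

```python
import numpy as np

# A sketch of the space-curve fit described above: a fourth-order
# polynomial for the projection normal to the magnetic field and a
# third-order one for the axial projection.

def fit_track(x, y, z):
    """x, y: projection normal to B; z: coordinate along the field."""
    bend = np.polynomial.Polynomial.fit(x, y, deg=4)    # bend plane
    axial = np.polynomial.Polynomial.fit(x, z, deg=3)   # axial view
    return bend, axial

# Illustrative data: measured points scattered about a gently curving track.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 50.0, 12)
y = 0.002 * x**2 + rng.normal(0.0, 0.01, x.size)
z = 0.1 * x + rng.normal(0.0, 0.01, x.size)
bend, axial = fit_track(x, y, z)
print(bend(25.0), axial(25.0))    # interpolated track position at x = 25
```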
At this stage, events that fail to satisfy any of the hypotheses within reasonable limits are marked for remeasuring. Cases that continue to fail after having completed the circuit several times are carefully examined to determine whether they ought to be rejected as poor data or enshrined as new particles.

As was the case in the development of automatic measuring systems, the first complete and extensive geometric and kinematic reconstruction programs were born at the Lawrence Radiation Laboratory of the University of California at Berkeley. Two independent systems of programs were written for the IBM 709 and its successor computers; one by the Alvarez hydrogen chamber group, supervised by A. Rosenfeld, with the close collaboration of many others, but primarily J. Snyder (of the University of Illinois), F. Solmitz, and H. Taft (of Yale University), and a second by the Powell propane chamber group under the direction of H. White. Each of these systems has spawned its own line of descendants at different laboratories, at least inspired by, if not directly adapted from, one or the other of the parent programs. While there are important differences in detail between the two systems (which tend to be mitigated among their numerous descendants), they perform the same functions, and have the same major subdivisions. Each has a geometric reconstruction program (called PANG in the former, and FOG in the latter system), each has a kinematic program (KICK and CLOUDY, respectively), and each has a number of programs and subprograms to assist the physicist in his analysis and interpretation of the experimental data. These latter programs we consider to be part of the next and final step of the data reduction procedure.

3.4 Physical Analysis
In this stage, except for the rare case of the single event that establishes the existence of a sought-after particle, interaction, or decay mode, the emphasis shifts from the individual event (that maintained its integrity throughout the earlier stages of data reduction) to the ensemble of interactions that constitute the experiment. The entire output of the experiment is here statistically analyzed to the end that inferences and conclusions may be drawn, histograms and angular distributions graphed, and Dalitz plots plotted. It is in this area of bubble chamber data-analysis programming that most of the recent effort has been concentrated, and such has been the progress that the physicist may now select a series of programs and subroutines, assembled in a kind of physics problem-oriented language, that will accept as input the library of events generated by the earlier stages
and produce as output CRT (cathode-ray tube) plots of the final results suitable for publication in the paper reporting the experiment. The data analysis segment of the Alvarez group program system consists of a section called EXAMIN (which still treats events on an individual basis, and is in fact the section that makes the final choice of physical hypothesis from among those suggested by the kinematic reconstruction program), and another called SUMX (the remarkable routine that converts the data summary tape produced by EXAMIN into a graphic display of any desired variables measured by the experiment). White's equivalent of EXAMIN is called FAIR. His group has no graphic display program yet, having directed most of its recent effort toward the integration of the FOG-CLOUDY-FAIR system with the Hough-Powell flying spot digitizer (to be described below in greater detail) in order to develop a fully automatic bubble-chamber data-analysis system.

In planning his programs, White was quick to recognize the importance of adequate event-library and bookkeeping facilities, as one graduate student after another was displaced from his office by racks of exposed bubble chamber film. He therefore made these an integral part of the FOG-CLOUDY-FAIR system, a feature that contributed much to its popularity in other laboratories. In the Alvarez group system, this function is performed by a separate program, LINGO, which was written after the completion of the major event-processing programs.

Both Berkeley groups, and a number of others at different laboratories, have written additional special purpose programs and subroutines to perform functions out of the mainstream of the data-analysis sequence, or else to extend the automaticity of computer processing to areas still relegated to human skills or judgment. Some of these will be discussed below, where appropriate, when our attention turns to the more fully automatic systems currently in operation or under development. One written by the Alvarez group, however, deserves special mention here. Called QUEST, it is a modification of PACKAGE (the name given to the combined version of the PANG and KICK programs), which provides for running communication and on-line intervention at any stage of the geometric and kinematic reconstruction computation. QUEST is most useful for the analysis of events of unusual topology or those that repeatedly fail to satisfy any of the standard interaction hypotheses, even after remeasurement. In use, the physicist will sit at an on-line typewriter and communicate to the program any departures from the standard analysis procedure. The results at each stage of the computation may be requested by the operator via the typewriter, and he may, in turn, respond with additional instructions for the program. The program is of course written to
permit the computer to engage in some other activity while awaiting action on the part of the physicist.

The sequence of operations described above is essentially that which emerged in 1959, when the great bubble-chamber population explosion to come was first recognized as a future certainty. With a number of minor improvements, it is the same system under which most bubble chamber film was being processed at the time of writing (including a substantial amount being produced in the Soviet Union). The more automatic systems are rapidly coming of age, however, and by the publication date of this volume, the balance will very likely have shifted, with the newer devices generating most of the output.

When one considers the fact that the 1959 "nonautomatic" data-analysis procedure represented an immense advance in sophistication and efficiency over the earlier cloud chamber techniques, it is reasonable to question the scientific and economic justification for the huge effort and sums that have been invested in the search for a fully automatic event recognition and analysis system. Let us examine, therefore, the situation facing a large bubble chamber facility in 1960, when the decision was first taken to launch a serious all-out assault on the problem. At that time, a single Frankenstein servo-controlled digitizing projection microscope, measuring the scanned film generated by three to five scanning tables operating simultaneously (the number of tables depending upon the frequency of occurrence of the particular event being sought), could be expected to prepare computer input tape for perhaps 24,000 events per year when operated three shifts a day, and taking into consideration the fact that about 16% of the events processed required remeasuring at least once. A single large hydrogen bubble chamber operating on a rather relaxed schedule was expected to produce at least a quarter of a million events a year of prime interest, and perhaps another million a year worth measuring. A propane chamber, by virtue of its shorter mean free interaction path, could produce several times that number. Since each Frankenstein with its satellite scanning tables required about five people per shift to operate, the data-processing facility necessary to keep up with the full prime output of a single large hydrogen chamber was estimated at about ten Frankensteins, forty scanning tables, and perhaps 200 full-time people, including a maintenance engineer for each shift, and supervisory and clerical personnel.
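The sizing arithmetic is worth making explicit. The sketch below merely restates the figures quoted in the preceding paragraph; the choice of four scanning tables per Frankenstein, within the stated range of three to five, is ours.

```python
import math

# Back-of-the-envelope restatement of the 1960 facility estimate.  The
# 24,000 events per Frankenstein-year already folds in the stated 16%
# remeasurement rate and three-shift operation.

events_per_frankenstein_year = 24_000
prime_events_per_year = 250_000      # one large hydrogen chamber
tables_per_frankenstein = 4          # three to five tables; we take four

frankensteins = math.ceil(prime_events_per_year / events_per_frankenstein_year)
tables = frankensteins * tables_per_frankenstein
print(frankensteins, tables)         # 11 and 44; the text rounds to ten and forty
```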
Since its inception, the system has been substantially improved. For example, the scanning table operator now generates punched cards that control the measuring projector, automatically finding each event-containing frame on the film, and moving the cross-hairs to the primary vertex and fiducials of each view. But while these modifications have just about doubled the throughput for the same quantity of hardware and personnel, the massive data-handling installation still necessary for just one chamber borders on the excessive, even when it is constrained to processing only prime events. A fortiori, the facility required to do justice to all deserving events produced at a large accelerator laboratory, where several bubble chambers might be in continuous and simultaneous operation, is just too terrible to contemplate.

Throughout the evolution of the system, when compared with the costs of the scanning-measuring phase, computation costs per event have been relatively small, certainly in time if not so much in dollars. It was inevitable, therefore, that the scanning-measuring bottleneck become the focus of the initial and most intensive efforts to achieve an order of magnitude improvement in event-processing capacity and efficiency. We turn now to the consideration of these labors, the fruits that they have borne, and their promises in the bud of things to come.

4. Advances in Automatic Data Analysis for Bubble Chambers

4.1 The Spiral Reader
At the risk of appearing to repeat ourselves, we must once again credit the Lawrence Radiation Laboratory at Berkeley with the first significant effort to find a superior alternative to the Frankenstein-based system described above. Almost simultaneously with the development of the servo-controlled digitizing projection microscope, Bruce McCormick, then with the Alvarez physics group, addressed himself to the problem of achieving a much higher degree of automation in the measuring process by tying an on-line digital computer to the measuring device. McCormick's machine, known as the spiral reader, is for the most part a special purpose instrument, designed for the very rapid and efficient measurement of single-vertex multiprong events. Recalling that the bubble chamber is operated in a uniform magnetic field, the beam track ending at the vertex and the interaction products originating there appear, to good approximation, as a number of circular arcs of rather large radius emanating from a point. The operation of the spiral reader is based upon the fact that, if one scans along the circumference of each of a number of concentric circles centered on the vertex, the angular displacement of the intersection of a track originating or ending at the vertex with each of the circles goes linearly with the radius of the scanning circle, providing that the latter is small compared with the radius of the track arc.

The principle of the spiral reader is realized in a precision digitized projection microscope on which one positions not a cross-hair, but
rather the center of an opaque rotating scanning disc that has inscribed on it a set of transparent radial slits, each of which is about ten bubbles long and one third of a bubble in width. The slits are inscribed along a radius of the disc and are equidistant from one another and from the center of the disc. Some means is supplied (different in different versions of the device) to sample the light passing through each of the slits, converting the variations in intensity into a pulse train such that any pulse generated by any slit may be correlated with the angular position of the slit, determined precisely with the aid of a coaxial Baldwin optical encoder. This angle, together with the accurately known radius of each scanning circle, determines the coordinates of the track segment with respect to the center of the disc, whose coordinates in turn are precisely measured by the digitizing projection microscope. The present version of the spiral reader is largely the work of J. Russell, who assumed responsibility for the final development of the system. In using this machine, the operator carefully positions the image of the vertex of the event being measured on the center of the scanning disc (Fig. 3).

Fig. 3. Spiral reader. The single-vertex event illustrated in (a) is measured by centering the reader disk on the vertex of the projected negative image, as in (b). Portions of the photomultiplier output for selected annular ring scanning slits are displayed in (c). The FILTER program identifies particle tracks (circular arcs) emanating from the vertex by recognizing the linearly increasing angular displacement of the track pulse with increasing radius of the slit annulus. (Courtesy of the University of California Lawrence Radiation Laboratory and Professor B. McCormick.)

The on-line computer then collects and stores, for each slit, the digitized angular coordinates of every light-absorbing mark on the film that falls within the narrow annulus scanned by the slit. These will include not only the sought-after track segments emanating from the vertex, but also track segments not connected with the event, blobs and electron spirals on the film, and noise due to dirt, film scratches, and the like. Since the increment in radius from one scanning circle to the next is constant, a track segment associated with the vertex of interest will produce a pulse at each of the scanning radii (with an occasional miss due to gaps in low-ionization tracks) such that the difference in the angular coordinates of the track pulse at successive radii is constant. An IBM 709 program named FILTER, written by D. Innes, then makes a systematic search of the stored coordinates for sequences of pulses that satisfy the condition above for true event tracks. The program is able to discriminate against pulse sequences that masquerade as vertex tracks due to the coincidental configuration of non-event tracks and noise, and to extrapolate through track-obscuring areas of the film. It will extrapolate as well through a gap in the sequence, with the imposed constraint that the loss not exceed one slit in length. The FILTER program, which has proved highly effective in separating star-type interactions from the background tracks and noise in which they typically occur, converts the reader into an elementary pattern-recognition machine, and possibly represents one of the earliest practical applications for such a device outside the realm of alphanumeric character recognition.
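The constancy of the angular increment from one scanning radius to the next reduces FILTER's central test to a few lines. The sketch below is a deliberately naive rendering of that test, not a reconstruction of Innes's IBM 709 program: for each pulse pair on the two innermost circles it looks for an arithmetic progression of angles across the remaining radii, tolerating one missing slit as described in the text. Units and tolerances are our own.

```python
# A naive sketch of the spiral-reader track condition: a track through
# the vertex yields pulses whose angular coordinate advances by a
# constant increment from one scanning radius to the next.

def find_vertex_tracks(pulses, tol=0.5, max_gaps=1):
    """pulses[i] is the list of pulse angles (degrees) on scanning circle i."""
    tracks = []
    for theta0 in pulses[0]:
        for theta1 in pulses[1]:
            step, chain, gaps = theta1 - theta0, [theta0, theta1], 0
            for ring in pulses[2:]:
                want = chain[-1] + step
                hit = min(ring, key=lambda t: abs(t - want), default=None)
                if hit is not None and abs(hit - want) <= tol:
                    chain.append(hit)
                elif gaps < max_gaps:
                    chain.append(want)     # extrapolate across one gap
                    gaps += 1
                else:
                    break
            else:
                tracks.append(chain)       # consistent on every ring
    return tracks

rings = [[10.0, 200.0], [12.1, 199.0], [14.0], [16.2, 90.0], [18.1, 45.0]]
print(find_vertex_tracks(rings))   # recovers the track near 10, 12, 14, ...
```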
But despite the fact that the spiral reader has to date successfully measured several hundred thousand single-vertex events at perhaps 2-3 times the rate possible with Frankenstein, it is most unlikely that another such machine will ever be built, for its limitations are many (for example, very short prongs are difficult for the machine to detect, and generally require special treatment in any case to measure their length), and its relative efficiency in measuring complex multivertex events is low. The machine's nonadaptability to completely automatic operation, without an on-line human to locate the vertex and position the scanning disc, is an additional drawback. It is an especially serious one in this instance, for the single-vertex events that are the spiral reader's forte will undoubtedly be the easiest kind for a fully automatic pattern-recognizing device to discover, and hence will characterize the first class of experiment to be assigned to such systems, which are already in an advanced stage of development.

4.2 The Scanning-Measuring Projector (SMP)
Even before the spiral reader became fully operational, it had already ceded its distinction as the exemplary on-line human-plus-computer-assisted automatic measuring device to an ingenious new approach to the problem which, while suffering the same limitation as the reader in adaptability to completely automatic event recognition, yields a system of far greater versatility and general usefulness at significantly lower cost. The Scanning-Measuring Projector (SMP) was conceived by Alvarez in 1960, when the upper bound on the capacity of a Frankenstein-based facility was already in sight, while the development of the spiral reader had bogged down, and while proponents of fully automatic bubble-chamber data-analysis systems were prudently hedging their promises of Utopian physics with overly generous (so they thought) estimates of the time and money required for their development.

The design philosophy of SMP is based upon the following simple and highly reasonable premises. First, as the processes of the scanning and measuring phases were separately speeded up, the so-called "handover" time, which includes that spent in physically moving the film and its accompanying indicative data from the scanning table to the measuring device as well as the time required to rediscover each event that had already been identified in an earlier pass through the film, began to consume an increasingly significant fraction of the overall analysis time, and a nonproductive fraction at that. In order that the operator be able to find the event to be measured, a human-controlled measuring device requires adjunct facilities equivalent to a separate scanning table. If only it were possible to construct the measuring device cheaply enough to be able to afford its use as a scanning table as well, the scanner-to-measurer handover time could be completely eliminated, which introduces the second premise. The greater part of the cost of a digitized servo-controlled projection microscope resides in the very high-precision measuring stage (reproducibly accurate to one part in 20,000), and in the electronic servo system. If the high precision and accuracy required
could be achieved by cascading two no less accurate, but rather less precise (and therefore less expensive), measurements, and if the track servo function could be assumed by a track filtering program in an on-line digital computer at a sufficiently low cost per event, the requirements of the first premise would be satisfied, and one would have gained as a bonus the benefits accruing from the availability of the on-line computer. The fact is, it is already more economical to do digital track following with a filter program in a large-scale computer than to rely on an analog servo. By designing a facility whereby the measurements generated by a number of separate devices are multiplexed for simultaneous processing in a single computer, which has also been multiprogrammed to perform independent computations when its services as a track follower are not in demand (for example, the geometric and kinematic reconstruction of fully measured events), the criterion of low track-following cost per event is easily met.

The high precision at low cost is achieved in SMP by measuring with rather low precision (to a least count of 1/128 cm in each coordinate) relative to some one of an array of precisely positioned "bench marks" which are situated at the vertices of a 1 centimeter square lattice. Identifying the particular bench mark in the approximately 2 by 3 foot array that is correlated with the measurement clearly requires another measurement, but one of quite low precision. The cascaded measurements produce a final measuring precision of about 80 μ on the projected image, which is equivalent to about 6 μ on the film. At first glance, the result seems nothing to become elated about; after all, we had become accustomed to a least count of 2 μ in a Frankenstein measurement. As we shall soon see, however, digitizations of track points come so much more quickly and cheaply on SMP than with a projection microscope that it is easy to get enough points to bring the final statistical uncertainty in the position of the track down to that commonly achieved with Frankenstein.

The scanning-measuring projector looks much like a conventional scanning table with its white slab top replaced by movable opaque white mylar sheets. Called the "window shades," this arrangement permits the operator to position a 0.6-cm diameter measuring aperture smoothly and continuously anywhere on the image of the bubble chamber which has been projected on the surface of the sheets (Fig. 4). The major shade is mounted on large rollers so that it is free to move along the length of the projected image (toward and away from the operator). It covers the entire 2 by 3 foot scanning surface except for a 10-cm wide gap running the width of the projected image. In this gap is mounted a carriage holding a transverse minor shade which rolls freely in a direction orthogonal to that of the major shade.
The measuring aperture, mounted in a carriage on the minor shade, partakes freely of the combined motion of both shades. The approximate coordinates of the aperture are easily determined from the position of each shade on its rollers, which is in turn measured with
the aid of narrow rectangular index slots punched along one edge of each shade that are sensed photoelectrically.

Fig. 4. Scanning and measuring projector and "window shades." The rack directly behind the scanning table contains the electronics for the unit illustrated in (a). The structure to the left of the table contains the film transport mechanism and the optics for the device, while the typewriter to the right enables the operator to maintain constant real time communication with the track filtering and measuring program. The "window shade" aperture carriage described in the text is also the scanning surface. It is illustrated in greater detail in (b). (Courtesy of the University of California Lawrence Radiation Laboratory.)

Mounted below and moving with the aperture is a drum rotating at 1200 rpm. The drum carries a periscope positioned so that the segment of image falling through the aperture is displaced along the constant-length arm of the periscope. The rotation of the drum causes the displaced image to sweep out a circle centered on the aperture at the constant angular velocity of the drum. In each of its circuits, the displaced image will in general pass over several of the bench marks, which are in fact 0.5-mm transparent spots on an opaque bench-mark plate mounted below the rotating periscope assembly. A photomultiplier tube, below the bench mark, samples the light of the image as it passes over any bench mark. Where no track segment falls within the aperture, the photomultiplier pulse will be plateau-shaped. A track segment within the aperture will produce a sharp nick in the plateau when the opaque image of the track in a clear background passes over the bench mark (Fig. 5). The dip in photomultiplier output is detected by electronic circuitry, and the displacement of that point on the track from the detecting bench mark is directly digitized at that instant with the aid of magnetic azimuth-indicating recordings on the drum that carries the periscope. From the approximate position of the aperture taken from the window shades, it is easy to determine which bench mark was responsible for the digitizing, so that the complete coordinates of the track point may be constructed.
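The cascade is easy to state arithmetically: the window shades locate the aperture coarsely, that location identifies which bench mark of the 1-cm lattice was transited, and the drum's azimuth reading, together with the fixed periscope displacement, supplies the fine offset. The sketch below puts the pieces together; the displacement value and the assumption that the nearest lattice point is always the correct bench mark are ours.

```python
import math

# A sketch of the SMP cascaded measurement.  Coarse window-shade
# coordinates (least count 1/128 cm) pick out the bench mark on the
# 1-cm lattice; the periscope displacement R and the azimuth recorded
# at the notch in the photomultiplier plateau give the fine offset.

R_CM = 0.25    # periscope displacement; an illustrative value of ours

def track_point(coarse_x_cm, coarse_y_cm, azimuth_deg):
    """All coordinates in cm on the projected image."""
    bench_x, bench_y = round(coarse_x_cm), round(coarse_y_cm)
    a = math.radians(azimuth_deg)
    # The digitized point lies at distance R from the detecting bench
    # mark, in the direction given by the drum azimuth at that instant.
    return bench_x + R_CM * math.cos(a), bench_y + R_CM * math.sin(a)

print(track_point(41 + 13/128, 17 + 90/128, 212.5))
```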
In use, the average SMP operator can guide the aperture along the track to be measured at about the same rate that a Frankenstein operator can drive his servo-controlled cross-hair. (While SMP lacks a centering servo, the track to be measured need not be centered, but may fall anywhere within the aperture. For the Frankenstein servo to take effect, its operator must of course also keep the track within the servo aperture distance of the cross-hair.) During this time, however, SMP will generally automatically digitize at least 10 times as many points as is customary with Frankenstein, the exact number depending upon the density of the track. The bubble density information so obtained is often useful in the later stages of analysis.

SMP will, of course, digitize not only bona fide track points, but also coordinates of extraneous bubbles and crossing tracks that lie within the strip swept out by the aperture. The on-line computer must therefore filter out the desired points from the full set of coordinates collected during a track-measuring operation. The programs that perform this task, as well as the time-sharing and multiprogram monitoring necessary when a number of SMP units are tied into a single computer, are largely the work of J. Munson. (Among the others who played an important role in the development of SMP, P. Davey, R. Hulsizer, W. Humphrey, and J. Snyder figure prominently.)

Fig. 5. SMP track-point digitization system. The image of a track falling in the aperture of the periscope is displaced the distance R. Rotation of the periscope sweeps the image around, describing an annulus, as in (a). Recalling that in the projected bubble chamber negative the tracks appear dark on a light background, whenever the sector of the annulus occupied by the circumnavigating image intersects a transparent bench mark point in the opaque bench mark plate, light from the image will pass through the bench mark and be detected by the photomultiplier below. The photomultiplier output for one revolution of the periscope will display a plateau signal for each transit of a bench mark by the image (b). If the aperture contains the image of a track, some of the bench marks will be "eclipsed" by the dark track during the transit, resulting in a notch in the plateau of the photomultiplier output. A detected notch indicates that there exists a point on a track segment at distance R from a bench mark, and in a direction specified by the azimuth angle of the periscope at that instant. This information, together with the approximate coordinates of the aperture as supplied by the window shade digitizer, is sufficient to determine which bench mark generated the signal, and thence the precision coordinates of the track point. Note that the points on the segment digitized during successive revolutions will in general not be the same ones, even if the aperture was stationary during the measurement. (Courtesy of the University of California Lawrence Radiation Laboratory.)

The SMP filter program does, in effect, the digital equivalent of the track-centering process performed by the servoed slit of the projection microscope. A parabola is first fitted roughly to the full ensemble of points in the "track bank," and then each point is transformed to a new set of coordinates corresponding to the distance of that point from
and along the parabola. In the transformed system, successive groups of perhaps ten points along the track are averaged in their distance-from-the-parabola coordinate, and those too far removed from the main sequence are rejected as extraneous, after due consideration of the indications of a fairly sophisticated procedure for tracking prediction and retrodiction. The points that have survived the filter (generally, the majority) are again averaged in groups of ten, and the set of average points, transformed back into rectangular coordinates, are used in the traditional way to construct the best possible track through them.
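Stripped of the prediction-and-retrodiction machinery, the filtering step can be indicated in a few lines. The sketch below fits the rough parabola, treats each point's residual from it as the distance-like coordinate, drops points too far from the main sequence, and returns ten-point averages of the survivors; the use of the vertical residual in place of the true normal distance, and the fixed cut, are simplifications of ours.

```python
import numpy as np

# A sketch of the SMP filter: rough parabola fit, rejection of points
# far from the curve, and group averaging of the survivors for the
# final track construction.

def filter_track(x, y, group=10, cut=0.45):
    resid = y - np.polyval(np.polyfit(x, y, 2), x)   # rough parabola
    keep = np.abs(resid) <= cut                      # filter survivors
    xs, ys = x[keep], y[keep]
    n = (len(xs) // group) * group                   # whole groups only
    mean_groups = lambda a: a[:n].reshape(-1, group).mean(axis=1)
    return mean_groups(xs), mean_groups(ys)          # averaged points

# Illustrative data: a parabolic track plus a block of extraneous
# digitizings from a crossing track.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = 0.03 * x**2 + rng.normal(0.0, 0.01, 60)
y[30:40] += 1.0
print(filter_track(x, y))
```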
With the exception of the fact that single points are always measured on SMP as the intersection of two lines (artificial lines are available for measuring isolated points), the device thus far is in most ways the complete equivalent of Frankenstein. An SMP unit with all of its associated electronics, however, including its share of the input multiplexing circuitry, costs less than one third as much as Frankenstein, and only about twice as much as a bare scanning table. When one remembers that overhead must be added to the capital costs of maintaining separate scanning tables, the use of SMP for scanning as well as measuring may be justified economically as well as on the grounds of "handover" elimination. But it is only when one takes into account the many things one may do with the on-line computer, in addition to track filtering, that the attractiveness of an SMP facility for a moderate size bubble-chamber physics installation becomes apparent (a "moderate size" installation being one where perhaps a quarter to a half million events are analyzed each year; where a million or more events must be measured annually, it is likely that one of the fully automatic systems described below will become the preferred facility).

A glance at the photograph of an SMP unit will reveal an electric typewriter at the operator's right hand. The typewriter provides a two-way communication channel between the operator and the control program throughout the entire course of the measuring process. Through the typewriter, the computer directs the entire measuring sequence, specifying the particular track, vertex, or fiducial that is to be the object of the operator's ministrations at each stage of the measurement (stereo views are automatically selected and changed by the control program), and sending error messages requesting remeasurement when the filter program is unable to reconstitute the track from the data presented to it. Each error comment provides information concerning the probable cause of the difficulty to guide the operator in remeasuring the offending track. This direct-feedback control of the measuring process eliminates most event rejects in later stages of the analysis due to operator blunders (measuring the wrong track or the wrong view, for example), and greatly reduces the number of rejects due to poor measurements.

As with Frankenstein, however, too many events still fail to complete the geometric or kinematic reconstruction phase for reasons other than those corrected by the features of SMP described above. For example, one of the interacting particles (and therefore, its mass) might have been misidentified, causing a misfit to every one of the proposed hypotheses for kinematic reconstruction. If such a failure could be made known to the operator while the event still lay before him on the scanning table, corrective action could be taken immediately, eliminating a vast amount of costly and time-consuming reprocessing with its concomitant bookkeeping. The extent to which such very high-level feedback can be incorporated into an SMP system depends, of course, on the number of scanning-measuring units in the facility and on the power of the digital computer to which these units are connected. The pioneer Alvarez SMP facility in Berkeley has planned a system of twelve units tied into an IBM 7044 computer. It is anticipated that the computer will be able to accommodate the output of these twelve scanning-measuring projectors through the complete PANG (geometric reconstruction) stage and through a first order "pseudo-KICK" (kinematic reconstruction) stage as well, feeding back the results of the analysis quickly enough so that remeasurement, when called for, entails no more than backspacing a few frames on the film. Pseudo-KICK will be sufficiently discriminating to assure that surviving events will almost never have to be remeasured. Output from pseudo-KICK will be in suitable format for introduction into the mainstream of the data-analysis program system just prior to full KICK processing.

Depending upon the frequency with which the sought-after interactions occur, and upon the topological complexity of the events, a twelve-unit facility such as that described above could be expected to analyze perhaps one half million to one million events each year, operated in the planned scanning-measuring mode on a four-shift schedule (full-time use during weekends constitutes the fourth shift). (Presently, an abundance of conventional scanning tables combined with a scarcity of SMP units makes it more economical to separate the processes of scanning and measuring in the traditional way at Berkeley. This state of affairs is considered to be purely temporary.) With the possible exception of the kind of output to be expected from bubble chambers operating with very heavy liquids (and the very irregular tracks caused by multiple scattering in dense media can be expected to tax every conceivable measuring scheme),
an SMP-based facility is versatile enough to be useful for the analysis of almost any interaction occurring in almost any bubble chamber. This is in sharp contrast to the spiral reader, which loses efficiency rapidly as the complexity of the interaction increases. Indeed, the more complicated the event, the greater is the return that can be expected from the high-level direct-feedback feature and the “synergetic” man-machine mode of operation of the system. We reiterate, then, that as of the present, SMP seems to provide the most useful and reasonable (in both senses, logical and economical) solution to the data reduction problem for intermediate-size bubble-chamber installations and for physics departments that obtain their exposed bubble chamber film as guests of other accelerator laboratories. The very large, multimillion-event-per-year installation must look beyond SMP to one or the other of two different schemes for fully automatic bubble-chamber data reduction just on the horizon. These systems, quite different from one another in conception, are currently in about the same stage of final development, although one has already been pressed into service in a semiautomatic mode of operation, with human guidance. Since neither system has yet displayed definite indications of superiority over the other, it is likely that both will coexist peacefully in the future, even if their respective promoters will not.

4.3 The Mechanical Flying Spot Digitizer (FSD)
The premise of the Flying Spot Digitizer (called, at one time, the “Hough-Powell Device,” after its initial developers) is the observation that while a single frame of bubble chamber film is in principle capable of carrying at least ten million bits of information (with more optimistic definitions of the unit of information, the number has been set by some at more than 10⁸ bits), the typical frame is in fact relatively sparsely occupied with bubble images. This being the case, it is possible to specify digitally the entire contents of a bubble chamber picture in relatively compact form (typically requiring of the order of 10⁶ bits) by transforming the film image to a “bubble coordinate representation,” in which the coordinates of a sufficiently dense sampling of the bubble images are digitized to the required precision of 2 μ. To achieve the same measurement precision in a “bit image representation,” where the photograph is converted into a matrix of ones and zeros (the “one” indicating a dark point on the film, the “zero” a clear point), a 1-μ grid would be required. Film from the Alvarez chamber, with an area of 40 by 130 mm to be measured, would require about 5 × 10⁹ bits to store the bit image representation of a single view. (This should not be taken to mean that the bubble chamber image contains 5 × 10⁹ bits of information, since neighboring bits within the bit image are not necessarily independent of one another. For example, every bit within the confines of a bubble image must be a “one.”)
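The compression at stake is readily checked by a back-of-the-envelope calculation in present-day terms. The dimensions and the digitizing count below restate estimates given in this article; the bits-per-point figure is our own assumption for the sake of illustration.

# Rough storage comparison of the two representations of a single view.
# Film dimensions and the point count follow the text; bits_per_point
# is an assumed figure, chosen only to illustrate the order of magnitude.

MICRONS_PER_MM = 1000

width_um  = 40  * MICRONS_PER_MM      # measured area of an Alvarez view
height_um = 130 * MICRONS_PER_MM
grid_um   = 1                         # bit-image grid pitch

bit_image_bits = (width_um // grid_um) * (height_um // grid_um)

points_per_view = 50_000              # typical digitizings per view (see below)
bits_per_point  = 20                  # assumed per digitized bubble coordinate

coordinate_bits = points_per_view * bits_per_point

print(f"bit image:   {bit_image_bits:.1e} bits")    # ~5.2e9
print(f"coordinates: {coordinate_bits:.1e} bits")   # ~1.0e6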
The image scanning and digitizing device, proposed in 1960 at the European Organization for Nuclear Research (CERN) by P. Hough and B. Powell, is a high-precision, high-resolution version of the earliest and most primitive form of television picture transducer, modified to enable it to scan simultaneously the film and the ruler against which the film is to be measured. The scanning spot is generated by the light of a high-intensity mercury lamp passing through the intersection of two narrow (20-μ) crossed slits, one stationary, and the other a radial slot cut into a rotating disc, curved slightly to give a uniform sweep speed. The spot is imaged by separate optical systems on the bubble chamber film and on an accurately ruled grating. Photomultiplier tubes terminate both optical paths to sample separately the light passed by the film and by the grating. Rotation of the disc causes the spot to trace out a line, and the slow mechanical motion of the stage carrying the film in the direction perpendicular to that line generates a TV-like raster over the bubble chamber image. An encounter between the moving spot and the image of a bubble produces a momentary decrease in the light detected by the photomultiplier. By keeping count of the number of pulses detected by the grating photomultiplier from the start of the scan to the bubble pulse (extrapolating between grating pulses with the aid of electronic circuitry), the location of the center of the bubble may be measured to an accuracy of about 3 μ. The other coordinate of the measured point is derived from the digitized position of the traveling stage at the beginning of the line scan (Fig. 6). The Flying Spot Digitizer (FSD) is tied to a large-scale digital computer by means of a direct data channel, permitting the fully automatic control of the mechanical scanner by the computer, and through which the “coordinate image” of the chamber photograph is entered into the computer memory for on-line processing. As originally conceived, FSD was to be used mainly as a human-assisted automatic measuring device, at least until computers became cheap and fast enough to do the prodigious amount of processing required to sort the interesting events from the noninteracting beam tracks and other miscellany without falling behind the scanner (which can digitize approximately four stereo sets a minute) and without bankrupting the laboratory. Indeed, of the four FSD systems currently being developed, at CERN (tied to an IBM 709), Brookhaven (IBM 7090), Berkeley (IBM 7094), and the Rutherford Laboratory
in England (Ferranti ORION), the first three are already producing useful results in the human-assisted mode. When used as a semiautomatic system, FSD requires special scanning tables to which have been appended some device for rough digitizing (to about three bubble diameters, or ~100 μ on the film) selected points on the event to be measured, and provision for transferring these approximate coordinates to the data tape carrying the indicative information. The procedure in this case is the following. The film is human-scanned on the rough-digitizing table for events of interest. When such an event is discovered, the operator enters all indicative data and then rough digitizes three points on each track to be measured (usually one at each end and one in the middle) and a point on each pertinent fiducial. The digitizing procedure is then repeated for the remaining two views. When fully scanned, the film with its accompanying data tape is transferred to the computer-connected FSD for measuring and analysis.
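A hypothetical sketch of the record produced for one such event is given below. It is offered only to fix ideas; the actual tape formats differed from installation to installation, and the field names here are invented.

# Hypothetical layout of one rough-digitized event record on the data tape
# (illustrative only; real formats varied between laboratories).

from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]     # rough film coordinates, ~100-micron accuracy

@dataclass
class RoughEvent:
    frame: int                  # frame number on the film
    indicative: str             # indicative data: event type, beam, operator
    # one rough point per pertinent fiducial, for each of the three views
    fiducials: List[List[Point]] = field(default_factory=list)
    # three points per track (one at each end, one in the middle), per view
    tracks: List[List[List[Point]]] = field(default_factory=list)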
FIG. 6. Flying spot digitizer optical system. The schematic diagram illustrates one version of the optical system for the normal film scan and the simultaneous grating scan. The abnormal film scan is provided by a third sweep optical path with reflecting surfaces oriented to produce the required sweep direction. (Courtesy of the University of California Lawrence Radiation Laboratory.)
In a cooperative effort, groups at Berkeley, Brookhaven, and CERN have developed an FSD programming system that integrates H. White’s FOG-CLOUDY geometric and kinematic analysis programs with an FSD control, track filtering, and track reconstruction program
called HAZE. With the film and tape mounted, HAZE assumes control, advancing the film until the first event-containing frame is found (with the aid of a coded data box in some systems, and index marks in others). For each of the tracks to be measured, the computer constructs a circular arc through the three rough-digitized points belonging to that track. The arc defines the “road,” a 300-μ-wide strip centered on the arc, within which the filter program will search for the desired track. The scan is initiated in the direction perpendicular to the beam tracks, with the stage moving in the direction of the beam (the “normal scan”). While coordinates are accumulating under data channel control in one of two 2000-word buffer areas set aside in core storage, the central processing unit is occupied with FOG-CLOUDY data analysis on an event measured earlier. When the first buffer is filled, the stream of digitizings is diverted to the second buffer, an interrupt is signaled, and HAZE takes over to perform its primary functions of “gating” and filtering. Gating is the process whereby all coordinates that do not fall within a road or in the immediate neighborhood of a fiducial are rejected, leaving only a few percent of the originally digitized points for further processing. Because the accuracy of the measurements decreases as the scan and track directions depart from the perpendicular, only those points are accepted for which the road is inclined at least 45° from the scan. If for a given event there are tracks that do not meet the inclination condition, a second, “abnormal scan,” orthogonal to the first, will be initiated after the completion of the normal scan. Accepted points are stored in the appropriate road buffers (since different bubble chambers generally employ differing fiducial systems, special provision must be made for fiducial filtering), and, when a road buffer has accumulated twenty points, control is transferred to the filtering subroutine. HAZE separates the desired track from the irrelevant points in much the same way that the SMP filtering program performs the same task, although the more controlled conditions under which FSD measurements are made permit the use of a somewhat less sophisticated recognition logic. From the coordinates in the road buffer selected as track points, a single average point is computed (as for SMP), and control is returned to the gating routine until another (or the same) road buffer has accumulated enough points to demand attention. When a typical event topology is being processed on an IBM 7090-coupled FSD system, the full 2000-word input buffer will have been filtered in less than one third of the 200 milliseconds required by the mechanical scanner to refill the alternate input buffer.
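The geometry of road-making and gating is simple enough to sketch. The following is our own illustration, not HAZE itself: a circular arc is passed through the three rough-digitized points, and a digitizing is retained only if it lies within half the road width of that arc.

# Illustrative road construction and gating in the spirit of HAZE
# (a sketch, not the original program). Coordinates are in microns;
# the three rough points must not be collinear.

import math

def circle_through(p1, p2, p3):
    """Center and radius of the circle through three points."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

def gate(digitizings, rough_points, road_width=300.0):
    """Keep only the digitizings lying within the road around the arc."""
    (cx, cy), radius = circle_through(*rough_points)
    return [(x, y) for x, y in digitizings
            if abs(math.hypot(x - cx, y - cy) - radius) <= road_width / 2]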
More than two thirds of the on-line FSD control computer’s time is therefore free for the computation of the geometric and kinematic reconstruction of just-measured events. These are treated by subdivisions of FOG and CLOUDY in batches of 15 events in order to minimize the time wasted in program shuffling. Where the full set of stereo views appears on the same film strip, as is the case for the Alvarez 72-inch chamber, reconstruction of typical events is performed in real time, discounting the delay introduced by the event-batching procedure. At Brookhaven and CERN, where different views of the same event are photographed on different rolls of film, the data tapes generated by the somewhat more complicated HAZE program for these installations are ready for final FAIR analysis in just a few hours. If, in quest of a fully automatic FSD system, one were to eliminate the human scanner with his rough-digitizing apparatus, the on-line computer, lacking the guidance of event-enclosing roads, would be confronted with the problem of reconstructing all of the tracks in the photograph from the 50,000 or so coordinates extracted from a single view by the mechanical flying spot digitizer. A group at Brookhaven, comprising J. Pasta, R. Marr, and G. Rabinowitz, has written an IBM 7090 program (PMR) that, for bubble chamber photographs acceptable under the normal criteria for nonautomatic processing, is able to perform the required reconstruction in real time, allowing the computer to keep up with the approximately 15,000 coordinates per second poured into it by the digitizer. While the initial culling of noninteracting beam tracks may be done for each view at the completion of the normal scan, the search for interesting events must, of course, await the transformation of the full stereo set into coordinate representation, including, where indicated by the results of the normal scan, the necessary orthogonal scan digitizations. The methods employed by PMR for both track reconstruction and event recognition are quite straightforward. Once a segment of track has been recognized, the track is followed by a generalization of the technique used by HAZE to filter tracks within roads, with the predicted extension of the track taking the place of the road. Initially, beam tracks are discovered by an exhaustive search of a strip of the photograph near the beam entry window for linear arrays of bubbles with the proper orientation. By the time the scan has reached the interaction volume of the chamber, the beam tracks have already been established, and all incoming data are immediately sorted into track continuations (the bulk of the digitizations), which are stored in their respective “track banks,” and non-beam coordinates.
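The track-following step may be sketched as follows. This is a loose reconstruction for illustration only; the prediction in PMR is more elaborate than the straight-line extrapolation assumed here, and the tolerance is invented.

# Sketch of track following by predicted extension (illustrative only).

def follow_tracks(track_banks, new_points, tolerance=0.3):
    """track_banks: one list of (x, y) points per live track, ordered by
    scan position. Each new digitizing is appended to the track whose
    predicted extension it lies nearest; the rest are returned as
    candidate non-beam points."""
    leftovers = []
    for x, y in new_points:
        best, best_err = None, tolerance
        for bank in track_banks:
            if len(bank) < 2:
                continue
            (x1, y1), (x2, y2) = bank[-2], bank[-1]
            if x == x2:
                continue
            # Straight-line prediction from the last two accepted points.
            y_pred = y2 + (y2 - y1) * (x - x2) / (x2 - x1)
            if abs(y - y_pred) < best_err:
                best, best_err = bank, abs(y - y_pred)
        if best is not None:
            best.append((x, y))
        else:
            leftovers.append((x, y))   # interaction products, noise, etc.
    return leftovers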
These non-beam points, which include noise as well as the tracks of interaction products, delta rays and other electron spirals, and background particles, are collected from a narrow strip of scans and examined, as were the incoming beam tracks, for linear arrays constituting track segments. As they are discovered, these non-beam segments are followed along with the beam tracks, initiating new track banks to participate in the preliminary sort of the incoming digitizations. When the scan of the frame has been completed, beam tracks that have traversed the length of the chamber without being deflected are “erased,” leaving only those that have interacted within the chamber. The latter tracks, if any such remain, all terminate at a vertex, from which the tracks of charged interaction products may emanate (if all of the final state particles are neutral, the vertex will of course have just the single prong of the beam particle). Constrained by the physical dynamics of the projectile-target system, PMR may now search the allowed volume of the chamber for tracks that may be related to the primary interaction, using the full stereo set to make the identification and eliminate accidental vertices. Since track reconstruction consumes the full computing capacity of an IBM 7090-coupled FSD operating in the fully automatic mode, event recognition and subsequent analysis require an estimated one to two shifts of additional computing time for each shift of FSD operation. (Since PMR is still in the developmental stage, more precise figures are not yet available.) Somewhat less advanced in its development is a program being written at Berkeley under H. White’s direction. Called DAPR (for Digital Automatic Pattern Recognition), the Berkeley program organizes the procedures for track reconstruction and event search somewhat differently from PMR, resulting in an intermediate configuration compatible with that projected for the PEPR device programs, to be described below. We shall, at that point, return briefly to the subject of DAPR. It is of interest, however, to note that DAPR makes use of list storage and processing techniques to deal with the problems of expressing and manipulating the relationships between event-connected tracks. As we have pointed out, the flying spot digitizer is presently in use as a semiautomatic measuring device at several installations. But aside from the fact that such use provides gainful employment for some very expensive hardware while awaiting the completion of the program development for the fully automatic system, FSD in its human-assisted mode of operation compares unfavorably with its SMP equivalent in most respects. The cost of a rough-digitizing scanning table for FSD is about the same as that of an SMP unit (~$40,000), but it takes as much time to prepare the roads for an event on the FSD table as it does to complete its measurement on SMP. (While an on-line computer is required for SMP, it is reasonable to assume
that the computing power necessary to control and filter coordinates for an SMP system is about equivalent to that required by the FSD control computer for gating and filtering.) At this point, where the same human and capital cost has been invested in each case, the SMP-processed event is now ready for analysis, while its FSD counterpart must still submit to precision measuring (with the concomitant handover nuisance) on a machine that requires an additional capital outlay of about $200,000, development costs excluded. But even discounting its advantage in speed and economy, there is reason enough to prefer SMP strongly in the fact that it is a true synergetic man-machine system. SMP makes optimum use of its on-line human, feeding back the results of every phase of the measurement process in real time for immediate action by the operator when necessary, while semiautomatic FSD cannot, because the rough-digitizing table is not on-line to the computer. Recent intelligence from the FSD group at Berkeley indicates that their system will soon be significantly improved in many respects. For example, two mechanical-optical flying spot units will be coupled in tandem to a single set of electronics to almost double the digitized throughput at much less than twice the hardware cost. Also, a newly acquired IBM 7094 II with 65,000 words of fast-access core storage will result in impressively reduced computation costs, due to both the greater economy of the larger, faster computer, and the savings realized by the elimination of tape shuffling between HAZE and the event reconstruction programs. In addition, improvements in HAZE and its associated road-making procedures are expected to cut the rough-digitizing time per event almost in half, to keep pace with the faster tandem flying spot digitizers (the executive program incorporating HAZE and the event reconstruction programs in the new system is called TRIST). But counterintelligence from the SMP people at Berkeley indicates that they have not been idle. They, too, have been improving procedures and programs, with the result that measuring times on SMP are being speeded up to more or less match the improvement in road-making rates for FSD. It is unlikely, therefore, that the arguments presented above will become invalid in the near future. For those installations expecting to process much more than a half million events each year, however, the worm will turn when fully automatic FSD becomes operational. While the author feels that SMP should remain the preferred system for smaller laboratories, it seems clear on the basis of conservative preliminary estimates that elimination of the road-making requirement with its attending troop of scanners and special scanning tables will result in substantial savings in time and money where the processing load is sufficient to use efficiently
FSD’s great capacity. The only reasonable alternative in the offing for such a laboratory is another fully automatic system called PEPR, which we describe forthwith.

4.4 The Precision Encoding and Pattern Recognition Device (PEPR)
The act of scanning a bubble chamber photograph with the objective of discovering and identifying all interactions of interest requires a good deal of visual hopscotch. Underpopulated areas of the frame will earn just a fleeting glance, while areas containing many crisscrossing tracks will be carefully examined and re-examined as the scanner attempts to correlate kinks, vertices, and tracks into meaningful event topologies. He will switch freely between stereo views, often superimposing two different views of the same event on the projection table to ascertain whether an apparent vertex is really what it purports to be or just an accidental coincidence of two unrelated tracks, or to determine whether a track that appears to stop within the chamber really did so rather than merely leave through the glass window. It is clear, then, that in order to do fully automatic event recognition properly, efficiently, and completely, the computer should have essentially random access to any part of every view of each stereo set. As we have pointed out, all FSD event processing is performed on a digital image of the information stored graphically on the bubble chamber film. But notwithstanding the moderate information compression achieved by FSD in transforming the image to a coordinate representation, the full information content of a stereo set remains too unwieldy to be contained in a contemporary computer in digital form (although continuing progress in the technology of high-speed large-scale random access memories may soon weaken this argument). The PMR and DAPR automatic event recognition programs, the reader will note, do their critical topological analysis on data which have already been filtered, in the true sense of the word. These FSD systems must bear the inevitable consequence of any filtering operation: a certain amount of wheat will be removed with the chaff. The immediately obvious solution to the problem of random access to the full stereo set, recognizing the fact that the film itself is its own best high-capacity information storage medium, is to attempt to make use of computer-controlled cathode-ray tube techniques. With such a system, information is extracted from the film in essentially the same way that it is sensed by the mechanical flying spot digitizer: a tiny spot of light is focused upon the point of interest, and a photomultiplier tube positioned behind the film collects and measures the
amount of light that has passed through, indicating the density of the image at that point. But, unlike FSD, the information-extracting spot, generated by an almost inertialess beam of electrons impinging upon the phosphor-coated face of a cathode-ray tube (CRT), may be directed under computer control and with very great rapidity to any point on the film. It is important to note, too, that the kind of random access to the image provided by a computer-controlled CRT scanner is more suited to the requirements of event recognition than the random access offered by massive addressable core stores, for the adjacency relationship between units of information in the image is preserved in the addressing logic for the scanner, while this is in general not the case for the image that has been digitized and transferred to core storage. (The use of list storage and processing techniques by the DAPR program is in fact intended to partially recover this adjacency relationship in core storage, but a commitment to list techniques invariably exacts its toll in processing time and storage requirements.) As in the mechanical system, the optical path may be split, and the scanning spot imaged on orthogonal gratings as well as on the film. By counting pulses generated in collecting photomultiplier tubes placed behind the x and the y gratings, respectively, the associated circuitry is able to compute the precise location of the spot on the film at all times, allowing the system to have closed-loop servo control of the spot position. (That is, to move the spot from the point with coordinates, say, x = 25, y = 40, to the position x = 32, y = 16, a positive voltage would be applied to the x deflection plates of the CRT and simultaneously a negative voltage to the y plates. The x voltage would be cut off after seven pulses were counted in the photomultiplier associated with the x grating, and the y voltage removed after the y photomultiplier has received twenty-four pulses. Any drift in the position of the spot from its assigned position would be signaled by an unauthorized pulse in the photomultiplier tube associated with the affected coordinate, and a correcting voltage could then be applied to the appropriate deflection plates of the CRT.) (Fig. 7).
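A toy model of this closed-loop positioning scheme is given below, in present-day terms. It is our own illustration; the grating lines serve as the units of position, and the voltage control is reduced to unit steps.

# Toy model of grating-controlled spot positioning (illustration only).

class SpotServo:
    """The spot position is known only through counted grating pulses."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y          # position in grating-line units

    def move_to(self, x_target, y_target):
        # A deflection voltage of the proper sign is applied on each axis;
        # every grating pulse counted advances the spot one line, and the
        # voltage is removed when the required count has been reached.
        while self.x != x_target:
            self.x += 1 if x_target > self.x else -1
        while self.y != y_target:
            self.y += 1 if y_target > self.y else -1
        # Drift correction works the same way: an unauthorized pulse on
        # either axis triggers a one-line corrective move back.

servo = SpotServo(x=25, y=40)
servo.move_to(32, 16)   # 7 pulses on x, 24 on y, as in the example above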
In 1960, H. Gelernter of IBM Research and, independently, I. Pless of the Massachusetts Institute of Technology (M.I.T.) pointed out that CRT systems lent themselves readily to a dual-mode technique of operation. Thus, scanning and event recognition would be performed on a relatively low-resolution (and therefore more quickly processed) bit representation of the film image, while precise measurements would be made in the coordinate representation in much the same way that FSD takes its data, but only on those tracks that have been determined (in the bit mode of operation) to belong to a sought-after interaction, and only at enough points to reconstruct the track to the desired degree of accuracy. With multiple on-line scanning units, one for each view, any section of any view of the complete stereo set could be instantly examined at the request of the recognition program. The device could thus shuttle back and forth between views just as the human scanner does in order to find immediate and specific answers to immediate and specific questions posed by the recognition decision logic. In the recognition mode, a relatively high error rate (perhaps 0.1%) is tolerable in converting the photograph to its bit image, because of the redundancy in the information necessary to merely recognize an event. This stage of the scanning process could therefore be carried on at extremely high bit transfer rates. In the measuring mode, however, accuracy requirements are much more severe. The scanning beam would therefore be slowed down substantially when reading in the coordinate representation in order to improve the signal-to-noise ratio at the photomultiplier detector. But because only necessary points on selected tracks need be measured, the time consumed by this phase of the operation is quite acceptable.
FIG. 7. Grating-controlled cathode-ray tube image scanner. Schematic diagram of a film scanning system with feedback-controlled spot positioning. The abbreviations CRT and PMT stand for cathode-ray tube and photomultiplier tube, respectively. To correct for nonuniformity in light output over the face of the tube (and also for electron beam current fluctuations), all signals are referred to the output of a photomultiplier tube that views the spot without film or grating interposed.
To perform the measurement, the spot is accurately servoed along a line of the x (or the y) grating to fix that coordinate. The y (or x) coordinate is determined by counting pulses and interpolating through the other grating. Late in 1960, two rather similar CRT-based systems were independently proposed by an IBM Research group, under H. Gelernter and L. Kamentsky, and an M.I.T. Lincoln Laboratory group, under I. Pless. Since the IBM work was abandoned, we shall confine most of the ensuing description to the M.I.T. system, the Precision Encoding and Pattern Recognition (PEPR) device, for which the development effort is approaching fruition. In addition to the M.I.T. group, M. Alston and A. Rosenfeld of Berkeley and H. Taft of Yale have contributed actively to the development of the programming systems for PEPR. In its initial realization at M.I.T., the PEPR device will be supplied with a single scanning unit, comprising an automatic film transport and a precision high-resolution cathode-ray tube with its associated electronic and optical system. The scanning unit is connected to an on-line PDP-1 computer through the intermediary of a “controller,” which interprets the computer’s commands for the direction of the scanner, and processes the scanner’s output for transmission to the computer. In addition to the usual electron optics, the CRT is supplied with a special diquadrupole focusing magnet. By adjusting the currents in this magnet, the spot on the face of the CRT may be “stretched” into a short line segment with any angular orientation. PEPR is therefore able to scan the film with an orientable line segment, as well as with a spot, which is a substantial advantage in searching for track segments against the usual bubble chamber background. It is, in fact, chiefly in this feature that PEPR differs from the proposed IBM device, which was, instead, supplied with a special digital matrix to perform that same function, as well as some others. It is interesting to note, however, that in recent years, CRT scanner technology has so much improved, and the cost of computing power has so much decreased when compared with the cost of increased hardware complexity, that in its latest realization, the grating-based, closed-loop feedback control for precision encoding is eliminated from the PEPR system. Instead, the CRT and its deflection system are carefully and regularly calibrated against a standard grid to determine a set of distortion parameters. These are introduced into a correction function with which all precision measurements are adjusted.
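Such an open-loop correction might be sketched as follows. The form of the correction function is our assumption (a low-order polynomial fitted by least squares); the actual PEPR calibration procedure is not specified here.

# Illustrative distortion correction by calibration against a standard grid
# (a sketch under assumed details, not the PEPR correction function itself).

import numpy as np

def fit_correction(measured, true_pos, degree=2):
    """Least-squares polynomial mapping measured (x, y) to true (x, y).
    measured, true_pos: N x 2 arrays of grid-intersection coordinates."""
    def design(pts):
        x, y = pts[:, 0], pts[:, 1]
        # Monomials up to the given total degree: 1, y, y^2, x, xy, x^2, ...
        return np.column_stack([x**i * y**j
                                for i in range(degree + 1)
                                for j in range(degree + 1 - i)])
    coeffs, *_ = np.linalg.lstsq(design(measured), true_pos, rcond=None)
    return lambda pts: design(pts) @ coeffs   # corrects later measurements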
Current plans are to design the PEPR control programs so that the device may be used as a fully automatic “human-free” system; as a “partially automatic” system, still human-free, but one that speeds up the operation considerably by making certain compromises in the initial search for events, at the risk of losing all events of a particular, rather limited class; and, finally, as a human-assisted semiautomatic system in the FSD vein, but a good deal more sophisticated because of PEPR’s random access capability. In fully automatic operation, the PEPR system follows a three-phase procedure. The initial step is a low-resolution all-over area scan designed to separate the particle tracks from the background. Taking advantage of PEPR’s special segment-scanning facility, the photograph is transmitted to the computer not as a bit image, but rather in a “cell-segment” representation, as it were. Briefly, the frame is divided into strips 2 mm wide, each of which is in turn divided along its length into square cells 2 mm on a side. Every strip is scanned by the “flying line segment” ninety times, once for each orientation of the segment from -45° to +45° with respect to the vertical, at 1° intervals. In each scan, the length of the segment is adjusted to span the 2-mm width of the strip (dimensions are stated in the scale of the film image) (Fig. 8). The presence of a segment of particle track passing through the strip is of course detected by the sharp dip in light passing through the film to the photomultiplier when the flying line scans through the track at the same angle as that taken by the track. Each time such a track element is discovered, its existence and angle are tallied for the cell in which it was found.
FIG. 8. PEPR scan pattern for cell-segment representation of bubble chamber image. Each 2-mm-wide strip is scanned ninety times, once for each orientation of the “flying line segment” from -45° to +45° at intervals of 1°. A “hit” is tallied for a cell when it falls within the limits indicated in the diagram. (Courtesy of the Massachusetts Institute of Technology Laboratory for Nuclear Science PEPR group.)
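In software terms, the bookkeeping of the area scan amounts to the following sketch. The transmission function standing in for the scanner hardware, the detection threshold, and the cell count per strip are all assumptions made for illustration.

# Sketch of the cell-segment area scan bookkeeping (the real scan is done
# by the CRT hardware; strip_transmission here stands in for the scanner).

def area_scan_strip(strip_transmission, cells=65, angles=range(-45, 46)):
    """strip_transmission(cell, angle): fraction of light passed when the
    flying line crosses the given cell at the given orientation (degrees).
    Returns, for each cell, the list of angles at which track elements
    were detected; 65 cells of 2 mm would span a 130-mm strip."""
    hits = {}
    for cell in range(cells):
        for angle in angles:
            # A sharp dip in transmitted light marks a track element whose
            # direction matches the orientation of the scanning segment.
            if strip_transmission(cell, angle) < 0.5:
                hits.setdefault(cell, []).append(angle)
    return hits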
When the entire frame has thus been covered, the process is repeated at right angles to the original scan, so that each cell will have been explored at every angle. At the completion of the area scan, related track elements may be easily identified by angle and cell location for the second phase of the process, track following. Here, connecting elements are collected together into tracks, and connecting tracks are collected into vertices. Noninteracting beam tracks are discarded, and an event topology recognition logic is applied to all that remains. In a multiple-scanner system where all views are on-line simultaneously, many false events could be identified and discarded with the beam tracks at this point. PEPR then shifts from the low-resolution Pattern Recognition to the high-resolution Precision Encoding mode of operation (the third phase), and all remaining tracks of interest are measured at a sufficient number of points with the slow spot scan, as described earlier. Where ionization measurements will assist in the identification of an event, the spot may be used to scan a closely spaced raster along the track so that a bubble count may be computed. From this point on, the usual event reconstruction and analysis programs take over. Because preliminary estimates of the event-processing rates for the M.I.T. PEPR configuration in the fully automatic mode proved to be overly optimistic, it is likely that this first system will be used in the “partially automatic” mode where the nature of the experiment permits, and as a semiautomatic human-assisted device for most others. In the partially automatic mode of operation, PEPR is, so to speak, blinded to the beam tracks by eliminating from the area scan those sweeps of the flying line where the line orientation would be the same as the direction of the beam. Since the average bubble chamber photograph contains mostly beam tracks, only a small fraction of the information that would be picked up by a full scan is thus transmitted to the computer. These track elements are followed, and their endpoints are examined for connecting tracks (interaction vertices) at all angles. In this way, the beam track that initiated the interaction will be picked up and followed, provided that the event is such that the primary interaction between the beam and target particles produces at least one charged product (i.e., the beam-track vertex has a visible outgoing track). Where all of the products are neutral, it is still sometimes possible to deduce the location of the beam vertex by balancing momenta when the neutrals decay into only two particles. But where three-body decays may occur, an area scan is necessary to find the beam vertex. When converted into a semiautomatic human-assisted measuring device, PEPR, like FSD, suffers a disadvantage when compared with
SMP in that the human scanner is not on-line to the system when PEPR is searching out and measuring the selected events. But, unlike FSD-HAZE, PEPR will require a bare minimum of information concerning each event to be measured, so that the human-assisted phase of the operation is expected to be considerably less costly and time-consuming than the equivalent operation for FSD, and less likely to introduce human errors as well. The scanner, working at an ordinary scanning table, would merely indicate for the selected events the topology of each vertex and its approximate location, by zone, in each view. (In fact, for a PEPR system with multiple CRT units, allowing all views to be on-line simultaneously, the vertex locations need be supplied for only one view, assigning the relatively simple task of locating them in the other views to a recognition program.) With all vertices located and identified beforehand, only a small fraction of the full frame will require area scanning, and only those tracks that are to be measured need be followed. Initial estimates of the cost per event for PEPR processing in the semiautomatic mode are very low; lower, in fact, than the projected costs for any other system of bubble chamber data analysis now in operation or under development, including fully automatic PEPR and FSD. Past experience, however, forces the prudent to regard such estimates as overly optimistic pending verification. In the final analysis, cost comparisons among different systems and different modes of operation of the same system are too strongly dependent upon the relative costs of computing power, hardware construction, and the labor of graduate students and technicians to be safely predictable. A year or so from now, when fully automatic FSD and fully automatic PEPR both go into production processing, it seems probable that PEPR will process a million events each year more cheaply than FSD, for PEPR uses cleverly designed hardware to do quickly and easily an operation that requires a good deal of expensive chomping and grinding on the part of FSD’s computer. But there is no reason to believe that the rather precipitous downward trend in the cost of computation will not continue, as it has since the beginning of digital computing, and, this being the case, it seems equally likely that the economic tables will be turned soon afterward to favor FSD. That is assuming, as one daren’t do, that all other factors remain constant.

4.5 Beyond PEPR
In concluding this discussion of bubble chamber data-analysis systems, we must mention that Bruce McCormick, who recorded the first words on behalf of fully automatic bubble-chamber data processing,
may yet have the last word up his sleeve. At the Digital Computer Laboratory of the University of Illinois, McCormick is directing the exploratory development of a unique pattern recognition computer to be coupled to the ILLIAC III. This work is founded upon the premise that, to achieve any kind of processing efficiency, the special two-dimensional nature of the pattern recognition operation requires special two-dimensional logic realized in array-oriented, rather than word-oriented, hardware. IBM’s proposed CRT-based system mentioned above included such an array processor, but a far less elaborate one than that being considered at the University of Illinois. The Illinois array processor provides for the simultaneous logical processing of the information contained in a 32 by 32 bit matrix. Each matrix position is supplied with its own 8-bit special storage register to hold the intermediate results for a number of special two-dimensional logical operations that may be performed on the contents of the array. These “site registers” are themselves wired for parallel logical processing of their contents. McCormick’s array processor is a substantial and expensive piece of hardware, containing an estimated 60,000 transistors and 160,000 diodes. And while it is a reasonably safe bet that it will perform the pattern recognition phase of the bubble chamber data-analysis process more quickly than either FSD or PEPR, it is not at all certain that it will perform it as cheaply or as efficiently as a multiple installation of one or both of the latter systems with the same capacity as the Illinois system. On the other hand, one can easily imagine problems in pattern recognition beyond the pale of nuclear physics for which the array processor might be uniquely suited (the automatic processing of aerial photographs, for example). The ultimate value of McCormick’s device has yet to be established for any application, but the reader is advised to remain alert to the possibility of spectacular results from this rather bold approach to the problem of pattern recognition by computers.
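The flavor of such an array-oriented logical operation can be suggested in software, though the point of the hardware is that all 1024 positions are processed at once. The particular operation below (a neighborhood OR, thickening a track element by one cell) is our own example, not one documented for the Illinois machine.

# Software stand-in for one array-processor operation: each position of
# a 32 x 32 bit matrix is combined logically with its eight neighbors.

N = 32

def neighborhood_or(bits):
    """bits: N x N lists of 0/1. In the array hardware this is a single
    parallel operation; here it is simulated position by position."""
    out = [[0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < N and 0 <= nj < N:
                        out[i][j] |= bits[ni][nj]
    return out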
4.6 Bubble Chamber Event Libraries

Having established the fact that several millions of fully analyzed bubble chamber interactions a year, while difficult enough to come by, will nevertheless soon be streaming forth from the world’s high energy physics laboratories, we cannot fail to touch upon the inevitable consequence of such bounty. Yearly bumper crops of data will have to be stored away so that that which is significant among them may be later retrieved, whether in pursuit of the aims for which the data were originally gathered, or in order to examine some physical process
completely alien to the one that the bubble chamber was set up to investigate. Many of the recently discovered particle-resonance states were in fact unearthed from old bubble chamber film, sometimes spanning several different experiments. The problem of information storage and retrieval is not much different for bubble chambers from that of any other information-surfeited discipline. It is being attacked by conventional techniques, to which it will undoubtedly succumb. For the most part, these rely upon high-resolution photographic film as the high-density mass storage medium from which the information is extracted by a computer-controlled scanning spot. It is expected that a complete description of an average event will be compressible into a record of standard format containing perhaps 60,000 bits, and that the latter will become the conventional output for all bubble chamber data-analysis systems. One can well imagine (and if one is so inclined, deplore, or at least view with alarm) a future for bubble chamber physics where events are mass produced and analyzed as part of the same technical service facility that runs the accelerator at each installation, and where the physicist performs his experiments by selection from a vast interlaboratory event library. But one can also imagine the reaction at, let us say, Berkeley to an announcement by, let us say, Brookhaven of an important new result extracted from Berkeley-generated data! On balance, the author would choose not to hold his breath in anticipation of a Brave New World of bubble chamber physics.

5. Spark Chambers
Bruce Cork, one of the first to demonstrate the usefulness of a spark chamber in a successful experiment, remarked in reporting the results of that experiment that the reason for his great interest in spark chambers was that they are such a perfect match to a high energy accelerator. He might have added as well that spark chambers are an equally good match to a high speed computer. It is largely this fortuitous combination of properties that accounts for the present intense activity in spark chamber development at almost every installation where high energy physics is practiced. The genealogy of the spark chamber is as noble as that of the bubble chamber, with which it is undoubtedly destined to share the highest favor among accelerator physicists. If the latter can boast of its descent from the Wilson cloud chamber, the former may point to the parallel-plate Geiger-Müller counter as its progenitor. Indeed, a spark chamber is little more than a stack of such parallel-plate counters operated under suitable conditions (i.e., in the GM discharge region).
In 1949, J. Keuffel of the California Institute of Technology published the observation that the spark-like discharge initiated between the planes of a parallel-plate counter by a fast ionizing particle seemed to occur at just the location where the particle traversed the plates, and that the counter could consequently be used as a track-locating device. Apparently, some six years passed before anyone thought to make use of an ensemble of such counters to delineate the full trajectory of a particle. P. Henning, a graduate student at Hamburg University, seems to have been the first to take stereo photographs of a multiple-counter assembly in order to reconstruct the path of a high energy particle in space. The discovery in 1959, by S. Fukui and S. Miyamoto of Osaka University, that a multiplate spark chamber filled with a noble gas could be operated to record the simultaneous passage of several particles, marks the beginning of the current widespread interest in this device as a nuclear particle trace detector. In its simplest form (which is not very different from its most complicated form), a spark chamber consists of a stack of two or more parallel conducting plates, assembled so that adjacent plates are insulated from one another, with a suitable gas filling. The plates may be as small as a few centimeters on each side, or large enough to have an area of several square meters. They may be thin (1-mil aluminum foil stretched between plastic frames) or thick (2.5-cm-thick aluminum plates were used for the Brookhaven neutrino chamber). In addition to aluminum, chambers have been fabricated of iron, copper, and glass with an evaporated conducting layer, to list just a representative selection. Any parallel (i.e., everywhere equidistant) plate geometry could probably be made to work; concentric-cylinder configurations have been successfully operated at Berkeley and at Dubna, U.S.S.R. The basic electronics necessary for spark chamber operation consists of circuitry for applying a short high-voltage pulse between adjacent plates of the assembly immediately following the passage of a selected particle through the chamber, and an independent circuit for applying a continuous d.c. clearing field voltage to the plates (Fig. 9). A typical system will supply a 12-kilovolt pulse of perhaps 0.1 microsecond duration within 0.5 microsecond or so of the instant the desired event has occurred within the chamber. The intensity of the clearing field will generally be of the order of 100 volts/cm, the exact choice depending upon the nature of the gas filling and the time resolution and efficiency requirements for the chamber. Presently, argon, neon, and helium appear to be the most widely used gas fillings for spark chambers. They have been used in pure form, mixed with one another in various proportions, and occasionally
mixed with other gases. For some applications, air is a satisfactory and always available filling. By varying the filling, one can adjust the efficiency, sensitivity, and time resolution of the chamber. Alcohol vapor is sometimes added to the gas to improve certain chamber characteristics. Acting as a chemical quenching agent in much the same way as it does in a chemically quenched GM counter, the alcohol sharpens the sensitive interval, and tends to decrease the dead time of the spark chamber. Unfortunately, the action of the continual electrical discharge through the alcohol vapor, or, for that matter, through any organic impurity in the filling, tends to poison the filling, which in turn causes the chamber parameters to change with use. Where such drift in chamber characteristics cannot be tolerated, the spark chamber is generally operated with a continuous flow of gas between the plates.
FIG. 9. Schematic of a multigap spark chamber. The chamber is discharged by supplying a high-voltage pulse through a fast, high-current switch (generally, a triggered spark gap or a thyratron tube). The switch is closed by a pulse from the chamber triggering system.
Let us now consider our typical system in operation. In the absence of ionizing radiation, the high-voltage pulse applied across the plates of the chamber will be inadequate to break down the gas between the plates, and no discharge will occur. Should a high energy particle traverse the chamber, however, the filling gas will be ionized along the path of the particle, and, if the high-voltage pulse is applied before the ions and free electrons have a chance to recombine or to diffuse to the plates under the influence of the clearing field, a local self-sustaining discharge (a spark) will be produced where the initiating ions were concentrated. The discharge will be quenched by the electronic circuitry (and the chemical quenching agent if one is present) within a fraction of a microsecond. For some time after the chamber has been discharged (generally of the order of 10 milliseconds), the region between the plates will be so contaminated with ions resulting from the discharge that a reapplication of the high-voltage pulse will produce spurious sparks. The time required for the clearing
field to sweep the chamber free of these left-over ions is called the dead time (or recovery time) of the chamber. The time interval following the passage of a particle during which the chamber will discharge along the ion trace with near 100% efficiency is called the resolution time of the chamber. A desirable spark chamber will generally have a resolution time of about 0.5 microsecond, after which its efficiency will drop very sharply to near zero, so that particles that traversed the chamber more than 0.5 microsecond before the high voltage was applied will leave no trace in the chamber. (For this reason, the term memory time is sometimes used to denote the interval during which the chamber will respond to a particle that has traversed it.) The dead time of our chamber should not exceed the 10 milliseconds mentioned above, and if possible be considerably less. The space resolution of the chamber (the accuracy with which one can locate the trajectory of the particle from the position of the spark) depends upon many factors, but mainly upon the thickness of the spark, the distance between plates, and the angle the trajectory makes with the plates. The thickness of a given spark depends in turn upon the nature of the gas filling, the energy in the discharge, and, when several particles are simultaneously detected, the way the available energy divides among the several sparks. The best reported resolution for a near-perpendicular track is about 0.2 mm in a linear coordinate. Since, under usual operating conditions, the spark does not follow an inclined track (although chambers may be made to operate so that they do), but rather jumps perpendicularly between the plates at some difficult-to-determine point along the trajectory, space resolution on the average is somewhat poorer than the best figure quoted above. The set of characteristics listed above sums up to Cork’s epithet, “a perfect match to the high energy accelerator,” for they make it possible to use the extremely expensive output of an accelerator much more efficiently than heretofore possible for large and important classes of particle physics experiments, and in fact open up many new areas to investigation. On one hand, the spark chamber’s memory time is long enough to permit event-triggered operation, where the chamber is discharged only after it has been determined that it contains the ionic trace of a sought-after event. Thus, configurations of high-speed scintillation counters may be arrayed fore and aft of the spark chamber, and even interspersed between groups of plates within the chamber. The output of such a path-delimiting array, processed by suitable coincidence and anticoincidence logic, will trigger the discharge only if the interaction that has occurred possesses a given, predetermined topology.
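The triggering decision itself is simple combinational logic, realized in those days in fast counter electronics rather than in a stored program. The sketch below is purely illustrative; the counter names and the particular topology demanded are invented.

# Illustrative coincidence/anticoincidence trigger logic (hypothetical
# counter arrangement; the real decision is made in nanosecond hardware).

def should_trigger(fired):
    """fired: dict mapping counter name -> True if that counter fired
    within the resolving time. Discharge the chamber only for the wanted
    topology: beam particle in, charged product out, veto counter quiet."""
    beam_in = fired["B1"] and fired["B2"]     # beam telescope, fore
    product = fired["S1"] or fired["S2"]      # outgoing particle, aft
    veto    = fired["A1"]                     # anticoincidence counter
    return beam_in and product and not veto

# A beam particle that interacted and produced a charged secondary:
print(should_trigger({"B1": True, "B2": True, "S1": True, "S2": False, "A1": False}))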
Cherenkov counters may also be introduced into the triggering system to permit momentum selection as well as choice of interaction topology. On the other hand, the memory time is short enough so that any and all particles that may have traversed the chamber earlier than 0.5 microsecond before discharge time will leave no trace of their prior occupancy. It is possible, therefore, to pass a relatively intense particle flux (~10⁶ particles/second) through the system in quest of rare events occurring with low cross section, triggering the spark chamber only when the “signature” of the desired interaction has been detected. When the chamber is discharged, the probability is high that it will display only the triggering event with few, if any, background tracks to confuse the picture (Fig. 10). (This situation is to be contrasted with the bubble chamber case, where interesting events must be searched for, and extracted from a rather cluttered background of noninteracting beam tracks and other stray particles collected during the relatively long sensitive time of the bubble chamber.) Coupled with such excellent selectivity, the short recovery time (one hundredth that of a bubble chamber) makes it possible to collect vast amounts of data at staggering rates. It is easy to conceive of a spark chamber experiment that will generate 10⁵ to 10⁶ events to be analyzed in a day. The data reduction and management problem is compounded by the fact that spark chambers are relatively cheap and easy to build, and have been wildly proliferating in recent years. Before directing our attention to the relief of the spark chamber data glut, let us complete our discourse on the spark chamber itself. The latter is by no means the quintessence of the ideal nuclear particle trace detector. Chief among the weaknesses of the spark chamber at its task are the limited precision of trace reconstruction attainable with a system where the track element is a spark discharge between two planes, and the fact that the mass of the chamber is so heavily concentrated within the plates that the point of origin of a collision interaction occurring within the volume of the chamber can be determined only by extrapolation. In consequence, the precision with which a given event may be geometrically and kinematically reconstructed is limited to an order of magnitude less than that easily achieved with a bubble chamber. The greatly improved statistics deriving from the spark chamber’s superior event-collection rates, however, go a long way toward mitigating the disadvantage of poorer spatial resolution in many of its applications, and spark chambers do in fact make possible whole new classes of particle physics experiments that would be difficult or impossible to perform by any other technique. It would not be unrealistic to think of the device as a very high-resolution three-dimensional close-packed array of very fast counters. From this point
FIG. 10. Spark chamber in magnetic field. Photograph of an associated-production event in the large magnetic-field spark chamber operated at the CERN proton synchrotron by G. Burleson, T. Hoang, P. Kalmus, R. Kuskowski, L. Niemela, A. Roberts, T. Romanowski, S. Warshaw, and G. Yurka, of the Argonne National Laboratory. Distortion of the image introduced by the chamber’s optical system is corrected by computation during the geometrical reconstruction process. (Courtesy of Dr. A. Roberts.)
of view, a spark chamber is no longer a poor man’s productive but blurry bubble chamber, but rather a significant improvement in high-speed particle counter technology. Spark chambers can be, and occasionally have been, operated in magnetic fields to permit the determination of the sign and momentum of the detected particles, but this is not the usual mode of use. More often, the latter information is acquired by interposing the magnetic field in the beam upstream or (for the outgoing particles) downstream from the point of interaction, with spark chambers fore and aft of the field so that the effect of the field on the particle may be measured. The fact that the conducting plates of the chamber may be fabricated of materials ranging widely in density, and indeed that plates of materials of almost any density may be stacked alternately with single-gap chambers, makes range measurement a most convenient method for energy determination. Interaction target materials may of course be introduced anywhere in the spark chamber array, if the plates themselves are not to serve as targets. In this event, the disposition of the triggering counters will insure that the interaction has in fact occurred in the target, and not elsewhere in the assembly. A representative configuration, then, might include (in order) an entrance beam-defining chamber, magnetic field, exit beam-defining chamber, target (liquid hydrogen, for example), large narrow-gap thin-plate chamber (for precise geometric reconstruction of short-lived decays), large thicker-plate chamber (to stop charged interaction products for range measurements), and an alternating sequence of single-gap chambers and lead plates (to convert and identify gammas associated with the interaction). In all likelihood, the input particle beam will have had its composition roughly particle-preselected and momentum analyzed. The spark-triggering counters will be interspersed throughout the chamber assembly (Fig. 11).
FIG. 11. Representative spark chamber configuration. The representative configuration described in the text. The triggering scheme illustrated will display the decay of neutral particles produced in the target as a result of an interaction with a beam particle.
A final word is here in order concerning the comparison of the virtues and demerits of the spark chamber with those of the bubble chamber. Clearly, neither is the universal nuclear particle trace detector. The triggerability of a spark chamber is an immense boon in making possible the selection of a particular rarely occurring event from an overwhelming background of unwanted interactions. But in order to partake of the benefits of triggerability it is necessary to know just what one is looking for, and the order-of-magnitude superiority in measurement precision possible with a bubble chamber makes it far more suitable for the latter kind of determination. Triggered detectors, too, can introduce the biases of their selection systems, always a potentially serious pitfall in high-statistics experiments (of course bubble chamber scanners have their biases as well). If one were forced out on the proverbial limb, a reasonable guess would be that bubble chambers will remain the most useful detector for the discovery of new phenomena, particles, and resonances, and for establishing the existence of, and rough parameters of, their interactions and decay modes. Spark chambers, on the other hand, will probably become the preferred device for sharpening those parameters where the particles or their interactions occur too infrequently for bubble chambers to supply adequate samples, or for the measurement of those quantities that by their very nature require large samples of the interaction (branching ratios or angular distributions, for example). The potential importance to fundamental physics of either device is equal to that of the other.

6. The Data Problem for Spark Chambers
Adopting, for the moment, the point of view that a spark chamber is essentially a very high-resolution three-dimensional close-packed array of fast counters, we are faced with the problem of determining and recording which of the array of counters has fired at a given time, and of reconstructing the physical phenomenon that caused that particular configuration of discharges. We remarked earlier that the spark chamber was as good a match to the high-speed digital computer as it was to the accelerator. It is, in fact, extremely well suited to on-line real-time use as well. Let us examine now the reasons for our sanguineness. The information content (in the information-theoretic sense) of a single spark chamber discharge is orders of magnitude less than that in a roughly equivalent bubble chamber picture. Consider, for example, our representative configuration of Fig. 11. The assembly comprises about 50 gaps, each of which is, let us say, 0.5 meter on a side. Assuming the ultimate in measurement precision, the position of each spark must be determined to 0.2 mm in 50 cm, or one part in 2500. A reasonable estimate of the average number of sparks occurring in a typical discharge may be taken to be perhaps 5/gap. (The far downstream gaps may well exceed this number, but the forward, beam-defining chambers will usually display only one spark per gap, tracing the path of the beam particle that initiated the event.) To specify the coordinates of a single spark, then, 6 bits will determine which of the 50 gaps contains the spark, and 12 bits will determine the coordinate along the gap. One may choose to add 3 more bits to specify an additional useful quantity, the intensity of the spark (allowing for eight distinguishable levels). Intensity information can be most helpful when ambiguities in the correlation of spark images in different views must be resolved. A single view of the chamber, then, may be specified by about 250 spark coordinates of about 24 bits/coordinate (allowing a few extra bits/coordinate for housekeeping information), or about 6000 bits/view. This figure is to be compared with our estimate of ~10^6 bits/view for bubble chamber photographs in the economical coordinate representation. It should be added that our "representative" configuration is rather more complicated than most spark chamber experiments reported to date, so that the 6000 bits/view figure is likely to be closer to the maximum than to an average for an experiment.
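The estimate just given is easily verified. The following sketch (Python serves merely as compact notation here; every figure is the chapter's own) reproduces the bits-per-view arithmetic:

```python
import math

# Figures from the representative configuration described above.
n_gaps = 50             # gaps in the assembly
gap_length_cm = 50.0    # each gap 0.5 meter on a side
precision_mm = 0.2      # desired measurement precision
sparks_per_gap = 5      # average sparks per gap per discharge

positions = gap_length_cm * 10 / precision_mm       # 2500 resolvable positions
gap_bits = math.ceil(math.log2(n_gaps))             # 6 bits to name the gap
pos_bits = math.ceil(math.log2(positions))          # 12 bits along the gap
intensity_bits = 3                                  # eight distinguishable levels
housekeeping_bits = 3                               # a few spare bits

bits_per_coordinate = gap_bits + pos_bits + intensity_bits + housekeeping_bits
coordinates_per_view = n_gaps * sparks_per_gap      # about 250
print(bits_per_coordinate, coordinates_per_view * bits_per_coordinate)
# -> 24 bits/coordinate, 6000 bits/view
```

The three-order-of-magnitude gap between this 6000-bit figure and the ~10^6 bits of a bubble chamber view is the quantitative heart of the argument that follows.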
The relatively small quantity of "raw" data generated by a spark chamber event contributes substantially, but not exclusively, to the ease of data reduction. Other positive factors are the following:
(a) The event displayed will have been preselected by a triggering system, so that its topology will be confined to a rather narrowly circumscribed class.
(b) Noise and extraneous tracks will be almost completely absent.
(c) Because the same data point (the spark) is measured in all views, stereo reconstruction is rendered almost completely trivial. (Stereo reconstruction from bubble chamber photographs is seriously complicated by the fact that the same bubble is, in general, not measured in each view.)
(d) Since the primary data are of limited precision, fewer significant places must be carried through the computations (a weakness, as we have noted, of the spark chamber as a detector, but, one will admit, a convenience in data processing).
In consequence of factors (a) and (b) above, spark chamber output is in effect already prescanned for interesting events in the same sense that bubble chamber film is prescanned for measurement by FSD. These computerphilic properties of spark chambers did not go unnoticed by the physicists using them. At the Argonne Symposium on Nuclear Spark Chambers in 1961, the first conference devoted exclusively to that instrument, L. Koester and A. Roberts both pointed out that spark chamber film was ideally suited for automatic scanning and processing as currently practiced with bubble chamber film.
Several groups began developing such systems, with the result that a number of automatic spark chamber film scanning and processing systems are successfully in use today. They do not differ greatly from the bubble chamber systems already described in detail above, and we shall not consider them further. Of far greater interest to us here is the fact that the information in a discharged spark chamber may be extracted without recourse to photographic film as an intermediary, making it possible for the physicist to have immediate access to data as his experiment proceeds. It is with such filmless spark chamber systems that we shall concern ourselves in what follows.

7. Filmless Operation of Spark Chambers

7.1 Vidicon Spark Chamber Systems
The advantages of dispensing with film as an intermediary information storage and transfer medium are easy to see, at least for that very important category of spark chamber experiment where the physicist expects to collect perhaps 10^6 or 10^7 three-view pictures of the triggering event. Under such conditions, film and processing costs, bookkeeping procedures, and storage space requirements become serious problems. But by far the most important advantage lies in the elimination (for on-line computer-coupled systems) or drastic reduction (where magnetic tape is introduced) of the time lag between the generation and the analysis of the data. In many cases, immediate access to the data is considered important enough alone to justify the use of an on-line system, even when the anticipated total sample is small. Since magnetic tape may be erased and reused immediately following the analysis of its contents, the advantages of cost and space reduction are, of course, not lost with tape systems. It should be emphasized that it is quite possible to store all of the information generated by the spark discharge in digital form, including an intensity figure for each spark if the information transducer used in place of the film camera has been selected for that property. When a real-time directly coupled system is in use, a daily magnetic tape of the raw data may be produced as well. Should the on-line analysis program experience difficulty with a given event, that event could easily be visually reconstructed later on a cathode-ray tube display device from the taped data for further consideration by the physicist at his convenience. Such doubtful events could be compactly assembled on a special tape, the daily tapes being returned to service.
Elimination of the photographic film was first proposed by H. Gelernter at CERN, who in 1961 demonstrated the feasibility of a filmless spark chamber system using a television vidicon-type camera tube as the information transducer. The output of the vidicon could be immediately digitized for input to a computer or for storage on magnetic tape. A vidicon camera tube was selected in preference to the more highly developed television image orthicon for several reasons. Both are storage-type tubes, in which an optical image is converted into a distribution of charges on a target plate, and subsequently read by scanning with an electron beam. Charge images may be preserved for a significant fraction of a second in such tubes without serious degradation. In the orthicon, the image is produced by photoemission; in the vidicon, by photoconduction. Vidicons are considerably simpler and more stable in operation than image orthicons. They are, consequently, much less expensive, and require simpler operating circuits. For spark chamber data collection, vidicons possess a further critical advantage. The formation of the image charge distribution in a vidicon is a passive process; electrons leak through a charged photoconducting target where it has been exposed to light. Image formation is therefore unaffected by the intense electrical noise produced by the discharging spark chamber. The sensitive process of converting the image into a time-varying current by scanning with an electron beam can wait for complete quiescence in the high-powered transmitter that a spark chamber, in fact, is. The image orthicon, on the other hand, includes an electron image intensification section before the target, which can be highly susceptible to the electrical noise generated by the spark discharge. Experimenters who have used vidicons for spark chamber data collection report no great difficulty in adequately shielding the tube's electron optics from the stray magnetic fields that invariably inhabit the neighborhood of a particle accelerator. In considering the performance requirements for an on-line vidicon spark chamber system, let us assume that our previously described "representative configuration" is to be used to perform an experiment at the CERN proton synchrotron in Geneva. This machine can deliver a 100-millisecond-long burst of very energetic particles every 3 seconds. The spark chamber, the reader will recall, has a dead time of about 10 milliseconds after each discharge. Where the interaction cross section of the triggering event is not too low, the data collection rate will be close to ten events per burst every 3 seconds, or, at two-thirds efficiency, about 2 x 10^5 events each 24-hour running day. The cross section for an event topology as complex as that considered in our example is likely to be rather small, so that our expected data rate might be perhaps half the maximum.*

*Because the dead time of the chamber introduces a constant delay, the data rate for an event of fairly low probability will not differ greatly from that for a high-probability event, unless the compound event cross section is so low that the mean interaction time is long compared with the chamber recovery time for the available and permissible beam particle flux rates.
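Both the day-rate estimate and the footnote's point about dead-time throttling reduce to a few lines of arithmetic. A minimal sketch (Python as notation; the burst structure and dead time are the figures quoted above):

```python
# Day-rate estimate for the representative experiment.
burst_ms, dead_ms, cycle_s = 100, 10, 3     # burst length, chamber dead time, machine cycle
events_per_burst = burst_ms // dead_ms      # at most 10 triggers per burst
efficiency = 2 / 3
events_per_day = events_per_burst / cycle_s * efficiency * 24 * 3600
print(f"{events_per_day:.1e} events/day")   # ~1.9e+05, i.e., about 2 x 10^5

# The footnote's observation: the chamber needs dead_s to recover, so the
# observed rate is one event per (dead time + mean wait for the next trigger).
def observed_rate(trigger_rate_hz, dead_s=0.010):
    return 1.0 / (dead_s + 1.0 / trigger_rate_hz)

print(observed_rate(1000.0))   # copious triggers: ~91/s, limited by recovery
print(observed_rate(200.0))    # rarer triggers:   ~67/s, not much lower
print(observed_rate(5.0))      # very rare:        ~4.8/s, limited by the physics
```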
The prospect of 10^5 analyzed events/day, however, is enough to bring roses to the cheeks of the most hardened spark chamber physicist. Although we are assuming that the average delay between triggers during the burst will approximate 20 milliseconds in our experiment, the delay between individual occurrences will of course fluctuate, and in general we should like our system to be ready to accept the next event as soon as the detector has recovered its sensitivity. During the available 10 milliseconds, then, we must allow a short time at the beginning (~1 millisecond) for electrical noise transients to decay, and substantial time at the end (perhaps 4-5 milliseconds) to erase fully the afterimage from the vidicon target by flooding it with electrons. Four milliseconds or so remain sandwiched in between for scanning useful information from the tube, and during this time the electron beam must sweep the target at least once for each gap to be read by that tube. Standard U.S. commercial video practice allows 63.5 microseconds, including flyback, for each sweep of the image. While this scanning rate can be increased without too much difficulty, if we choose to adopt it for convenience, each vidicon in our system will be able to perform eighty to one hundred information scans during the allotted time. Our experiment could be conveniently set up with five vidicon cameras in the system, one for each of the beam-defining chamber assemblies (with mirrors arranged to allow one camera to scan both stereo views), and three to pick up three different views of the postinteraction assembly (since multiple tracks are expected here, a redundant view is included to facilitate spark correlation). No camera in the system is required to read more than forty gaps, and so we may conveniently design our circuitry to scan each gap twice and our programs to average the results for increased precision. We have not yet, the reader will notice, considered the question of resolution in specifying vidicon requirements. The resolving power of an optical system, whether for photons or electrons, is a well-defined concept. It refers to the ability of the system to form two separate and distinguishable images of two separate objects. In requiring a measurement precision of one part in 2500 along a 50-cm gap, we do not demand that two sparks separated by 0.2 mm be distinguished. Indeed, since the width of an average spark is several times that figure, they would not be resolved by the chamber itself.
What we do demand is that the tube be sufficiently precise and stable in its operation that the center of a fairly broad spark image may be determined to one part in 2500. Standard commercial off-the-shelf 1-inch diameter vidicon tubes with advertised resolution of only a few hundred optical line pairs have been successfully used to achieve a measurement precision of one part in 2000. Such a tube will of course produce an output signal broader than the line being scanned, but, because the spark image is an intense clean signal well out of the noise, its center will accurately represent the center of the spark. Since we seek measurement precision somewhat greater than the best yet reported for standard 1-inch vidicons, we would probably have to make use of a currently available 1.5-inch high-resolution vidicon to meet the requirements of our illustrative experiment. Although no measurements on the latter tube have been published at the time of writing, it is reasonable to expect perhaps twice the precision from the special-purpose vidicon. In all of the successfully operating vidicon systems to date, digitization of the spark image position has been accomplished by accurately counting off the time interval between the signal from a fiducial marker and that from the spark image. This method of digitization is limited in accuracy by the degree to which the vidicon's high-voltage power supply may be precisely regulated, and the degree to which the sweep ramp voltage may be made precisely linear. To realize the ultimate measurement precision possible with the highest-resolution vidicons, a grating system similar to that used by FSD and PEPR for bubble chamber film may be devised. Since one cannot use the vidicon electron beam to scan the image and supply grating signals simultaneously, a slaved high-resolution cathode-ray tube flying spot scanner, driven by the same deflection voltage that sweeps the vidicon image, may be used to produce the required digitization timing pulses. The CRT spot is imaged on a grating, and a photomultiplier positioned behind the grating is used to supply the timing pulses (Fig. 12). As often as necessary, between accelerator bursts, the vidicons may be calibrated against the scanner to correct for drift in the vidicon and CRT characteristics. Calibration may easily be accomplished while the spark chamber is dark by illuminating a calibration grating properly positioned behind a half-silvered mirror whenever necessary. Each vidicon in such a system may be separately calibrated over its entire face with respect to the flying spot scanner. The availability of the slaved spot on the CRT offers an additional dividend. By splitting the image of the spot, it may be used in conjunction with a mask supplied for the particular experiment in progress to guide the vidicon sweep to follow any arbitrary (but parallel) spark chamber gap configuration.
FIG. 12. Vidicon spark chamber digitizing and gap-finding system. As described in the text, locking together the vidicon and the CRT sweep makes it a simple matter to control the vidicon sweep pattern, as well as increasing the accuracy of spark chamber digitization.
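To make the time-interval digitization concrete, here is a minimal sketch. The sweep time and clock frequency are illustrative assumptions only, chosen so that one clock count corresponds to the one-part-in-2500 precision discussed above; they are not taken from any published system:

```python
def digitize_position(t_fiducial_us, t_spark_us,
                      gap_length_cm=50.0,    # one gap imaged onto one sweep
                      sweep_time_us=50.0,    # assumed active sweep duration
                      clock_mhz=50.0):       # assumed timing-clock frequency
    # Count clock pulses gated between the fiducial signal and the spark
    # signal, then convert counts to distance along the gap. With these
    # assumed numbers a sweep spans 2500 counts, i.e., 0.2 mm per count.
    counts = round((t_spark_us - t_fiducial_us) * clock_mhz)
    cm_per_count = gap_length_cm / (sweep_time_us * clock_mhz)
    return counts * cm_per_count

print(digitize_position(0.0, 20.0))   # a spark 20 us into the sweep -> 20.0 cm
```

Regulation of the high-voltage supply and linearity of the sweep ramp matter precisely because they fix the correspondence between elapsed time and distance that the conversion assumes.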
The mask is prepared so that each gap to be scanned is represented to scale by a clear stripe on a transparent base; the remainder of the mask is opaque (Fig. 13).
FIG. 13. Gap-finder mask. The mask (b) corresponds to the spark chamber configuration (a).
A photomultiplier behind the mask will indicate whether the electron beam is scanning within a gap. In operation, the vertical deflection voltage would be applied until the "gap-finder" photomultiplier indicates that the beam is positioned in the first gap. After the beam has been swept the length of the gap, the vertical deflection voltage is switched on again until the photomultiplier signal indicates that the beam has found the next gap in the chamber.
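The gap-finder logic amounts to a simple stepping loop. A toy sketch follows; the stripe positions, step size, and voltage scale are invented purely for illustration:

```python
GAP_STRIPES = [(1.0, 1.2), (2.0, 2.2), (3.0, 3.2)]   # clear stripes of the mask, in volts

def photomultiplier_lit(v):
    # Stand-in for the gap-finder photomultiplier behind the mask.
    return any(lo <= v < hi for lo, hi in GAP_STRIPES)

v, step = 0.0, 0.05
for _ in range(len(GAP_STRIPES)):
    while not photomultiplier_lit(v):   # step the vertical deflection...
        v += step                       # ...until the beam enters a gap
    print(f"sweep the gap at deflection {v:.2f} V")
    while photomultiplier_lit(v):       # then step clear of the stripe
        v += step
```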
Lacking such a gap finder, existing vidicon spark chamber systems have made use of electronic voltage dividers to step the beam from gap to gap, or else merely scan the entire image with a standard TV raster, discarding those sweeps that convey no information. Returning now to our representative experiment, a buffer of approximately 20,000-bit capacity will be necessary to assemble the information transmitted by the vidicons during the 4 milliseconds of scanning time (~1000 24-bit words with 4-microsecond cycle time will do nicely). For on-line analysis, the 1000 words of raw data must be transferred to the high-speed processor during the 6 milliseconds available before the buffer must again be ready to accept a possible next event. We therefore require a high, but not unreasonable, data transmission channel rate. Following each 100-millisecond accelerator burst, the raw data for an average of five events will have accumulated in the computer. These will have to be reconstructed and analyzed during the 3-second interval to the next burst, allowing about 0.6 second for each event. Considering the fact that all events have been trigger-selected, the ease of stereo reconstruction, and the order-of-magnitude lower precision of the calculation, existing high-speed computers should have no difficulty completing all necessary computation in the allotted time, based upon present experience in processing roughly comparable bubble chamber data. As we have remarked, the analysis of especially difficult and time-consuming events may be deferred until some later time by storing the raw data on a "special events" tape. Where a high-speed computer is not available for on-line analysis, the raw data may easily be transferred to magnetic tape between accelerator bursts. In this case, however, a tape buffer with sufficient capacity to contain a whole burst's worth of events must replace the smaller direct-coupling buffer. The core storage of a small, general-purpose computer will serve that end, provided that the store is sufficiently large and fast, and it will control the tape-writing process as well. But most important, the small computer may be used to analyze on-line a running sample of the data being committed to tape. Even if the computer's power is such that it can completely process only one of each thousand events, say, that pass through it en route to the tape, it can still keep the physicist fully informed as to the progress of his experiment and greatly decrease his reaction time to an unexpected occurrence. For many kinds of experiment, this mode of operation may in fact be preferred to the full real-time analysis of all the data. In both cases, the propitious match of spark chamber to computer is clearly evident.
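The buffer and channel figures just quoted reduce to elementary arithmetic (every number below is from the paragraph above):

```python
words, word_bits = 1000, 24
buffer_bits = words * word_bits                   # the ~20,000-bit buffer (24,000 bits)
transfer_s = 0.006                                # 6-millisecond transfer window
print(f"{buffer_bits / transfer_s:.1e} bits/s")   # ~4.0e+06 bits/s channel rate

events_per_burst, interburst_s = 5, 3.0
print(interburst_s / events_per_burst, "s of analysis per event")   # 0.6
```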
7.2 The Sonic Spark Chamber
It soon became evident that vidicons were not the only suitable means for directly extracting digital information from a spark chamber, and as interest in filmless operation of spark chambers has increased, so has the number of ways of achieving such operation. Among the most ingenious of these new techniques is that of the sonic spark chamber, in which, paraphrasing Roberts, the electrical spark discharge is located not by the flash of its lightning, but by the sound of its thunder. In 1962, B. Maglic of CERN reported that he had succeeded in determining the position of a spark to the accuracy and precision of the chamber resolution by measuring the time interval between the initiation of the discharge and the arrival of the sound of the spark (more precisely, the leading edge of the shock wave in the chamber gas) at each of two microphone probes suitably positioned at the edge of the gap. The delay times, when converted to linear distance from each of the probes, were used to "stereo" reconstruct the position of the spark in the plane. Only a minimum of electronic circuitry is necessary to convert the delay interval into a digitized distance: a precision oscillator to be gated on by the chamber trigger and off by the probe signal, and a scaler to count the clock pulses generated by the oscillator during the interval. With the shock wave traveling at a rate of about 0.6 mm per microsecond, a 6-megacycle oscillator and scaler are quite adequate to achieve the maximum theoretical precision of the system. In addition to its simplicity of design, the sonic chamber possesses an important additional virtue. Because light is no longer the information carrier from spark to transducer, the chamber optical system, sometimes an extremely complicated array of cylindrical lenses, prisms, and mirrors, is no longer necessary. When spark chambers must be surrounded by magnet yokes and massive radiation shielding, as is often the case, an optical chamber may be completely unworkable. And while, as we shall soon see, the sonic spark chamber is not the only solution to the latter problem, it will usually be the simplest and cheapest. As one would expect, sonic chambers have their weaknesses as well as virtues. Unlike the vidicon, the sonic probe is not inherently a storage device; the system must be ready to use the information conveyed by the probe immediately as it is delivered. Each probe must therefore be buffered by its own clock-pulse counter, or at least by a register fast enough to catch the contents on the fly of a scaler counting at 6 megacycles. While the electronics requirement is not serious for many sonic chamber applications, it may become quite
formidable in bulk for a configuration as large as that considered above as "representative." Fortuitously, a second major weakness of the sonic chamber makes it unlikely that it would ever be adopted as the sole information transducer for such an experiment. Sonic chambers are not well suited to detecting more than one spark per gap, because of microphone damping problems and the difficult-to-control reflected wave fronts that arrive after the direct wave from the spark. Although these problems may be partially circumvented by increasing the number of probes per gap, only two or three sparks at most have yet been simultaneously measured sonically, and sonic chambers have thus far been used only as single-track detectors. Nevertheless, the single-track applications for spark chambers are many, and the first completed filmless spark chamber experiment in fact employed the sonic technique. Our illustrative experiment above might well benefit by the substitution of sonic for vidicon chambers in the beam-defining section fore and aft of the magnetic field. Since multiple tracks are expected downstream from the target, sonic chambers are not suitable for the remainder of the configuration. In principle, only two probes, properly positioned, are necessary in each gap to determine the planar coordinates of any spark in that gap. It has become standard practice, however, to overdetermine the coordinates by using double that number of probes per gap. The redundant information makes possible more uniform stereo reconstruction precision throughout the plane, and in addition allows for the continuous redetermination of the parameters for the conversion of shock wave transit time to distance (i.e., the velocity of sound in the gas, and a lumped delay constant reflecting a lag in the electronics and a nonlinearity in the distance-time relationship, all of which will vary with the temperature and the age of the system). To illustrate the operation of a sonic system, let us assume that we have indeed decided to make use of sonic chambers in the beam-defining section of our sample experiment. Having supplied each of the twelve gaps in the section with four microphone probes apiece, forty-eight counters must be added to our electronics rack, together with an oscillator and the necessary control logic. (On the other hand, we have dispensed with two vidicon cameras and their associated chamber optics.) At the instant the chambers are triggered, clock pulses from the oscillator are simultaneously gated into each of the counters, which had been earlier reset to zero. At the later instant when the shock wave from a spark reaches a given probe, the counter associated with the probe is stopped. In a meter-square gap, every counter will be stopped within 2 milliseconds; less than a millisecond will be necessary in our case. Nine milliseconds now remain until chamber recovery, during which time the contents of the forty-eight counters must be transferred to the processor or to tape, and the counters cleared for the next event. In all, no great strain is placed upon our electronics or upon the data channel.
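A minimal sketch of the sonic arithmetic follows: counter contents are converted to distances at the quoted shock speed and clock rate, and the spark is located by intersecting the two distance circles about a pair of probes. The probe geometry and counter readings are invented for illustration, and the simple two-probe formula ignores the calibration refinements (drifting sound velocity, electronic lag) described above:

```python
import math

V_MM_PER_US = 0.6    # shock-front speed quoted in the text
CLOCK_MHZ = 6.0      # 6-megacycle oscillator: 0.1 mm of travel per count

def counts_to_mm(counts):
    return counts / CLOCK_MHZ * V_MM_PER_US

def spark_position(c1, c2, probe_separation_mm):
    # Probes at (0, 0) and (L, 0); the spark lies at the intersection of
    # the circles of radius d1 and d2 centered on the two probes.
    d1, d2, L = counts_to_mm(c1), counts_to_mm(c2), probe_separation_mm
    x = (d1**2 - d2**2 + L**2) / (2 * L)
    y = math.sqrt(max(d1**2 - x**2, 0.0))
    return x, y

# Counters read 3000 and 4000 at their stop signals; probes 500 mm apart.
print(spark_position(3000, 4000, 500.0))   # -> (180.0, 240.0) mm

# Worst case in a meter-square gap: ~2 ms of flight, or 12,000 counts,
# so 14-bit scalers suffice.
print(math.ceil(math.log2(0.002 * CLOCK_MHZ * 1e6)))   # -> 14
```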
To recapitulate, the sonic technique offers an uncomplicated and relatively inexpensive solution to the problem of directly coupling an uncomplicated and relatively small spark chamber experiment to a computer or magnetic tape unit. In the domain comprehending all of its virtues and bounded by its limitations, the sonic chamber represents a sound approach to filmless spark chamber practice.

7.3 Wire Chambers

1962 was a vintage year for the development of new filmless spark chamber techniques. Appearing almost simultaneously with the sonic chamber report, a paper published by F. Krienen of CERN described a completely different approach to the extraction of information from a spark chamber, one that corresponds closely to the "packed stack of Geiger counters" picture of the device. In Krienen's chamber, the location of the spark is betrayed not by the light it sheds nor by the noise it makes but, rather more in line with tradition, by the current it carries. If one replaces one of the plates of a chamber gap with a planar grid of closely spaced fine parallel wires, the particle trace spark will jump between the plate and a wire (or pair of adjacent wires), and the coordinate of the spark along the axis in the wire plane normal to the wires is simply the coordinate of the wire in the grid carrying the spark current. All of the foregoing smacks suspiciously of reversion to the well-known, and at one time much-used, Geiger counter tray (a number of long thin counter tubes arranged like the logs of a stockade fence). But the innovation of the wire chamber goes beyond replacement of the separate external cylindrical counter electrodes with a single plane electrode, which makes it possible for the wire electrodes to be much more closely spaced. Of equal importance is the method whereby it is determined which of the wires in the grid entertained a spark during the discharge. By properly adjusting the parameters of the triggered discharge pulse, it is possible to produce an output current pulse in the wire that is just right for flipping the direction of magnetization in an ordinary ferrite memory core (Fig. 14). Perhaps more than any other filmless chamber technique, the wire chamber is congenial to on-line computer processing, for without intervening analog processes, it can write its message in the same language and on the same tablet that the computer can read.
FIG. 14. Schematic of wire chamber plane pair. The high-voltage plane is pulsed immediately following the passage of an ionizing particle through the chamber. Current carried by the spark will flip the core strung on a wire to which a spark has jumped. Illustrated is the simplest method of extracting the coordinate information contained in the core array. The wires are pulsed in succession. A signal on the sense line indicates that the core strung on that wire had been switched by a spark.
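In software terms the readout reduces to scanning the core plane after each discharge. A sketch follows (the wire count and spark positions are invented; a real system interrogates the cores with pulse circuitry rather than program logic):

```python
def read_out_plane(cores):
    # Interrogate each core in succession, as in Fig. 14: collect the wire
    # numbers whose cores were flipped by spark current, clearing each core
    # as it is read (reading a ferrite core is destructive).
    fired = [wire for wire, flipped in enumerate(cores) if flipped]
    for wire in range(len(cores)):
        cores[wire] = False
    return fired

plane = [False] * 500            # a 0.5-m gap strung with wires at 1-mm pitch
plane[236] = plane[237] = True   # a spark bridging two adjacent wires
plane[401] = True                # a second spark squarely on one wire
print(read_out_plane(plane))     # -> [236, 237, 401]; the adjacent pair may be
                                 #    averaged to 236.5, doubling the resolution
```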
Wire chambers (alternatively called digital discharge planes) possess many of the separate virtues of the vidicon and sonic systems, and embellish the combination with a few more that are unique to the wire technique. As is the case with a vidicon chamber, events of high track multiplicity may be easily detected with wire planes, and, as with the sonic system, the chamber optical system is eliminated. The peculiar advantage of the wire chamber results from the fact that the discharge need be neither seen nor heard, but merely sensed by the appropriate ferrite core. The energy per spark released by the discharge is therefore considerably less than that necessary for the other techniques, and correspondingly less damage is done to the gas filler mixture which must be broken down to carry the discharge current. The important consequence of this more gentle treatment of the filler is a very substantial decrease in the recovery time of the chamber, by almost two orders of magnitude. Wire chambers have already been built and operated with dead times of only 0.2 millisecond. Unfortunately, the benefits of the digital discharge plane do not come cheaply. The fabrication of large wire grids of suitable fineness and satisfactory stability is a difficult task, as is the assembly of the special ferrite core arrays associated with each plane. The
cost of a very large chamber configuration can become prohibitive, especially when compared with that of available alternatives. As a step toward relieving the burden of excessive cost in wire chamber experimentation, some accelerator laboratories have developed standard, modular wire discharge units. Here, the advantages of the wire chamber may be enjoyed somewhat less prodigally, but at the cost of some decrease in the extreme flexibility of experimental design that generally characterizes spark chamber practice. Wire arrays have been fabricated by stretching very fine copper wire in an insulating frame, spaced as closely as 1 mm center to center, and by etched-wire techniques on epoxy fiberglass backing. The former method, while clearly the more expensive, has the advantage of greater control over the exact cross section of the individual wires, so that one may know and control the electric field configuration in the chamber gap with greater precision than with the latter method. Because a particle passing between two wires will in general initiate a discharge to both contiguous wires, while one passing through a wire will discharge only that element, wire chambers can be designed to give twice the space resolution one would normally expect with a given wire fineness and spacing. It is to make consistent and reliable use of this effect that one wishes to know and control the electric field in the neighborhood of the wires. Although current practice has been to construct each gap with the wire grid operating at ground potential and the high-voltage pulse supplied to a conducting plane electrode, there is in principle no reason why a gap could not be designed with wire arrays for both electrodes, orthogonally oriented. The latter arrangement would permit the determination of both x and y coordinates of each spark, while presently a two-ply lamination is necessary to extract the same information. The difficulty is largely the technical one of insulating and operating the cores on the high-voltage side of the circuit. In either case, a third plane running obliquely to the other two (equivalent to the redundant third stereo view in an optical system) is useful to help resolve ambiguities in multiple spark situations. As we have implied, the major attractive feature of the wire chamber is its extremely short recovery time. Unless the experiment is such that the expected event rate will capitalize on that feature, however, and the data acquisition system is such that it can keep up with the expected event rate, it seems that the wise physicist should carefully consider the suitability of available alternatives before opting for the best (from the aforementioned point of view, at any rate). Before concluding this section, a very recent new development is worth reporting. In a paper presented early in 1965, V. Perez-Mendez
and J. Pfab describe a promising new method for extracting coordinate information from a wire chamber. The new technique makes use of the magnetostrictive effect in a magnetized thin nickel ribbon which runs across the wire grid. The magnetic field due to the current pulse in a sparked wire produces a magnetostrictive pulse in the nickel ribbon, which travels the length of the ribbon at acoustic velocity. The pulse is sensed by a pickup coil placed at an end of the ribbon. As in a sonic chamber system, the coordinate of the wire is determined by converting the time required for the pulse to reach the pickup coil into a distance. But, unlike the sonic chamber, in the case of multiple spark events there is no difficulty in detecting and converting the resulting sequence of pulses traveling down the ribbon. It is too early to predict the ultimate value of the magnetostrictive recording wire chamber, for its operating parameters have not yet been fully explored. Nevertheless, insofar as it eliminates a good part of the expense in using a wire chamber without substantially decreasing its virtues, this new technique will bear watching.
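The conversion parallels the sonic case, with the difference that a train of pulses from a multi-spark event poses no problem. A sketch (the ribbon's acoustic speed is an assumed figure of roughly the right order for nickel, and the arrival times are invented):

```python
RIBBON_SPEED_MM_PER_US = 4.8   # assumed acoustic speed in the nickel ribbon

def sparked_wire_coordinates(arrival_times_us, trigger_delay_us=0.0):
    # Each magnetostrictive pulse marks one sparked wire; the single pickup
    # coil reads a multi-spark event off as a pulse train, earliest first.
    return [(t - trigger_delay_us) * RIBBON_SPEED_MM_PER_US
            for t in sorted(arrival_times_us)]

print(sparked_wire_coordinates([52.1, 10.4, 31.0]))   # three sparks -> positions in mm
```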
8. Some Other Particle Trace Detectors

The nuclear particle trace detectors thus far discussed in this review are the most important ones today from the standpoint of current utility. Although cloud chambers still find an occasional application where some special characteristic of theirs is required (for example, the continuous sensitivity of the diffusion cloud chamber), they are for the most part obsolete. A number of other devices, however, deserve brief mention here, because of their promise for the future, or for some property of special value in certain experimental situations. In particular, we shall consider below two additional types of discharge chamber, and two varieties of scintillation chamber. The latter are not to be confused with the scintillation counter, to which they bear approximately the same relation as the spark chamber does to the spark counter.

8.1 The Current Distribution Chamber
Late in 1963, G. Charpak of CERN pointed out that, if the electric current discharged to the grounded plate of a spark chamber by a single spark was carried off through two discrete electrical connections located at the periphery of the plate, the predictable division of current between the connections could be used to compute a coordinate of the point on the plate to which the spark had discharged. With a suitable choice of contact points and plate geometry, the current can be forced to divide in such a way that it may be easily and instantaneously
reconstructed in a rudimentary analog computer to give the coordinate as direct output. This information can be available within the 0.5-microsecond memory time of a spark chamber, so that the output of a few strategically placed current distribution chambers can be used as part of the triggering logic for a larger and more complicated spark chamber assembly. In addition, the output is immediately available in a form suitable for a CRT display unit. By introducing additional contact points, it is possible to derive both coordinates of the spark from a single gap, but the computation for converting current ratios to coordinates becomes considerably more complex for, unlike the case of the sonic chamber, each probe interacts with every other probe. The fact that the current distribution chamber cannot resolve multispark events is a serious weakness, but not one that is likely to impede its rapid and enthusiastic development as a spark chamber triggering logic element. Although work is in progress with the goal of eliminating that weakness, it will not modify the assertion that the current distribution chamber will find its greatest use and value as an extremely selective trigger element for complex spark chamber assemblies.
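For the one-coordinate case the computation is elementary. A sketch under the simplest idealization, a uniformly resistive plate with contacts at its two ends (the actual division law depends on the plate geometry and contact placement chosen):

```python
def spark_coordinate(i_left, i_right, plate_length_cm):
    # The current divides inversely with the resistance of the two paths,
    # i.e., i_left / i_right = (L - x) / x for a spark at distance x from
    # the left contact, so x follows directly from the current ratio.
    return plate_length_cm * i_right / (i_left + i_right)

# Three times as much current to the left contact puts the spark a quarter
# of the way along a 100-cm plate.
print(spark_coordinate(3.0, 1.0, 100.0))   # -> 25.0 cm
```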
8.2 The Microwave Discharge Chamber

Investigators concerned with the properties of spark chambers have often reported an interesting phenomenon in wide-gap chambers, namely, the fact that under certain operating conditions the spark, instead of jumping perpendicularly between the two plates, would follow the track of a particle traversing the gap at an angle to the plate normal. The effect, however, proved not to be sufficiently consistent or reliable to enable its use in the design of single-gap chambers that could detect the angular orientation of a particle as well as its location. In 1962, S. Fukui and his co-workers at Nagoya University in Japan described a gas discharge chamber that was essentially a microwave guide with a transparent window through which one could view and photograph the trace of a particle in a suitable gas mixture. The chamber was triggered as is an ordinary spark chamber, but with an intense microwave pulse rather than the usual high voltage. When triggered, the trace of the visible discharge followed exactly the path of the passing particle. The microwave discharge chamber is limited in its applicability by a rather small upper bound on its maximum sensitive volume. It may, however, prove to be a useful detector for that class of experiment where angular distributions are the primary data sought by the physicist.
8.3 The Scintillation Chamber
While the inference is obvious that the fluorescent radiation emitted by a scintillating crystal (or liquid or gas) originates along the path of the particle that incited the material to fluorescence, the trace-detecting property of a scintillator was until recently a quite useless adjunct to its many other valuable characteristics, because the radiation emitted is so low in intensity that only an extremely high-gain detector such as a photomultiplier will respond to it. Upon development of the very sensitive multistage image intensifier tube, however, it became possible to amplify the light generated by the particle to the extent that photographs could be taken of its feeble trace as it passed through the scintillator. With enough intensity for photography, the image could of course be scanned by a television camera for direct digitization and processing, and in fact image intensifiers are available to produce video output directly, bypassing the intermediate visible image. The virtues of the scintillation chamber are those of the scintillation counter, as are its weaknesses. Of particular importance from the standpoint of its use as a trace detector are its extremely short signal rise time, nonexistent dead time, and the fact that it is a continuously sensitive passive device. At this point, however, it should be abundantly clear that continuous sensitivity is not an unmixed blessing, and, while the dead time of the scintillator is near zero, this is not so for the image intensifier and its associated electronics, although this can be small compared to that for the usual spark chamber. In sum, the scintillation chamber is likely to find its greatest use in the same kind of application for which we have already prescribed the microwave discharge chamber, with the choice between them depending upon the particular requirements of the experiment. Both these devices will of course have to compete with the more highly developed spark chamber, which the author feels will be the preferred instrumentation in most cases.
8.4 The Filament Chamber

A fiber-optical version of the scintillation chamber, the filament chamber consists of layers of filaments (~1 mm in diameter) of a transparent plastic scintillating material. The filaments in each layer are parallel to one another, and perpendicular to those in adjacent layers. The plastic fibers carry the image of a particle trace in the assemblage to two planes outside the chamber, one of which contains the ends of all the fibers running in one direction, and the other, the ends of the perpendicular fibers. As in the case of the scintillation
chamber, electronic light amplification is necessary before the image can be used. In those experimental situations where a scintillation-type particle trace detector offers advantages not to be had otherwise, the filament technique provides somewhat greater flexibility of experiment design than would be possible with a volume scintillator. Since each fiber is a light pipe that can be rather freely bent, it is possible to conceive of quite complex configurations of detectors that might be useful in satisfying some unusual experimental requirement. When such special circumstances are lacking, however, the filament chamber is probably destined to play a secondary role as a tool of high energy physics.

9. On-Line Data Processing in Physics

At this stage of our discussion, there is no need to belabor the point that, among practitioners of high energy experimental physics, an overwhelming trend has developed toward automation of data collection, handling, and reduction, and, where circumstances are appropriate, a parallel trend is developing toward the on-line real-time performance of that task. We assert without demonstration that a similar trend has appeared in all phases of nuclear physics, and will undoubtedly soon become evident in other scientific disciplines as well, wherever the costs of manufacturing experimental data are substantial. It would be fatuous to assert that all experiments are better performed on-line to a computer than not or, for that matter, that every experiment will benefit from automation of the data acquisition process. In searching for neutrino interactions in high-density spark chambers, for example, the delay between triggering events will be measured in minutes rather than in milliseconds. Clearly it makes little sense to design such an experiment with filmless recording of the data. Nor is it likely that anything more complicated than a sharp-eyed physicist with a precision ruler, or at most a measuring projector, will be necessary to recognize and digitize the collected neutrino events. Nevertheless, the number and kinds of nuclear physics experiment that flourish and prosper in a highly computerized environment far exceed the number and kinds that don't. The attraction to the physicist of being able to follow the progress of his experiment as it proceeds is a powerful one, even if he can examine only a statistical sampling of his data. Not only does on-line operation cut down the turn-around time between dependent stages of a sequential experiment and reduce the lag between initiation of work and publication of results, but it allows the physicist to modify the course of his experiment in midstream, as it were, based on intermediate
results. Physicists who have completed on-line experiments report that setup and debugging of their instrumentation are made much easier, and the time consumed by the process is substantially decreased, by the immediate availability of the results of a test. Under these circumstances, the payoff to a large laboratory can be substantial and economic, as well as scientific. With the offspring of the computer population explosion appearing in residence at the smallest of laboratories, the day does not seem far off when the decision to go on-line will be taken as a matter of course.

Bibliography

BUBBLE CHAMBERS, General:
The bubble chamber. D. A. Glaser, in Handbuch der Physik (S. Flügge, ed.), Vol. 45, pp. 314-341. Springer, Berlin, 1958.
Bubble chambers. H. Bradner, Ann. Rev. Nucl. Sci. 10, 109-160 (1960).
Experience with a large hydrogen bubble chamber. L. W. Alvarez, Proc. Intern. Conf. Instrumentation for High Energy Phys., Berkeley, 1960, pp. 146-149. Wiley (Interscience), New York, 1961.

BUBBLE CHAMBERS, Data Analysis:
Data processing for bubble chambers. H. S. White, Univ. Calif. (Berkeley) Rept. No. UCRL-9476 (1960).
Analysis of bubble chamber data. A. H. Rosenfeld and W. E. Humphrey, Ann. Rev. Nucl. Sci. 13, 103-144 (1963).
The development of data analysis systems for bubble chambers. G. R. Macleod, Nucl. Instr. Methods 20, 367-383 (1963).
Current performance of the Alvarez group data processing system. A. H. Rosenfeld, Nucl. Instr. Methods 20, 422-434 (1963).
QUEST: An on-line event-processing routine. M. H. Alston, J. E. Braley, and P. White, Rev. Sci. Instr. 34, 64-70 (1963).

BUBBLE CHAMBERS, Automatic and Semi-Automatic Systems:
Spiral Reader
The spiral reader measuring projector and associated filter program. B. McCormick and D. Innes, Proc. Intern. Conf. Instrumentation for High Energy Phys., Berkeley, 1960, pp. 246-248. Wiley (Interscience), New York, 1961.
SMP
A proposed device for the rapid measurement of bubble chamber film. L. W. Alvarez, Lawrence Radiation Lab. Phys. Note No. 233 (Univ. of California, Berkeley, 1960).
Scanning and measuring projector. P. G. Davey, R. I. Hulsizer, W. E. Humphrey, J. H. Munson, R. R. Ross, and A. J. Schwemin, Rev. Sci. Instr. 35, 1134-1146 (1964).
FSD
A method for faster analysis of bubble chamber photographs. P. V. Hough and B. Powell, Proc. Intern. Conf. Instrumentation for High Energy Phys., Berkeley, 1960, pp. 242-246. Wiley (Interscience), New York, 1961.
Realization of HPD system at three laboratories. J. V. Franck, P. V. Hough, and B. W. Powell, Nucl. Instr. Methods 20, 387-392 (1963).
Preliminary operating experience with Hough-Powell device programs. H. S. White, T. Aronstein, C. Osborne, N. Webre, and W. G. Moorhead, Nucl. Instr. Methods 20, 393-400 (1963).
PEPR
A precision encoding and pattern recognition system (PEPR). I. Pless, L. Rosenson, P. Bastien, B. Wadsworth, T. Watts, R. Yamamoto, M. Alston, A. Rosenfeld, F. Solmitz, and H. Taft, Proc. Intern. Conf. High Energy Phys., Dubna, 1964.
Programming for the PEPR system. P. L. Bastien, T. L. Watts, R. K. Yamamoto, M. Alston, A. H. Rosenfeld, F. T. Solmitz, and H. D. Taft, Methods Computational Phys. 5 (1966), in press.
Pattern Recognition Computer
Design of a pattern recognition digital computer with application to the automatic scanning of bubble chamber negatives. B. H. McCormick and R. Narasimhan, Nucl. Instr. Methods 20, 387-392 (1963).

SPARK CHAMBERS, General:
Development of the spark chamber: A review. A. Roberts, Rev. Sci. Instr. 32, 482-486 (1961).
Initial operation and performance of a large magnetic field spark chamber system. G. R. Burleson, T. F. Hoang, P. Kalmus, R. L. Kuskowski, L. G. Niemela, A. Roberts, T. A. Romanowski, S. D. Warshaw, and G. E. Yurka, Nucl. Instr. Methods 20, 186-192 (1963).
SPARK CHAMBERS, Filmless Systems:
Vidicon Systems
The automatic collection and reduction of data for nuclear spark chambers. H. Gelernter, Nuovo Cimento [10] 22, 631-642 (1961).
Automatic digitization of spark chamber events by vidicon scanner. S. W. Andreae, F. Kirsten, T. A. Nunamaker, and V. Perez-Mendez, Proc. Informal Meeting Filmless Spark Chamber Techniques and Associated Computer Use, Geneva, 1964, CERN-64-30, pp. 66-72.
Spark chamber data handling using T.V. camera. H. L. Anderson and A. Barna, Rev. Sci. Instr. 35, 492-496 (1964).
An automatic recorder for the spark chamber. S. Fukui, S. Hayakawa, R. Kajikawa, K. Kikuchi, and K. Mori, Japan. J. Appl. Phys. 3, 400-408 (1964).
Sonic Chambers
Acoustic spark chambers. B. C. Maglic and F. A. Kirsten, Nucl. Instr. Methods 17, 49-59 (1962).
Operation of a sonic spark chamber system. A. Lillethun, B. Maglic, C. A. Stahlbrandt, A. Wetherell, G. Manning, A. E. Taylor, and T. G. Walker, Proc. Informal Meeting on Filmless Spark Chamber Techniques and Associated Computer Use, Geneva, 1964, CERN-64-30, pp. 167-170.
Sonic spark chamber system with on-line computer for precision measurement of muon decay spectrum. M. Bardon, J. Lee, P. Norton, J. Peoples, and A. M. Sachs, Proc. Informal Meeting on Filmless Spark Chamber Techniques and Associated Computer Use, Geneva, 1964, CERN-64-30, pp. 41-48.
Wire Chambers
Digitized spark chambers. F. Krienen, Nucl. Instr. Methods 16, 262-266 (1962).
Properties of triggered multi-wire spark counters of the Rosenbloom type and of parallel plate spark counters. M. A. Meyer, Nucl. Instr. Methods 23, 277-286 (1963).
Accuracy of track location of charged particles in triggered multi-wire spark
counters. M. A. Meyer, J. W. Koen, and I. J. Lessing, Nucl. Instr. Methods 23, 287-301 (1963).
A digitized spark chamber for automatic data retrieval. J. Bounin, M. J. Neumann, R. H. Miller, and H. Sherrard, Nucl. Instr. Methods 30, 34-44 (1964).
Magnetostrictive readout for wire spark chambers. V. Perez-Mendez and J. M. Pfab, Lawrence Radiation Lab. Rept. No. UCRL-11620 (Univ. of California, Berkeley, 1964).

OTHER PARTICLE TRACE DETECTORS:
A new method for determining the position of a spark in a spark chamber by measurement of currents. G. Charpak, J. Favier, and L. Massonnet, Nucl. Instr. Methods 24, 501-502 (1963).
Microwave discharge chamber. S. Fukui, S. Hayakawa, T. Tsukishima, and H. Nukushina, Nucl. Instr. Methods 20, 236-237 (1963).
Two high energy physics experiments using the luminescent chamber. L. W. Jones and M. L. Perl, Advan. Electron. Electron Phys. 16, 513-529 (1962).
Filament scintillation chamber experiments. G. T. Reynolds, D. B. Scarl, R. A. Swanson, J. R. Waters, and R. A. Zdanis, Advan. Electron. Electron Phys. 16, 487-500 (1962).
Scintillation chamber comparisons: Fibers vs. NaI and image intensifiers vs. orthicons. D. O. Caldwell, Advan. Electron. Electron Phys. 16, 469-474 (1962).
ON-LINE COMPUTATION IN PHYSICS:
Integration of computer-analyzer complex. Session on on-line physics, Proc. Conf. Utilization of Multiparameter Analyzers in Nucl. Phys., 1962, CU (PNPL)-227, pp. 143-166. Columbia Univ. Press, New York.
On-line computer use. Proc. Informal Meeting Filmless Spark Chamber Techniques and Associated Computer Use, Geneva, 1964, CERN-64-30, pp. 1-66 and 246-312.
On-line operation of a digital computer in nuclear physics experimentation. J. F. Whalen, J. W. Meadows, and R. N. Larsen, Rev. Sci. Instr. 35, 682-690 (1964).
A counter hodoscope digital data handling and on-line computer system used in high energy scattering experiments. K. J. Foley, S. J. Lindenbaum, W. A. Love, S. Ozaki, J. J. Russell, and L. C. L. Yuan, Nucl. Instr. Methods 30, 45-60 (1964).
Proc. EANDC Conf. Automatic Acquisition and Reduction of Nucl. Data, Karlsruhe, July 1964. To be published.
Data systems for multiparameter analysis. R. J. Spinrad, Ann. Rev. Nucl. Sci. 14, 239-268 (1964).
Author Index Numbers in parentheses are reference numben and indicate that an author's work Is referred to althau h his name is not cited In the text. Numbers in italic show tffe page on which tho complete reference Is listed.
A Abrarrmon, N., 226 Adrian, E. D., 66, 83 &en, H.,187 (l), 177,191 Wport, F. H.,40.41 (S), 73,83 Abton, M.,266,296 Abton, M. H.,294 Alvmwz, L. W., 282, 294 b r i o ,
V.C., 286
h h n , H.L.,296 hdreee, 8. W., 296 W m r , E. V.,226 h 0 1 & R.F., 136, 177, 179 (24),191,192 Aronrtein,T.,296 &by, W.R.,62,83 Aahhhunt, R.L.,137 (44),191, 193 +inell, D., 160 (30, Sl), 181, 192, 193 Avidenin, A,, 16@, 162, 191
B Bekez, F. B., 17, 28 B&, C. F., 22, 28 Bardon, M.,296 Bu-Hillel, V., 8, 38 (6, lo), 28, 83, 84 Banu, A., 296 h t i e n , P., 296 Baatien, P. L.,296 Boxendele,P. B., 12, 28 Beeker, J., 9, 28 Bedrij, 0.J., 166, 191 Bennett, E., 226 W n , B. C., 177, 179 (24),192
Bsmerd, E. E,,226 Bernioh, M., 16, 16 (7),28 Beurle, R.L.,73 (6). 84 Bibb, J., 162 (le),191 B b k , M.,40,84 Bond, D. S., 226
Booth,A. D., 167, 191 Borko, H.,16, 16 (7),28 Bounin, J., 296 Bourne, C. P., 7, 9, 28 Bradner, H., 294 Brain, Lord, 38,84 krdey. J. E., 294 Brevermen, D.,226 Brockus, C. U., 177, 179 (24),192 Bmome, P. W.,226 Bruner, J. S., 39 (9). 84 Buebll, T.D., 8, 30 Burh, A. W., 147, 149, 167 (8), 191 Burle, N.,166, 166, 167, 191, 192 Burleaon, U. R.,276, 296 Burntar, V. a., 166, 191 Bucaell, B., 162 (16). 191
C Celdwell, D.O.,296
Carnep, R.,38 (10). 84 Certer, C. F., 34, 84 Chamberlin, U. P., 162, 191 cherpak,U., 290,296 Cheney, P. W.,179 (29),191,192 Cherry, E. C., 38, 89 (12),40, 66. 84 Chomaky, N.,40 (la),84 Cleverdon, C. W.,26,28, 29 Cooper, W.S., 24, 27, 29 Culberteon, J. T.,69 (la),84 Culve&oon, J. T.,226
D Ddy, W. U., 160,191 Devey, P. U., 294 Devideon, C. H.,73,(92)87 Deviea. I. L., 38 /l05),88 Degan, J., 286
297
AUTHOR INDEX
de Soh Prim, D. J., 2,29 Doyle, L. B.,20, 29 Duke, V. J., 226
E Eoolee, J. C., 88 (IS),84 Eden, M.,226 Edmundmn, H. P., 13, 16,29 Edwards, D. B. Q.. 180 (SO, Sl), 192 Eetrin, Q.,149, 162, 191 Everett, R.R., 147 (17),192
F Fmno, R. M.,48 (168). 84 Fmvier, J., 296 Feigenbmum, E. A., 226 Flory, P.J., 61, 84 Foley, K.J., 296 Frmenkel, A. S., 167, 192 Frmnok, J. V., 294 F r e h , C. V., 178,192 Friedberg, R.M.,49 (18),84 Fukui, S., 271, 291, 296, 296
Green, B. F.,24, 29 areen, J. C., 3 (17),29 Qreene, P.H., 40 (60), 86 G&, R.M.,179,192
H Hmller, (3. L.,226 Halmoe, P.R.,41, 86 Hmmming, R.W., 190 (27),192 Hmrtmenis, J., 83 (62),86 Hmwkins, J. K.,226 Hmyakmwm, 8.. 296.296 Hmyek, F. A., 39, 42, 43 (SS),86 Heyes, R. M.,9,28 Hebb, D.O.,43,89 (84),60,74,78,86 Heijn, H. J., 186, 192 Hemher, M.B.,226 Hoeng, T.F.,278, 296 Holmhmn, J., 226 Hopkins, C. O.,226 Hough, P.V.. 286, 294 Howea, D.H., 39 (94),87 Huleieer, R.I., 252, 294 Humphrey, W.E,, 282, 294
G Gmbor, D.,60 (19),84 Gmll, F.J., 39, 84 Oarfield, E., 21, 29 Qmmer, H. L., 134, 138 (23),148 (21),167, 177 (22,24),179, 192 Gsvenmmn, E.K., 226 Gelerntar. H., 263, 268, 280, 296 Gilohriet, B., 148, 149,191, I92 Gill, A., 226 Gilliee, D. B., 187 (70), 194 G h r , D. A., 231, 294 Goby, M.,226 Uoldatine, H.H.,147,149,167 (8), 191 Gonesle5, K., 177, 179 (24),192 Good, I. J., 31 (46),33 (22,34,44, 48). 34 (re),38 (49), 36 (as),37 (42),38 (21),39 (23,24,30,34,40,41),40 (Sl),40 (el),42 (29,31, 38, 36, 41, 48), 44 (28, 48), 48 (21,31, 38), 47 (26,31, 36, 4l), 48 (21, 26,27,31,41),49,80 (33). 81 (SO, 31,36, 41, 47, 48), 62 (28,48), 83 (re),84 (33, 38), 86 (34),61 (41,43, 48). 67, 68 (99). 71 (81,41), 78 (84),76 (21,26, 36), 79 (47,48), 81 (21,26, 32), 82 (37,39, 47, 44, 83 (47,48), 84. 86
298
I Innes, D.,246,294 Ishibeehi, Y.,178 (69), 194
1 JerVie, D.B., 188, 193 Jasper, H.,44, 64 (79), 87 J a y , E. T.,61, 79 (Lie), 83 (88, 86). 86 John, E. R.. 62, 66 (M),86 Johnson, W. E., 82 (58),86 Jones, L. W.,296 Jonker, F., 8, 29 JOW,Q.,226
K Kmjikswts, R.,296 Kalmut~,H., 86 gaLnry P., 278,296
Kmrliu, J. E., 226 Kaae. M. R.,226 Kehl, W.B.,17,29 Keir, Y.A., 179, 192 Kelly, D. H., 226
AUTHOR INDEX
Keeeler, M. M., 21, 29 Kikuohi, K.,296 Kilburn, T.,160, 192 Emoh, R. A., 27, 29 Kireten, F.,296 Kiraten, F.A., 296 Kieeda, J. R.,66 (60), 86 Koen, J. W.,296 Krienen, F.,287, 296 Krulee, GI. K., 226 Kruy, J. F.,160, 191 Kuhns, J. L., 19, 46, 49 (64). 29, 86 Kuskoweki, R. L., 276, 296
L Laraen, R. N.,296 Leshley, K. S., 66 (61). 86 Lawlor, R. C., 18, 29 Ledley, R. S., 175, 194 Lee, J., 296 L e a , M., 166, 166, 167, 163, 192 Leaning, I. J., 296 Le Veque, W.J., 136, 192 Levine, D., 226 Lewis, P. M., 83 (62), 86 Lioklider, J. C. R.,226 Lillethun, A., 296 Lindenbaum, S. J., 296 Love, W.A., 296 Luoal, H.M.,134,193 Luhn, H.P., 10, 22 (22, 23), 29
M
Biemke, 0. T.,26 (26), 29 M m n n e t , L., 296 Matthewe, B. H.C., 66, 83 Meadows, J. W., 296 Meager, R. E.,167 (70), 194 Meggit, J. E., 182, 186, 193 Merrill, R. D., Jr., 180, 193 Metrolpolis, N.,137 (44), 193 Metze, U.,166. 167, 158, 176. 193 Meyer, M. A., 296,296 Middleton, D., 39, 86 Miller, U. A., 43, 70 (69), 86 Miller, R.H.,296 Milner, P. M.,43, 61 (71), 64, 73, 86 Minsky, M., 48, 49 (73), 86,226 Mitohell, J. N.,Jr., 167, 193 Montgomery, W. D., 226 Moorhead, W.U., 296 Morgan, C. P., 166, 193 Mori, K.,296 Mueller, P., 86 Muller, D. E.,167 (70), 194 Mummy, C. J., 226 Muneon, J. H.,262, 294
N Nadler, M., 161, 193 Neraeimhen,R., 296 Neeh, J. P., 167 ('lo), 194 Needhem, R.M., 27,61 (76,76), 52,29,86 Neumann, M. J., 296 Neyman, J., 61 (77), 86 Niemela, L. Q., 276, 296
Norton, P., 296 Mdormiok, B.,246, 247, 268, 294 MoCormiok, B. H.,296 MoCormiok, E.J., 226 MoDermid, W. L., 66 (66), 86 YoDougall. W., 39.86 MoUill, W.,47 (67), 48, 82, 86 MeoKay, D. M.,39, 66, (63) 86 MoKay, R. W.,167 (70), 194 MeoLeem, M. A., 181,193 Meoleod, U. R., 294 MoNaughbn, R., 27, 29 MaoSorley, 0. L., 167, 166, 174 (41). 176, 193
Maglio, B., 286, 296 Manning, U., 296 Maron, M. E., 14, 19, 46, 49 (64), 29, 86, 226
Nukuehins. H.,296 Nunnmaker, T.A., 296
0 Oeborne, C., 296 Ozakio, S., 296
P Parker-Rhoda, A. F., 27, 61 (78). 30, 87 Peek, U.,27, 61 ( 7 8 ~ 430, , 87 Pedeld, W.,44 (79), 66 (80), 64 (79, 80), 87
Peoples, J., 296 Perea-Mendez, V., 296, 296 Perl, Y.L., 296
299
AUTHOR INDEX
Perry, W., 226
Petersen, H. E., 66 (60, 66), 86
Pfab, J. M., 290, 296
Pierce, J. R., 37, 87
Pless, I., 258, 266, 296
Pomerene, J., 148, 149 (25), 192, 193
Porejd, D. J., 226
Portman, L., 39 (9), 84
Postley, J. A., 8, 30
Powell, B., 256, 294
Price, J. R., 226
Q
Quastler, H., 47 (67), 48, 82, 86, 226
R
Rao, C. R., 61 (81), 87
Rao, T. R. N., 180, 193
Rau, R. R., 294
Reiffen, B., 88
Reitwiesner, G. W., 149, 164, 176, 193
Reynolds, G. T., 296
Roberts, A., 276, 278, 296
Roberts, L., 56, 64 (80), 87
Robertson, J. E., 166, 167 (57, 58), 169, 170, 172, 174, 175, 193, 194
Rodrigues, J., 39 (9), 84
Rogers, D. J., 20 (31), 30
Rollerson, L., 296
Romanowski, T. A., 276, 296
Ronchi, V., 226
Rosenblatt, F., 48, 49 (82), 68, 87, 226
Rosenfeld, A., 242, 266, 286, 294, 296
Rozenberg, D. P., 138, 177, 178, 193
Rud, O., 84, 88
Russell, J. J., 246, 296
S
Sachs, A. M., 206
Sadek, F., 160, 194
Salton, G., 20, 30
Samuel, A. L., 49 (83a), 87
Scarl, D. B., 296
Schwemin, A. J., 294
Scott, E. L., 61 (77), 86
Sebestyen, G. S., 60 (88), 87, 226
Seelbach, W. C., 66 (60), 86
Seha, M., 54 (84), 87
Selfridge, J. A., 48 (70), 86
Selfridge, O. G., 43 (85), 48, 49 (86), 86, 87
Semon, W., 137 (1), 177, 191
Serebriakoff, V., 39 (87), 87
Shannon, C. E., 38, 40 (88), 48, 64 (88), 87
Sherrard, H., 296
Sholl, D. A., 36, 63, 64 (90), 87
Shoulders, K. R., 66 (91), 87
Simmons, R. F., 24 (33), 30
Singer, J. R., 226
Sklansky, J., 164, 166, 194
Skolnik, M. I., 226
Smith, D. R., 78, 87
Smith, J. L., 146, 151, 163, 194
Sneath, P. H. A., 61 (95), 87
Solmitz, F., 242, 296
Solomon, R. L., 39 (94), 87
Sparck Jones, K., 27, 29, 46, 87
Spiegel, J., 226
Spinrad, R. J., 296
Spurzheim, G., 39 (20), 84
Stahlbrandt, C. A., 296
Stanwood, R. H., 22, 29
Stevens, M. E., 27 (34), 30
Stiles, H. E., 19, 48, 30, 87
Svoboda, A., 177, 194
Swanson, R. A., 296
Sweeney, F. E., 147 (17), 192
Szabo, N. S., 177, 179 (67, 69), 181 (68), 194

T
Taft, H., 242, 266, 296
Takahashi, H., 178 (69), 194
Tanaka, R. I., 177, 179, 181 (68), 194
Tanimoto, T. T., 20, 61 (97), 30, 87
Tannenbaum, M., 179 (29), 193
Tashima, T., 296
Tasman, P., 17, 30
Taub, A. H., 167 (70), 194
Taube, M., 10, 30
Taylor, A. E., 296
Teig, M., 66 (60), 86
Tocher, K. D., 163, 174, 194
Tompkins, C. B., 38, 87
Tower, D. B., 36, 87
Tribus, M., 83 (100), 88
Tukey, J. W., 4, 30
Turing, A. M., 227
Turn, R., 162 (15), 191
U
Uttley, A. M., 54, 71, 88

V
Valach, M., 177, 194
Van Meter, D., 39 (68), 86
Vickery, B. C., 9, 30
Volder, J. E., 186, 194
von Neumann, J., 147, 149, 167 (8), 191, 227

W
Wadsworth, B., 296
Walker, T. G., 296
Walter, W. G., 65 (103), 88
Warshaw, S. D., 275, 296
Waters, J. R., 296
Watts, T. L., 296
Weaver, W., 38, 40, 48 (89), 87
Webb, H. N., Jr., 227
Webre, N., 296
Weinberger, A., 146, 151, 163, 194
Wetherell, A., 296
Whalen, J. F., 296
White, B. S., 242, 260, 294, 296
White, P., 294
Wilby, W. P. L., 50 (19), 84
Williams, J. B., Jr., 16, 30
Wilson, J. B., 175, 194
Winder, R. O., 49 (104), 88
Wong, S. Y., 148, 149 (25), 192
Woodcock, R., 60 (19), 84
Woodson, W. E., 227
Woodward, P. M., 38, 88
Wozencraft, J. M., 88
Wyllys, R. E., 13, 16, 29

Y
Yamamoto, R. K., 296
Yngve, V. H., 70 (107), 88
Yuan, L. C. L., 296
Yurke, G. E., 275, 296

Z
Zangwill, O. L., 56, 64, 88
Zdanis, R. A., 296
Zipf, G. K., 10, 30
Subject Index

A
Abnormal scan, 258, 259
Abstracting, automatic, 11
Abstraction, 71
Accuracy, desired, for bubble chamber measurements, 240; see also Precision, Resolution
Acquisition time, 222
Address modification, 123
Alpha rhythm, 66
Androids, 41
Anxiety, 103
Apollo Mission Simulator, 96, 129
Applications of residue numbers, 179-182
Area scan, 207
Array processor, 269
Artificial intelligence, 31, 202
Assemblies see Cell assemblies
Association
  between assemblies see under Cell assemblies
  between words, 64, 66, 70
  bonds, 50
  factors, 19, 48, 70
Associative memories, artificial, 66
Attention span, 63
Auditory cue, 100
Aural communication, 213-216
Automatic
  abstracting, 11
  aids to information retrieval and dissemination, 18-22
  classification, 14-17
  data analysis for bubble chambers, 246-269
    cost of, 254, 256, 261, 268
  fact retrieval, 22-26
  indexing, 10-13
  measuring systems, for bubble chambers, 242; see also Fully automatic, Semiautomatic
Automation, 200

B
Baby, education of, 45, 46, 65, 66
Balancing of a matrix, 62
"Baseball", 24
Basic addition process, 144-146
Bayes' theorem, 39, 48, 49, 79
Behavioral interpretation of meaning, 41
Behaviorism applied to robots, 41
Bench marks, 249, 251; see also Fiducial marks
Bibliographic coupling, 22
Biomechanical relationship see under Symbiosis
Bionics, 203
Bit image representation, 266
Bit rate in manned and mechanized systems, 221-222
Boole-Poincaré theorem, 69, 81
Botryology, 72, 79; see also Clumps, theory of
Browsing, 4
Bubble chambers, 230-269
  automatic data analysis for, 246-269
  comparison with spark chambers, 271
  data reduction for, 236-245
  description of, 230-236
  fiducials, 235
  hydrogen, 232, 233, 244
  media for, 232, 233
  operated in magnetic field, 233
  properties of, 234
Bubble coordinate representation, 266
C
Camera tube, Vidicon, 280
Canonical form see Statements
Canonical signed digit code, 164
Capacity, event-processing, for bubble chamber data handling, 246, 264, 268
  of a channel see under Channel
Cardinal numbers, definition of, 40, 41
Carry, 135, 136, 139, 144-166
  assimilation, 146
  exclusive OR, 160
  halving, 161-163
  initiation, 146
  propagated, 144, 145
  ripple, 145, 146
  select adder, 164, 166
  simultaneous, 146
  skip adder, 165, 166
  statistics, 148
  storage, 147
  storage adder, 148
    modified, 167
  threshold, 160
Causal
  calculus, 67
  force, 68, 69
  independence, and the firing squad, 67, 68
  interaction, 66, 79-83; see also under Interaction
  tendencies, 64, 66, 67, 75, 76
    additivity of, 67
    intrinsic, 79
Cell assemblies, 54-74
  activity of, 67
  as three-dimensional fishing nets, 62, 64
  association between, 57, 66
  clumps of, 71
  connectivity in, 62
  hierarchy of, 66
  intersection of, 69, 70
  mechanism of firing, 66
  priming of, 72
  reactivation of, 68
  selection of next to fire, 66
  sequences, 58, 60, 70
    half-life of, 63
    replaying of, 80
  size, 69
  theories of, 31, 66-74
Cell-segment representation, 266
Centrencephalic system, 44, 64, 67, 68; see also under Feedback
Cerebral Atomic Reactor, 64
Channel capacity, 208
  human, 210
Charge image, 280
Chess, 34, 60
  mechanical, 49
  randomized, 36
Circularity in definitions, 42, 76
Citation index, 21
Classification, 71
  arboresque (or dendroidal), 53
  as regeneration, 46
  automatic, 14-17
  Dewey Decimal, 9
  hierarchical, 9
Closeness, statistical, semantic, 19
CLOUDY, 242; see also FOG-CLOUDY-FAIR
Clumpiness, 71
Clumps
  conjugate, 62, 63
  of assemblies, 71
  of words, 71
  partially ordered, 63
  theory of, 27, 61-63; see also Botryology
Clusters, 61
Coincident carry-borrow storage, 167
Collation, 66
Combinations of man and machines, 206
Command modification, 123
Communication
  and semantics, 38
  as random transformation, 37
  as regeneration, 37-40
  aural, 213-216
  satellite-to-satellite, 223
  somesthetic, 216-218
  space-to-ground, 222-224
  space-to-space, 222-224
  speech, 210
  visual, 210-216
Communication system, 208
Complement code, 137-143
Computer intelligence, 204
Conceptual process, 60
Conditional sum adder, 164, 166
Connectivity, 57, 62; see also under Cell assemblies, Subassemblies
Conning station, 98
Consciousness, 64, 65
  habituation of, 66
  single element of, 67
Contingency table, multidimensional, 60
Coordinate indexing, 9
CORDIC trigonometric computer, 186-191
Correlation matching, 66
Cortex, 62-64
  histology of, 73
  parameters of, 36, 63, 64
Cost/Effectiveness, 130
Cost of automatic bubble chamber data analysis, 254, 256, 261, 268
Course control, 116-118
Cows, probabilistic, 42
  Wisdom's, 42, 74
Credibility, 42
Critique, training, 94
CRT scan, 263-265
Cryogenics, 88
Cue
  auditory, 100
  ground, 100
  internal, 100
  visual, 130
Current distribution chamber, 290, 291
D
DAPR, 260, 262
Data analysis
  for bubble chambers, 236, 241-245
  for spark chambers, 277-279
  see also under Automatic, On-line
Data reduction, for bubble chambers, 236-245
Dead time, of spark chambers, 273
Decision processes, human, 129
Decision time, 222
Definition
  circularity in, 42, 76
  of "definition", 42
Delivery of information, 4
Delta rhythm, 66
Depth control, 118, 119
Depth hypothesis, 70
Descriptive continuum, 9
Deus ex machina, 82
Dewey Decimal Classification, 9
Diagnostic testing, 126
Dictionary, statistical, 46, 47
Diencephalon, 64
Digit by digit computation, 182-191
Digital Automatic Pattern Recognition see DAPR
Digitized
  discharge planes, 288
  image position representation, 282, 288
  track following, 240
Digitizing
  projection microscope, 244
  readout see Rough digitizing
Diminished radix complement code, 137
Discriminants, linear and nonlinear, 49, 50
Dissemination of Information, Selective (SDI), 22
Division, 168-177
  modified SRT, 175-177
  nonrestoring, 169-172
  restoring, 168, 169
  SRT, 173-175
Division algorithm
  initialization, 143
  iterative, 142, 143
DNA, 88
Doctor-in-house principle, 48
Document retrieval, 7
  operational research on, 60
Documentation, 124
  standards for, 124
Dreamless sleep, 57, 65
Dynamic similarity, 103
E
Economic advantage, 102
Education see Baby
Eigenvalue, 68
End around carry, 139
Entropy, 208
  maximum, principle of, 61, 82
  of cortical activity, 68
Equations of motion, 112
Equipment capabilities, 100
Error correction, 37
Error due to truncation, 172
Ethical problems, 34
Evaluation, training, 94
Evaluation criteria for information retrieval techniques, 26
Event library, bubble chamber, 242, 243, 269, 270
Event-processing see Capacity
Event-triggered operation of spark chambers, 278
Excess-3 code, 138
Exclusive OR carry, 160
Execution time variability, 126
Extended digit number system, 162, 163
F
Facilitation see under Synaptic
Fact retrieval, 7
  automatic, 22-26
Factorial experiments, 70
Factorization of simultaneous logic, 150, 151
Feedback control by centrencephalic system, 64, 65
Fidelity, 99, 103
Fiducials, of bubble chambers, 235; see also Bench marks
Filament chamber, 292, 293
Filmless operation of spark chambers, 279-290
Films, 246, 247
Filter program for SMP, 252
Finite number system, 132
Fire control, 98
Fixed point, 137
Fleet Ballistic Missile Trainer, 92
Flexibility, 102
Flight simulator, 94
Floating point, 137
Floating point interpretation, 143
Flying Spot Digitizer (FSD), 256-262
  fully automatic, 259
  semiautomatic (human-assisted), 256, 257, 260, 268
  see also Hough-Powell
FOG, 242; see also FOG-CLOUDY-FAIR
FOG-CLOUDY-FAIR system, 243
Formatted file systems, 7
Fourier transform, multidimensional, discrete, 81
Frankenstein, 240, 244, 247, 251, 253
FSD see Flying Spot Digitizer
Full text indexing, 17, 18
Fully automatic
  FSD system, 259
  PEPR system, 257
G
Gap finding, for spark chambers, 282, 283
Gas
  for bubble chambers see under Media
  for spark chambers, 271, 272
Gating in, 258
Geiger counter, 270, 287
Gel, 61
Generalization, 71
Geometric and kinematic reconstruction, 235, 241, 242
Gestalt, 211
Ground cue, 100
Growth, 102
H
Habituation see under Consciousness
Handover time, 248
Helmsman algorithms, 115
Hierarchical classification system, 8
Hierarchical regeneration, 38
Hierarchy of cell assemblies, 66
Histology of the cortex, 73
Hough-Powell flying spot digitizer, 248, 256-262
Human-assisted automatic measuring see under Semiautomatic
Human channel, 209-216
Human factors specialist, 99-100
Humanity
  redundancy of, 34
  survival of, 31, 34
Hydrogen bubble chambers, 232, 233, 244
I
Idle time, 125
Image orthicon, 280
Imagination, 34
  and language, 36
  in dreams, 58
Independence
  causal, 67, 68
  maximization of, 83
Index terms, 43
Indexing, 8
  automatic, 10-13
  citation, 21
  coordinate, 9
  full-text, 17, 18
  keyword, 9
  probabilistic, 19
  subject heading, 9
  uniterm, 10
  word frequency, 10
Indicative data, 235, 238, 240, 241
Indicator digit, 158
Individual skill trainer, 95-97
Information
  amount of, concerning one proposition, provided by another one, 47, 48, 81
  browsing, 4
  delivery of, 4
  in a proposition, concerning itself, 38, 64, 81
  internal generation of, 58
  mutual, 48
    as an interaction, 81
  pursuit of, 4
  rejection of, 37
  selective dissemination of (SDI), 22
  statistically independent, 48
Information content
  of bubble chamber pictures, 255
  of spark chamber discharges, 277, 278
Information explosion, 1
Information handling, electronic, 218-222
Information problem, 1-3
Information retrieval, 1-30, 36, 43-54
  and recall, 43-54
  and the definition of "definition", 42
  cut-down factor in, 48
  parallel, 43, 44
  statistical or weighted, 43, 44, 47
  systems, 7
  types of, 6-8
Informational interaction see under Interaction
Ingenuity traded for money, 62
Inhibition, 57, 62
Input/output, 126
Inspiration, 72
Instructions for scanners see Scanners' instructions
Instructor's console, 104-108
Integration method, 119-122
Intelligence, 200-204
Intelligence explosion, 33, 34, 78
Interaction
  between events, 81
  causal see under Causal
  high-order, 55, 83
  hypotheses (in particle physics), 241, 242
  informational, 48, 50, 53, 79-83
Interfacilitation, 65, 69; see also Synaptic facilitation
Internal cue, 100
Iterative division algorithm, 142, 143
K
Keyword in Context system, 22
Keyword indexing, 9
KICK, 242, 243, 284
Kinematic see Geometric
KWIC, 22
L
Language
  and imagination, 36
  as regeneration, 38
  canonical, 77
  handling by machine, 36
  see also Translation
Launch, 98
Learning, 57, 58
Library of events, 242, 243
Linear number system, 133-135
Linguistic transformations, 41, 42
Links, link indicators, 21
LOCO, 243
Log factors, 48
M
Machine versus man, 197
Machine-to-machine communication, 218-222
Magnetic field
  for bubble chamber operation, 233, 241
  for spark chamber operation, 275, 276
Magnetostrictive readout, 289, 290
Magnitude-plus-sign code, 137-143
Man versus machine, 197
Man-machine symbiosis, 34, 204-207
Maneuvers, 121
Manned space mission, 196
Mapping function, 123
Marginal totals, 80
Markov chains, 54, 70
Master instructor's console, 106
Matching by correlation, 66
Mathematical model, 111, 124
Meaning
  and behaviorism, 41
  and communication, 38
  and degrees of belief, 76
  and subassemblies, 74, 75
  assembly theory of, 74-77
  economy of, 77, 78
  evolutionary function of, 77
  literal, 76
  multisubjective, 76
  of "meaning," a metaphysical problem, 40
  representation of, 40-43
  subjective, and causal tendencies, 74
  versus effectiveness, of a statement, 76
  see also Semantics
Measurement, of bubble chamber film, 236
  automatic, 242
Mechanical translation, 40, 41, 71; see also Language
Media for bubble chambers, 232, 233
Memorizing of long sequences of digits, 60
Memory
  block, 44
  clues, 44
  distributed, 65, 66
  erosion of, loss of detail in, 69
  stereotyping of, 59
Memory time, of spark chambers, 273
Mental skill see Skill
Metaphysics, 40, 64, 77
Microwave discharge chamber, 291
Modified carry storage adder, 167
Modified SRT division, 175-177
Motion simulation, 109
  of own ship, 109, 111-122
Motivational similarity, 104
Motor skill see Skill
Multiplication, 163-168
Multiplier coding, 163, 164
Multiplier logic, 164-166
  nonstandard, 167, 168
Mutation of synaptic strengths, 58, 59, 80
Mutual information, 70, 76

N
Navy Training Device Center, 92
Neural density at apertures, 66
Neural net
  artificial, 36
  umbrella shaped, 66
Neurons
  paired, 61
  refractory, 61
  reliability of, 36
Nonessential discrepancies, 103
Nonrestoring division, 168-172
Normal form of a number, 137
Number system, 132
  extended digit, 162, 163
  finite, 132
  interpretation of, 136
  linear, 133-135
  positional, 133
  redundant, 135, 157-163
  reflected binary, 134
  residue, 136, 177-182
  weighted, 133-136
Number system properties, digital computer, 138

O
Ockham's razor, 66, 69
On-line data processing, 243, 246, 261, 293, 294
Open loop similarity, 103, 104
Operational similarity, 104
Operational system, 93
  improvement of, 126
Orthogonal scan, 268, 269

P
PACKAGE, 243
Pain, real metaphysical, 64
PANG, 242, 243, 264
Parallel plate counter, 270, 271
Partial carry, 147
Partial sum, 147
Partially automatic measuring system see under Semiautomatic
Pattern recognition computer, 269; see also DAPR, PEPR
PBFR, 259, 262
PEPR, 252-258
  fully automatic, 257
  semiautomatic, 257, 258
Perception and initial probabilities, 39
Percepto-motor skill see Skill
Perceptron, 49
Perceptual skill see Skill
Performance, integrated task, 97, 98
Peripheral equipment, 126
Personnel, 98
Person-to-person communication, 218-222
Phonemes, 45, 61, 66
  and subassemblies, 60
  regeneration of, 39
Physical analysis of data see Data analysis
Physical appearance of environment, 103
Pi, 60
Polymerization, 61
Population contingency table, 61
Positional number system, 133
Practical problems, in programming, 122
Practice, opportunity for, 94
Precision Encoding and Pattern Recognition Device see PEPR
Precision of track reconstruction, for spark chambers, 274; see also Accuracy, Resolution
Probabilistic indexing, 19; see also Information retrieval, statistical
Probability, 41
  and degree of regeneration of a subassembly, 77
  estimation of, 32, 33, 54
  kinds of, 76
  partial ordering of, 45, 48
  see also Credibility
Problem-oriented computer console, 105
Procedural skill see Skill
Product, "indirect", 81, 82
Program checkout, 123
Program testing, 124
Programming considerations, 122-128
Projection microscope, 240, 241, 248
  digitizing, 244
  servo-controlled, 240, 244
Projector, Scanning-Measuring see SMP
Propagated carry, 144, 145
Propagation characteristics, 110
Propane, as a bubble chamber medium, 233, 234
Properties
  of bubble chambers, 234
  of spark chambers, 273
Propositions as classes of equivalent statements, 77
Pseudo-division, 182-186
Pseudo-multiplication, 182-186
Pseudo-probabilities, 78
Pseudo-sparks, 284
Psychology, communication, and causality, 70
Pursuit of information, 4
R
Radar, 108
Radio, nomenclature, 35
Radix complement code, 137
Radix factoring division, 168, 169
Random transformation see under Communication
Rationality, principle of, 38
Real metaphysical pain, 64
Real-world simulator, 109-122
Realism, 100, 103
Realistic opponents, 98
Reality and fantasy, between, 59
Reasoning as logic plus probability, 38
Reasoning machines, 200-204
Recall, 38, 43-54
  immediate to long-term, 72, 73
  long-term, and "meaning", 77, 78
  perfect, 59
Recovery time, of spark chambers, 273
Redundancy
  in the brain, 38
  of humanity, 34
Redundant number system, 135, 157-163
Redundant signed digit number system, 159-162
Reference retrieval, 7
Reflected binary number system, 134
Regeneration, 37-40, 46
  and economy, 37
  and information, 38, 48, 66, 77, 86
  and language, 38
  and meaning, 39
  approximate, of an assembly, 57
  as error correction, 37
  hierarchical, 38, 39
  of phonemes, 39
  probabilistic, 37, 38, 57, 62, 77
Relevance (of index terms), 53
Relevance number, 19
Repetitive happy thought, 68
Representative configuration (for spark chamber), 276, 280
Representative experiment (for spark chamber), 284, 286
Residue number system, 136, 177-182
  applications of, 179-182
Resolution
  for spark chambers, 274
  for Vidicon systems, 281
  for wire chambers, 289
  see also Accuracy, Precision
Resolution time, for spark chambers, 273
Resolving power, 282
Resonance, 44
Reverberation, 81
Reward and punishment, 77
Ripple carry, 145, 146
Road buffer, 258
Robertson chart, 171
Robot, 33, 78, 80
S
Satellite, synchronous, 223
Scanners' instructions, for bubble chamber data, 238, 239
Scanning
  abnormal, 258, 259
  by CRT, 263-265
  of bubble chamber film, 235-238
  segment, 265, 266
Scanning-Measuring Projector see SMP
Scintillation chamber, 292
Segment scan, 265, 266
Selective dissemination of information (SDI), 22
Semantic closeness, 19
Semantic information, 38
Semantics see also Meaning
Semiautomatic measuring
  by FSD system, 256, 257, 260, 268
  by PEPR system, 257, 258
Sensor, 110
Separate carry representation, 157-159
Service controls, 106
Servo-controlled projection microscope, 240, 244
"Set", psychological, 60, 64, 66, 75
Ship, 98, 104, 106, 109
Similarity, between simulator and reality, 103, 104
Simulation of brain, 84
Simultaneous carry, 146
Sketching of bubble chamber data, 238
Skills, 96, 97
Slaved spot, on CRT, 282
Sleep, 57, 65, 72, 80
Smell, sense of, 216-218
SMP, 248-255, 260, 261, 268
  filter program, 252
Social problems, 34
Somesthetic communication, 216-218
Sonic spark chambers, 285-287
Source (of information), 37
Space mission, 196
Space-to-ground communication, 222-224
Space-to-space communication, 222-224
Spark chambers, 270-290
  comparison with bubble chambers, 277
  data problems, 277-279
  electrode circuits, 271
  fabrication, 271
  filling, 271, 272
  filmless operation, 279-290
  magnetic field, operation in, 275, 276
  precision, 274
  properties, 273
  representative configuration, 276, 280
  representative experiment, 284, 286
  sonic, 285-287
  triggering counters, 276
  Vidicon, 279-284
  wire, 287-290
Special events tape, 284
Speculation, 32
Speech communication, 210
Spiral reader, 245-248
SRT division, 173-175
Standard form of a number, 137
Statements in canonical form, 41
Static similarity, 103
Statistical closeness, 19
Statistics of carry generation, 148
Stereo views, 235, 241, 253
Student environment, 102-104
Subassemblies (of cells), 43, 54-74
  and phonemes, 60
  and unconscious (preconscious) thought, 58
  connectivity of, 57, 62
  "half-life" of, 57, 61
  uninhibited in sleep, 58
Subject heading indexing, 9
Subjective probability and meaning, 38
Submarine Attack Center Trainer, 105, 106
Submarine attack team, 97
Submarine simulator, 94
SUMX, 248
Survival of humanity, 31, 34
Symbiosis, biochemical (of man and machine), 34, 204-207
Synaptic
  facilitation, 55, 58, 59
  strengths (quantized?), 58, 59, 80
Synchronous satellite, 223
SYNTHEX, 24
T
Tank trainer, 93, 99
Target reflection characteristics, 110
Taste, sense of, 216-218
Team, submarine attack, 97
Team trainer, 96-99
Television transmission, 212, 213
Temporal summation, 61
Thesaurus, 21
  mechanical construction of, 71
Thought, 203
Thought-word-thing-engram tetrahedron, 40
Threshold carry, 160
Time delay in neural networks, 64
Time elapsed, judgment of, 69
Time sequence (neural representation), 69
Torpedo attack problem, 94
Totally parallel arithmetic, 159-162
Track measurement technique, 240
Trainer
  advantages of, 94
  definition of, 93
  flight, 94
  general-purpose digital computers, 101-124
  individual skill, 95-97
  non-training uses, 126-128
  submarine attack, 103
  tank, 93, 99
  team, 96-99
  total, 99
Trainer, Fleet Ballistic Missile, 92
Training
  concept, 93
  device, 93
  rationale, 93, 100, 101
  requirements, 90-101, 103
    individual, 91
    team, 92
  simulator, 93
  non-training uses, 126-128
  transfer, 102
Transfer digit, 160
Transgeneration, 39
Translation, mechanical, 40, 41, 71; see also Language
Triggerability, 277
Triggering counters, for spark chambers, 270
Trigonometric computer, CORDIC, 186-191
Truncated representation, 172
Truncation error, 172
"Twenty questions" and assemblies, 66
U
UHF, 219
Ultraintelligent machines, 33-37
  definition of, 33
  integrated with electronic computer, 36
  value of, 34
Ultraparallel machines, 36, 60, 61, 65, 66, 79
Uncertainty function, 208
Unconscious and preconscious parts of mind, 80; see also under Subassemblies
Uniterm indexing system, 10
Unmanned space mission, 196

V
Variability, of execution time, 126
Vidicon spark chamber systems, 279-284
Visual communication, 210-216
Visual cortex, 66
Visual cue, 130
Visual perception, 215, 216
Visual simulation, 130
Vocabulary (in memory), 44, 46

W
Wakefulness, degree of, 64
Weber-Fechner law, adapted to judgment of elapsed time, 69
Weighted number system, 133-136
Weights of evidence, 38, 48, 64, 76
Whales, ultraintelligent?, 36
WHIRLWIND, 147
Window shades, 249, 250
Wire chambers, 287-290
Word frequency indexing, 10
Words
  as clumps, 38
  association of, 64, 66, 70
  clumps of, 71
  distribution of, 44
Z
Zatocoding, 66
Zero-g environment, 99