Advances in COMPUTERS, Volume 18

Contributors to This Volume

S. E. GOODMAN, M. H. HALSTEAD, MONROE M. NEWBORN, AZRIEL ROSENFELD, PATRICK SUPPES, STUART ZWEBEN
Advances in COMPUTERS

Edited by
MARSHALL C. YOVITS
Department of Computer and Information Science
Ohio State University
Columbus, Ohio

VOLUME 18

ACADEMIC PRESS   New York   San Francisco   London   1979
A Subsidiary of Harcourt Brace Jovanovich, Publishers
COPYRIGHT © 1979, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED.
NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

ACADEMIC PRESS, INC.
111 Fifth Avenue, New York, New York 10003

United Kingdom Edition published by
ACADEMIC PRESS, INC. (LONDON) LTD.
24/28 Oval Road, London NW1 7DX

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 59-15761
ISBN 0-12-012118-2

PRINTED IN THE UNITED STATES OF AMERICA
79 80 81 82   9 8 7 6 5 4 3 2 1
Contents

CONTRIBUTORS . . . ix
PREFACE . . . xi

Image Processing and Recognition
Azriel Rosenfeld

1. Introduction . . . 2
2. Digitization . . . 3
3. Coding and Approximation . . . 8
4. Enhancement, Restoration, and Reconstruction . . . 16
5. Segmentation . . . 28
6. Representation . . . 40
7. Description . . . 48
8. Concluding Remarks . . . 55
References . . . 55

Recent Progress in Computer Chess
Monroe M. Newborn

1. Introduction . . . 59
2. After Stockholm . . . 62
3. Tree-Searching Techniques (Modifications to the Minimax Algorithm) . . . 92
4. Chess-Specific Information in Chess Programs . . . 99
5. Endgame Play . . . 100
6. Speed Chess . . . 106
7. The Microcomputer Revolution . . . 110
8. Final Observations and the Future . . . 113
References . . . 114

Advances in Software Science
M. H. Halstead

1. Introduction . . . 119
2. Basic Metrics . . . 120
3. Volume . . . 122
4. Potential Volume . . . 122
5. Implementation Level . . . 123
6. Language Level . . . 125
7. The Vocabulary-Length Equation . . . 126
8. The Mental Effort Hypothesis . . . 129
9. Extension to "Lines of Code" . . . 130
10. Programming Rates versus Project Size . . . 132
11. Clarity . . . 133
12. Error Rates . . . 136
13. Measurement Techniques . . . 141
14. The Rank-Ordered Frequency of Operators . . . 143
15. The Relation between η₁ and η₂ . . . 146
16. The Use of η₂* in Prediction . . . 148
17. Grading Student Programs . . . 150
18. Semantic Partitioning . . . 153
19. Technical English . . . 154
20. Learning and Mastery . . . 158
21. Text File Compression . . . 161
22. Top-Down Design in Prose . . . 162
23. Conclusions . . . 166
References . . . 168

Current Trends in Computer-Assisted Instruction
Patrick Suppes

1. Introduction . . . 173
2. CAI in Elementary and Secondary Education . . . 175
3. CAI in Postsecondary Education . . . 185
4. Current Research . . . 199
5. The Future . . . 222
References . . . 225

Software in the Soviet Union: Progress and Problems
S. E. Goodman

1. Introduction . . . 231
2. A Survey of Soviet Software . . . 233
3. Systemic Factors . . . 249
4. Software Technology Transfer . . . 268
5. A Summary . . . 278
References . . . 281

AUTHOR INDEX . . . 289
SUBJECT INDEX . . . 295
CONTENTS OF PREVIOUS VOLUMES . . . 303
Contributors to Volume 18

Numbers in parentheses indicate the pages on which the authors' contributions begin.

SEYMOUR E. GOODMAN,* Woodrow Wilson School of Public and International Affairs, Princeton University, Princeton, New Jersey 08540 (231)
M. H. HALSTEAD,** Department of Computer Sciences, Purdue University, West Lafayette, Indiana 47907 (119)
MONROE M. NEWBORN, School of Computer Science, McGill University, Montreal, Quebec, Canada H3A 2K6 (59)
AZRIEL ROSENFELD, Computer Science Center, University of Maryland, College Park, Maryland 20742 (1)
PATRICK SUPPES, Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California 94305 (173)
STUART ZWEBEN, Department of Computer and Information Science, Ohio State University, Columbus, Ohio 43210 (119)

* Present address: Department of Applied Mathematics and Computer Science, University of Virginia, Charlottesville, Virginia 22903.
** Deceased. The contribution was further edited by Stuart Zweben.
Preface
Volume 18 of Advances in Computers continues to treat in some depth a number of dynamic and significant areas of current interest in the computer field, thus continuing a long and unbroken series that began in 1960. In each of the volumes that have appeared thus far, important, topical areas having long-range implications in the computer field have been treated. This volume is no exception. Appearing here are articles on software, considered both as a science and as a management concern in the Soviet Union, as well as articles describing a number of applications of computer science including image processing and recognition, computer chess, and computer-assisted instruction.

Computers are being used more extensively to process and analyze pictures. This is, in fact, one of the more important computer applications. The subjects of image processing (improving the appearance of a picture) and image recognition (providing a description of the picture) are treated by Azriel Rosenfeld in the first article. Image processing and analysis differ from computer graphics in that pictures are input in the former rather than described in the latter. Professor Rosenfeld describes many of the ideas and methods used in the field and reviews the basic techniques involved. Rosenfeld concludes that the field has a broad variety of applications in such fields as astronomy, character recognition, industrial automation, medicine, remote sensing, and many others. He expects a continued growth both in the scope and the number of practical applications.

In his article on computer chess, Monroe Newborn summarizes the history of chess playing by computer, an idea that has fascinated man for hundreds of years. In the late 1950s rudimentary working chess-playing programs were developed based on the ideas of Shannon and Turing. Progress since then has been rapid due to both hardware and software improvements, and interest has mushroomed. Newborn particularly concentrates on the dynamic events of the past few years. He discusses and summarizes various tree-searching techniques and concludes with a discussion of recent microcomputer chess. He points out that present programs are playing essentially at the Expert level and predicts, conservatively, that by 1984 they will be playing at the Master level and at the Grandmaster level by 1988.

Software science as described by Maurice Halstead is a foundation for software engineering but is not synonymous with it. It is an intellectually exciting discipline currently undergoing rapid development. He defines "software" as any communication that appears in symbolic form in conformance with the grammatical rules of a language. He furthermore goes on in his article to define a science. Professor Halstead presents a summary and overview of the present state of software science. He encourages the practitioner to engage in actual personal experimentation to convince himself of the science's validity and indicates that there are no theorems and perhaps will never be any. However, he points out that a major attribute of software science is a total and complete lack of arbitrary constants or unknown coefficients among its basic equations which, furthermore, are characterized by utter simplicity. He concludes by stating that natural laws govern language and its use far more strictly than has generally been recognized.

Patrick Suppes surveys current activities in computer-assisted instruction (CAI) and emphasizes the past five years. He discusses elementary and secondary education as well as postsecondary education and summarizes current research. Suppes emphasizes those activities requiring increasingly sophisticated programs, and concludes by forecasting the main trends in computer-assisted instruction. He expects that current hardware improvement and economics will have a major effect on CAI and predicts that by 1990 CAI will have widespread use in schools and colleges in the United States. By the year 2000, Professor Suppes predicts, it is reasonable to expect a substantial use of home CAI. Videodisks particularly will have a major effect on the field. He concludes by stating that by the year 2020, or shortly thereafter, CAI courses should have the features that Socrates thought so desirable long ago. A computer tutor will be able to converse with the individual student at great length.

In the final article, Seymour Goodman treats a subject of great interest and importance about which most of us in the United States have only passing knowledge. In his discussion of software in the Soviet Union, Professor Goodman points out that it is only within the past decade that the Soviets have actually committed themselves to the production and use of complex computer systems on a scale large enough to pervade the national economy. He points out that the Soviets have made substantial progress in removing limitations due to hardware availability but have not yet made much progress in overcoming software development problems. He expects, as a consequence, that they will continue to borrow from foreign software technology. Professor Goodman states that economic and political factors are of considerable importance in Soviet software development. Soviet software development has followed the United States technical pattern but has differed greatly in time scale. He claims that major changes in the system will be necessary if the Soviet software industry is to function effectively, but it is not clear to what extent such reforms will be allowed to take place.

I am saddened by the sudden and unexpected loss of a good friend and a valued colleague, Maurice Halstead. Shortly after he sent me the final version of his article, "Advances in Software Science," Maury was suddenly and fatally stricken. We all miss him both as a friend and as a leader in our profession. Software science, largely due to his productive research and that of his colleagues and students, has become an important and a rapidly growing area of interest with considerable application. Professor Halstead's article is one of the last major contributions he had written. I am indebted to his colleague, Stuart H. Zweben of Ohio State University, who undertook the responsibility for the detailed editing of this article.

It is my pleasure to thank the contributors of this volume. They have given extensively of their time and energy and thus have made this work an important and timely contribution to their professions. This volume continues the tradition established for Advances in Computers of providing authoritative summaries of important topics that reflect the dynamic growth in the field of computer and information science and in its applications. I fully expect that in spite of its currency (or perhaps because of it), the volume will be of long-term interest and value. Editing this volume has been a rewarding experience.

MARSHALL C. YOVITS
Image Processing and Recognition

AZRIEL ROSENFELD
Computer Science Center
University of Maryland
College Park, Maryland
1. Introduction . . . 2
2. Digitization . . . 3
   2.1 Sampling . . . 4
   2.2 Quantization . . . 6
   2.3 Bibliographical Notes . . . 7
3. Coding and Approximation . . . 8
   3.1 Exact Coding . . . 9
   3.2 Approximation . . . 10
   3.3 Differencing . . . 12
   3.4 Transformations . . . 14
   3.5 Other Coding Schemes . . . 15
   3.6 Bibliographical Notes . . . 16
4. Enhancement, Restoration, and Reconstruction . . . 16
   4.1 Grayscale Modification . . . 17
   4.2 Geometric Transformations . . . 18
   4.3 Noise Cleaning . . . 19
   4.4 Deblurring . . . 23
   4.5 Reconstruction from Projections . . . 26
   4.6 Bibliographical Notes . . . 27
5. Segmentation . . . 28
   5.1 Pixel Classification . . . 29
   5.2 Edge Detection . . . 33
   5.3 Pattern Matching . . . 34
   5.4 Sequential Segmentation . . . 37
   5.5 Fuzzy Segmentation . . . 38
   5.6 Bibliographical Notes . . . 40
6. Representation . . . 40
   6.1 Connectedness . . . 41
   6.2 Representation by Runs; Component Labeling and Counting . . . 42
   6.3 Representation by Border Curves; Border Following, Chain Coding . . . 43
   6.4 Representation by Skeletons; Thinning . . . 45
   6.5 Segmentation of Curves . . . 46
   6.6 Bibliographical Notes . . . 48
7. Description . . . 48
   7.1 Geometrical Properties . . . 49
   7.2 Gray-Level-Dependent Properties . . . 51
   7.3 Relations and Models . . . 54
   7.4 Bibliographical Notes . . . 55
8. Concluding Remarks . . . 55
References . . . 55

ADVANCES IN COMPUTERS, VOL. 18. Copyright © 1979 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-012118-2
1. Introduction
Computers are being increasingly used to process and analyze pictures. This chapter reviews some of the basic techniques that are used for image processing and pictorial pattern recognition by digital computer. (We will use the words image and picture interchangeably.) The material is presented from a technique-oriented standpoint; applications are not discussed.

A picture is defined by specifying how its brightness (or color) varies from point to point. (We will deal almost entirely with black-and-white pictures in this paper.) Before a picture can be processed, it must be converted into a discrete array of numbers representing these brightness values, or shades of gray, at a grid of points. This process of digitization will be discussed further in Section 2. The resulting array is called a digital picture, its elements are called points or pixels (short for "picture elements"), and their values are called gray levels.

In order to represent adequately the original picture (in redisplay), the array of pixels must generally be quite large (about 500 × 500, for an ordinary television picture), and there must also be a relatively large number of distinct gray levels (on the order of 100). Image coding deals with methods of reducing this large amount of information without sacrificing the ability to reproduce the original picture, at least to a good approximation (see Section 3).

One of the principal goals of image processing is to improve the appearance of the picture by increasing contrast, reducing blur, or removing noise. Methods of doing this come under the heading of image enhancement or image restoration. A related problem is that of reconstructing a picture from a set of projections; such image reconstruction techniques are used in radiology, for example, to display cross sections of the body derived from a set of x-ray images. These subjects are covered in Section 4.
In image processing, the input and output are both pictures (the output being, for example, an encoded or enhanced version of the input). In image recognition, the input is a picture, but the output is some type of description of the picture. Usually this involves decomposing the picture into parts (segmentation) and measuring properties of the parts (their sizes, shapes, colors, visual textures, etc.). Image recognition techniques will be reviewed in Sections 5-7; Section 5 discusses segmentation, Section 6 deals with representations of picture parts, and Section 7 treats property measurement.

It should be pointed out that both image processing and analysis have pictures as input, in the form of arrays of gray levels. This distinguishes them from computer graphics, in which the input is a description of a picture (e.g., a set of functions that define curves, regions, or patches, and possibly also other functions that define the brightness variations over these regions), and the output is a display of that picture. Computer graphics will not be discussed in this paper. We will also not cover picture processing hardware or software, or the processing of pictures by nondigital means (photographic, optical).

Many image processing techniques involve advanced mathematical concepts. For example, some of the important approaches to image coding, enhancement, and description that will be reviewed in this paper make use of two-dimensional Fourier transforms. It will be assumed that the reader is familiar with these transforms and with the basic ideas of spatial frequency analysis and of convolution. On the other hand, we will not cover techniques that involve other types of transforms (Walsh-Hadamard, Karhunen-Loeve, etc.), or that are based on modelling ensembles of images by stochastic processes. In general, mathematics will be avoided unless it is needed to define precisely or clarify a technique, and emphasis will be placed on methods whose definitions require little or no mathematics.

This chapter can cover only a selection of the ideas and methods used in digital image processing and analysis. Its purpose is to give a nontechnical introduction to some of the basic techniques. Suggestions for further reading can be found in the bibliography at the end of the paper. References will be given at the end of sections in the text to textbook chapters and papers (most of them recent) where further information can be found about the techniques that are described or mentioned, including those that are beyond the scope of the present paper. Over 3000 additional references can be found in the bibliographies cited at the end of the paper.

2. Digitization
Digitization is the process of converting a real picture or scene into a discrete array of numbers. It involves

(a) sampling the brightness of the scene at a discrete grid of points, and
(b) quantizing each measured brightness value so that it becomes one of a discrete set of "quantization levels."

In this section we discuss sampling and quantization as mathematical operations. Hardware devices for image sensing, sampling (sensor arrays, scanners, TV cameras), and quantizing (analog-to-digital converters) will not be treated, nor will devices for displaying and interacting with digital pictures.

2.1 Sampling
In general, any process of converting a picture into a discrete set of numbers can be regarded as "sampling," but we will adopt the narrower definition given above. It should be pointed out, incidentally, that one cannot really measure scene brightness "at a point"; rather, one measures some sort of (weighted) average brightness over a small neighborhood of the point.

The sample points are almost always assumed to be the points of a regular square (or sometimes hexagonal) grid. How densely spaced should the grid points be? If they are spaced too far apart, some of the information in the scene may be lost or misrepresented (aliased). According to the sampling theorem, if the grid spacing is d, we can exactly reconstruct from the resulting samples all the spatial frequency components of the image whose frequencies (in cycles per unit distance) do not exceed 1/2d. Thus if we want to represent correctly a given spatial frequency in the sampled image, we should sample at least at twice that frequency. However, as we shall next see, if frequencies greater than 1/2d are also present, they can introduce spurious information that affects the lower frequencies.

To illustrate the possible effects of undersampling, we shall consider two simple one-dimensional examples. The function sin x has period 2π, so that its spatial frequency is 1/2π cycles per unit distance. Suppose that we sample it at points x = 0, 3π/2, 3π, 9π/2, 6π, . . . , which are spaced 3π/2 apart, i.e., further apart than the spacing required by the sampling theorem, which is π. The values of sin x at those points are 0, −1, 0, 1, 0, −1, 0, 1, . . . . These are just the values we would obtain if we sampled the function sin(x/3), which has period 6π, at spacing 3π/2, which is twice as frequent as necessary. Thus when we undersample sin x, the values we get are the same as if we had oversampled sin(x/3), a lower frequency that is not actually present. This phenomenon is called aliasing, since it involves one frequency appearing to be a different one. In two dimensions, not only the frequency but also its orientation can change. Moiré patterns are an everyday example of aliasing (usually involving square waves rather than sine waves); an example is shown in Fig. 1.

FIG. 1. Aliasing. When the dot pattern (a) is superimposed on the bar pattern (b), with the latter rotated slightly clockwise, the result is as shown in (c); a lower frequency, rotated counterclockwise, appears to be present. Here the blank spaces in the dot pattern act as sampling points on the bar pattern, but their spatial frequency is not high enough, so that aliasing results. From Legault, 1973.

As another example of the effects of undersampling, suppose that instead of sampling the values of sin x at single points (i.e., taking averages over very small neighborhoods of those points), we measure the average values of sin x over intervals of length 3π, which is larger than the period of sin x. When such an interval is centered at a peak of sin x, say at π/2, it extends from −π to 2π, and includes two valleys and one peak of the sine function; thus the average over this interval is (∫ from −π to 2π of sin x dx)/3π = −2/3π. On the other hand, when the interval is centered at a valley, it includes two peaks and a valley, so that the average is +2/3π. Thus if we compute (overlapping) averages centered at the points π/2, 3π/2, 5π/2, . . . , which are the proper spacing apart for sampling sin x, we obtain the sequence of values −2/3π, 2/3π, −2/3π, . . . , which have the proper frequency but the wrong phase; they are negative when sin x is positive and vice versa. In other words, if we sample sin x at the correct frequency, but allow the samples (averages) to overlap, the values we obtain are essentially the same as if we had sampled sin(x + π) using nonoverlapping samples. This again illustrates the misleading results that can be obtained if improper sampling is performed. This particular phenomenon is known as "spurious resolution" since we are detecting sin x (with the wrong phase) even though our samples are too coarse. A square-wave example is shown in Fig. 2.
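To make the undersampling example concrete, here is a minimal numerical sketch (assuming NumPy is available; the number of samples is arbitrary). It reproduces the sample values quoted above and checks that they coincide with those of a sinusoid of one-third the frequency, sampled adequately, apart from a phase reversal.

```python
import numpy as np

# Sample sin(x) at a spacing of 3*pi/2 -- coarser than the spacing of pi
# that the sampling theorem requires for this signal.
spacing = 3 * np.pi / 2
n = np.arange(12)
samples = np.sin(n * spacing)

print(np.round(samples, 6))
# [ 0. -1.  0.  1.  0. -1. ...]  -- the samples repeat every 4 points,
# i.e., every 6*pi in x, so they behave like a sinusoid of frequency
# 1/(6*pi), one-third of the true frequency 1/(2*pi).

# The same sample sequence is produced by adequately sampling a sinusoid
# of that lower frequency (here -sin(x/3), a phase-reversed sin(x/3)):
alias = -np.sin(n * spacing / 3)
print(np.allclose(samples, alias))   # True
```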
FIG. 2. Spurious resolution. The bars in (a) are 4 pixels wide. Parts (b)-(f) show the results of locally averaging (a) over a square neighborhood of each pixel of size 3 × 3, 5 × 5, 7 × 7, 9 × 9, and 11 × 11, respectively. Note that in (e)-(f), where the averaging neighborhood size exceeds the period of the bar pattern, the bars are in the wrong positions [where the spaces were in (a)].
2.2 Quantization
Let z be a measured brightness value, and let z_1 < z_2 < . . . < z_k be the desired quantization levels. Let ẑ be the z_j that lies closest to z. (If z is exactly midway between z_j and z_{j+1}, we can arbitrarily say that ẑ = z_j.) We quantize z by replacing it by ẑ. The absolute difference |z − ẑ| is called the quantization error associated with z.

Ordinarily, the quantization levels z_1, . . . , z_k are assumed to be equally spaced. However, if the brightness values in the scene do not occur equally often, we can reduce the average quantization error by spacing the levels unequally. In fact, let Z_i be the interval consisting of those points z that lie closest to z_i; then if Z_i is small, the average quantization error associated with these zs is small, and vice versa. (It is straightforward to express this observation quantitatively; the details are omitted here.) Thus in a region of the gray-level range where zs occur frequently, we should space the quantization levels close together to insure a small average error. On the other hand, in a region where zs are rare, we can afford to space the levels far apart, even though this yields large quantization errors for such zs, since the error averaged over all zs will not be greatly increased by these rarely occurring large errors. The unequal spacing of quantization levels to reduce average quantization error is sometimes called tapered quantization.

Using too few quantization levels results in objectionable "false contours," which are especially conspicuous in regions where the gray level changes slowly (see also Section 3.2). This is illustrated in Fig. 3. Figure 4 shows the improvement that can be obtained by using tapered quantization.

In digitizing color images, brightness values are obtained by scanning the image through three color filters (red, green, and blue), and each of these values is then quantized independently. Thus a digital color image is an array of triples of discrete values.
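As a small illustration of tapered quantization, the sketch below (assuming NumPy; the gray level distribution is hypothetical) compares the average quantization error of equally spaced levels with levels placed where the gray levels actually occur.

```python
import numpy as np

def quantize(values, levels):
    """Replace each value by the nearest of the given quantization levels."""
    levels = np.asarray(levels)
    nearest = np.abs(values[:, None] - levels[None, :]).argmin(axis=1)
    return levels[nearest]

rng = np.random.default_rng(0)
# Hypothetical scene: most gray levels crowd the dark end of a 0..255 range.
z = np.clip(rng.normal(60, 20, 10000), 0, 255)

uniform = np.linspace(0, 255, 8)                    # 8 equally spaced levels
tapered = np.quantile(z, (np.arange(8) + 0.5) / 8)  # 8 levels that follow the data

for name, levels in (("uniform", uniform), ("tapered", tapered)):
    error = np.abs(z - quantize(z, levels)).mean()
    print(name, round(float(error), 2))
# The tapered levels give a noticeably smaller average quantization error.
```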
FIG. 3. False contours. Parts (a)-(d) have 16, 8, 4, and 2 gray levels, respectively. Note the conspicuous gray level discontinuities in the background in parts (b) and (c).

FIG. 4. Advantages of tapered quantization. Parts (a)-(d) are quantized into 16 equally spaced levels, 16 tapered levels, 4 equally spaced levels, and 4 tapered levels, respectively. From Huang, 1965.

2.3 Bibliographical Notes

Digitization is treated in Rosenfeld and Kak (1976, Chapter 4), and in Gonzalez and Wintz (1977, Chapter 2). A detailed discussion, also covering color images, can be found in Pratt (1978, Chapters 4 and 6). The two- (or multi-) dimensional sampling theorem is due to Peterson and Middleton (1962). Aliasing problems in image sampling are discussed in Mertz and Grey (1934); for a more recent treatment see Legault (1973). Nonuniform spacing of quantization levels to minimize quantization error is treated in Max (1960).

3. Coding and Approximation
The aim of image coding is to reduce the amount of information needed to specify a picture, or at least an acceptable approximation to the picture.
Compact encoding schemes allow one to store pictures in less memory space, or to transmit them in less time (or at lower bandwidth). They can be used in television or facsimile transmission, provided that the receiver can be designed to incorporate a decoding capability. This section discusses several types of image coding techniques:

(a) Exact techniques. These take advantage of the nonrandomness of images to devise codes that are, on the average, more compact than the original images, while still permitting exact reconstruction of the original.
(b) Approximation techniques, which vary the fineness of sampling and quantization as a function of image content, so as to take advantage of the perceptual limitations of the human observer.
(c) Differencing and transform techniques. These convert the image into a modified form in which greater advantage can be taken of approaches (a)-(b).

Special coding methods, including schemes designed to handle binary images, time-varying images, etc., are also briefly discussed. Another class of approaches to image compression is based on piecewise approximation of the image gray levels by simple functions. Each piece of the image can thus be represented by a small amount of information (namely, the parameters of the approximating function for that piece). This approach will not be discussed further here.

3.1 Exact Coding

This section describes several methods of representing images compactly by taking advantage of their nonrandomness to devise codes that, on the average, are more compact than the original image. Three such approaches are the following:

(a) Shannon-Fano-Huffman coding. If the gray levels in an image do not all occur equally often, it is possible (at least in principle) to compress the image by using short codes for the frequently occurring gray levels, and longer codes for the rarer ones. As a very simple example, suppose there are four levels, 0, 1, 2, and 3, and their relative frequencies of occurrence are 3/4, 1/12, 1/12, and 1/12, respectively. If we represent the levels by binary numbers in the usual way (0 = 00, 1 = 01, 2 = 10, 3 = 11), each of them is a two-bit number, so that the number of bits needed per pixel is exactly 2. On the other hand, suppose that we use the codes 0, 10, 110, and 111 for 0, 1, 2, and 3, respectively. Thus 3/4 of the pixels will require only one bit each; 1/12 will require two bits each; and 1/12 + 1/12 = 1/6 will require three bits each. The average number of bits needed per pixel is thus (1 · 3/4) + (2 · 1/12) + (3 · 1/6) = 3/4 + 1/6 + 1/2 = 1 5/12, which is less than the two bits per pixel needed for the ordinary binary number representation. In general, the more "biased" (i.e., unequal) the frequencies of occurrence of the gray levels, the more compression can be achieved by this approach.

(b) Run length coding. Any image row consists of a succession of constant gray level runs, and the row is completely determined if we specify the sequence of lengths and gray levels of these runs. If the runs, on the average, are sufficiently long, this run length code representation is more compact than the original array representation. For example, suppose that there are 64 gray levels and that the rows are 512 pixels long. Thus the length of any run is between 1 and 512, and can be specified by a 9-bit number. Suppose there are r runs; then the run length code requires 6r bits to specify the gray levels of the runs and at most 9r bits to specify their lengths, a total of 15r bits, while the ordinary representation of the row as a string of 512 6-bit numbers requires 3072 bits. Thus if 15r < 3072 (so that the average run length is greater than about 2.5), the run length code is more economical.

(c) Contour coding. Any image consists of a set of connected regions of constant gray level, and is completely determined if we specify the set of these regions. A region can be specified by its position and the chain code of its boundary (see Section 6.3). If there are sufficiently few regions, this contour code representation is more economical than the original array representation. For example, suppose that the image is 512 × 512 and has 64 gray levels, so that the array representation requires 3 · 2^19 bits. If there are r regions, each having a single boundary of average length l, then the contour code representation requires 6r bits to specify the regions' gray levels, 18r bits to specify their positions, and 3lr bits to specify their boundary chain codes, for a total of only 3(l + 8)r bits; this may well be less than 3 · 2^19 (e.g., let l = 32, r = 4096).

3.2 Approximation
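A minimal sketch of run length coding for a single row, along the lines of the estimate above (6 bits per gray level and 9 bits per run length), might look as follows; the row contents are invented for illustration.

```python
def run_length_encode(row):
    """Return the row as a list of (gray level, run length) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1] + 1)
        else:
            runs.append((value, 1))
    return runs

row = [12] * 200 + [40] * 300 + [12] * 12   # a 512-pixel row containing 3 runs
runs = run_length_encode(row)

plain_bits = 6 * len(row)         # 512 pixels at 6 bits each = 3072 bits
coded_bits = (6 + 9) * len(runs)  # 15 bits per run
print(runs)                       # [(12, 200), (40, 300), (12, 12)]
print(plain_bits, coded_bits)     # 3072 versus 45
```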
The methods described in Section 3.1 encode the original image without any loss of information; the image can be perfectly reconstructed from its code. On the other hand, the compression provided by these methods is relatively small except for images of special types (composed of relatively few regions of constant gray level, or having gray levels that occur very unequally often). In this and the following sections we discuss coding schemes that only approximately represent the image. Such schemes can yield high degrees of compression even if we require the approximation to resemble quite closely the original image.
One of the key ideas that underlies image approximation techniques is that the fineness of sampling and quantization required to represent an image with sufficient faithfulness depends on the image content. In particular, sampling can be coarse in regions where the gray level varies slowly; and quantization can be coarse in regions where the gray level fluctuates rapidly. These observations are illustrated in Fig. 5.

FIG. 5. Trade-off between sampling and quantization. Parts (a) have 128 × 128 samples and 64 quantization levels; parts (b) have 256 × 256 samples and 16 quantization levels. The smaller number of quantization levels is more acceptable in the crowd scene than in the face; the lower sampling rate is more acceptable in the face than in the crowd. From Huang et al., 1967.

To illustrate the application of this idea to image coding, suppose that we compute the Fourier transform of an image, and break up the transform into low-frequency and high-frequency parts. The low-frequency portion corresponds to an image with slowly varying gray level, so that it can be sampled coarsely, while the high-frequency portion can be quantized coarsely. We can then recombine the two portions to obtain a good approximation to the original image. A variety of image coding schemes based on this idea have been proposed.

In the next two sections we discuss methods of transforming an image so as to take greater advantage of both the exact coding and approximation approaches.

3.3 Differencing
Suppose that we scan the points of an image in sequence (i.e., row by row, as in a TV scan), and use the gray value(s) of the preceding point(s) to predict the gray value of the current point by some type of extrapolation (linear, for example). Since abrupt changes in gray level are relatively rare in most classes of pictures, the errors in this prediction will usually be small. We shall now discuss how a prediction process of this sort can be used to facilitate image compression.

Suppose, for concreteness, that we simply use the value z_{n-1} of the preceding pixel in the scan as our prediction of the value z_n of the current pixel. The error in this prediction is then simply the difference z_n − z_{n-1}. The image can be exactly reconstructed if we know the gray level z_1 of the first pixel and the sequence of differences z_2 − z_1, z_3 − z_2, . . . . Thus the sequence of differences can be regarded as a transformed version of the image.

The difference sequence itself is not a compact "encoding" of the image. In fact, if the original gray levels are in the range [0, z], the differences are in the range [−z, z], so that an extra bit (a sign bit) is required to represent each difference value. However, the differences do provide a basis for compact encoding, for two reasons:

(a) The differences occur very unequally; as pointed out earlier, small differences will be very common, while differences of large magnitude will be quite rare. Thus the Shannon-Fano-Huffman approach can be used to great advantage in exact encoding of the difference values.
(b) When large differences do occur, the gray level is fluctuating rapidly; thus such differences can be quantized coarsely, so that fewer quantization levels are required to cover the range [−z, z].
FIG. 6. Difference coding. Part (a) has 256 × 256 8-bit pixels. Parts (b)-(d) were reconstructed by summing the sequence of differences on each row, where the differences were quantized to 3 bits, 2 bits, and 1 bit, respectively.
A large number of “difference coding” schemes have been developed that take advantage of these properties of the gray level differences (or, more generally, prediction errors) in typical pictures. An example is shown in Fig. 6. The main disadvantage of such schemes is that errors (due to coarse quantization, or noise in the transmission, of the difference values) accumulate, since the image is reconstructed by cumulatively summing these values. To avoid this, the differencing process should be reinitiated frequently, by specifying the actual gray level of a pixel and then once again taking successive differences starting with that pixel.
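A rough sketch of difference coding for one row (assuming NumPy; the step size and row values are invented) is given below; it quantizes the differences coarsely and reconstructs the row by cumulative summing, so quantization errors accumulate exactly as described above.

```python
import numpy as np

def encode(row, step):
    # The first "difference" is the first gray level itself; the rest are
    # differences from the preceding pixel, coarsely quantized.
    diffs = np.diff(np.concatenate(([0], row)))
    return np.round(diffs / step).astype(int)

def decode(codes, step):
    return np.cumsum(codes * step)   # cumulative summing reconstructs the row

row = np.array([100, 101, 103, 103, 150, 152, 151, 149])  # an abrupt edge at 150
codes = encode(row, step=4)
print(decode(codes, step=4))   # an approximate reconstruction of the row
```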
3.4 Transformations
Any invertible transform of an image can serve as the basis for a coding scheme. The general idea is as follows: We take the transform Tf of the given image f; we encode (and/or approximate) Tf, obtaining T′f, say; and we apply the inverse transform T⁻¹ to T′f when we want to reconstruct f. Evidently, the usefulness of this transform coding approach depends on Tf being highly "compressible."

As an example of this approach, let T be the Fourier transform. In Tf, the magnitudes of the different Fourier coefficients have widely different ranges (very high for low frequencies, very low for high ones). Thus we can quantize each coefficient individually, using a number of quantization levels appropriate to its range (and we can even ignore some coefficients completely, if their ranges are sufficiently small). It should also be noted that when we quantize the high-frequency coefficients coarsely, we are in effect coarsely quantizing the parts of the picture where the gray level is fluctuating rapidly, and this is consistent with the remarks on approximation made earlier. This method of coding in the Fourier transform domain is illustrated in Fig. 7.
FIG. 7. Fourier transform coding. Figure 6a was divided into 256 blocks of 16 × 16 pixels. In (a), each block has been reconstructed from the first 128 of its 256 Fourier coefficients; in (b), only 64 coefficients per block were used. From Wintz, 1972.
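The following sketch (assuming NumPy) imitates the block coding of Fig. 7 in a crude way: each 16 × 16 block is Fourier transformed, only a centered 8 × 8 set of low-frequency coefficients (64 of 256) is retained, and the block is reconstructed by the inverse transform. A real scheme would quantize each retained coefficient according to its range rather than discarding the others outright; the test image here is random and purely illustrative.

```python
import numpy as np

def code_block(block, keep=8):
    """Keep only a centered keep x keep square of low-frequency coefficients."""
    coeffs = np.fft.fftshift(np.fft.fft2(block))
    mask = np.zeros(block.shape)
    c = block.shape[0] // 2
    half = keep // 2
    mask[c - half:c + half, c - half:c + half] = 1
    return np.real(np.fft.ifft2(np.fft.ifftshift(coeffs * mask)))

image = np.random.default_rng(1).integers(0, 256, (256, 256)).astype(float)
blocks = image.reshape(16, 16, 16, 16).swapaxes(1, 2)   # a 16 x 16 grid of 16 x 16 blocks
approx = np.block([[code_block(b) for b in row] for row in blocks])
print(approx.shape)   # (256, 256): a coarse, low-frequency approximation of the image
```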
Coding schemes of this sort have been developed for a variety of image transforms. They have the advantage that when errors (due to coarse quantization or noise) occur in a particular coefficient, their effects are distributed over the entire image when it is reconstructed using the inverse transform; thus these effects will tend to be less conspicuous than if they were concentrated at particular locations in the image. The principal disadvantage of transform coding is that the inverse transform process used to reconstruct the image is relatively complex (as compared to difference coding, where the reconstruction involves simple cumulative summing).

3.5 Other Coding Schemes

A wide variety of other image coding schemes have been developed. Some of these are applicable to arbitrary images, while others are designed for special classes of images (e.g., binary, black and white only, with no intermediate gray levels) or for special situations (e.g., sequences of images, as in live TV). In this section, a number of such schemes will be briefly mentioned.
(a) Dither coding. Coarse quantization becomes more acceptable if we add a pseudorandom "noise" pattern, of amplitude about one quantization step, to the image before quantizing it, and then subtract the same pattern from the image before displaying it. This method is illustrated in Fig. 8. A variety of related "ordered dither" schemes have been developed that give the effect of an increased number of gray levels.

FIG. 8. Use of pseudorandom "noise" to break up false contours. From Huang, 1965. The quantization is to 8 levels.

(b) Coding of binary images. Many of the standard approaches to image coding (e.g., Shannon-Fano-Huffman, coarse quantization, difference or transform coding) are not useful for two-valued images. (Run length and contour coding, on the other hand, should be especially useful, since binary images should consist of relatively few runs or connected regions.) In addition, a variety of specialized coding schemes have been developed for binary images. Such schemes can also be applied to the "bit planes" representing specific bits in the gray level values of a general image.

(c) Interframe coding. In a sequence of images of a scene taken at closely spaced time intervals, changes from one image to the next will be relatively limited. One can thus use the preceding image(s) to predict the gray levels of the current image, and encode only the differences between the predicted and actual values (compare our discussion of difference coding in Section 3.3).

Of course, one can also use combinations of coding schemes, or adaptive techniques in which different schemes are used for different types of images or regions.

3.6 Bibliographical Notes
Image coding is covered in Rosenfeld and Kak (1976, Chapter 5), Gonzalez and Wintz (1977, Chapter 6), and in considerably greater detail in Pratt (1978, Chapters 21-24 and Appendix 3). Piecewise approximation techniques are treated in Pavlidis (1977, especially Chapters 2 and 5). The literature on image coding is quite large; a 1971 bibliography (Wilkins and Wintz, 1971) lists about 600 references. Recent review papers have dealt with adaptive image coding techniques (Habibi, 1977); binary image coding (Huang, 1977); and color image coding (Limb et al., 1977).
4. Enhancement, Restoration, and Reconstruction
The goal of image enhancement is to improve the appearance and usefulness of an image by, e.g., increasing its contrast or decreasing its blurredness or noisiness. Related to enhancement are methods of deriving a useful image from a given set of images; an important example is the "reconstruction" of cross sections of an object by analyzing a set of projections of that object as seen from various directions. This section discusses several types of image enhancement (and reconstruction) techniques:

(a) Grayscale modification, e.g., for contrast stretching;
(b) Geometric transformation, for distortion correction;
(c) Noise cleaning;
(d) Deblurring; "image restoration"¹; and
(e) Reconstruction from projections.

These will be treated in the following subsections. The measurement of image quality will not be covered here.

¹ The term "restoration" is used to denote enhancement processes that attempt to estimate and counteract specific degradations to which an image has been subjected. Such processes generally have complex mathematical definitions.

4.1 Grayscale Modification
An "underexposed" image consists of gray levels that occupy only a portion of the grayscale. The appearance of such an image can be improved by spreading its gray levels apart. Of course, if the image is already quantized, this does not introduce any new information. However, it should be pointed out that (depending on the type of detail present in an image) adjacent gray levels are usually not distinguishable from one another when the image is displayed. Spreading the levels apart makes them distinguishable and thus makes the existing information visible.

Even when the image occupies the entire grayscale, one can still stretch its contrast in some parts of the grayscale at the cost of reducing contrast in other parts. This is effective if the gray levels at the ends of the grayscale are relatively rare (as is usually the case), or if the information of interest is represented primarily by gray levels in the stretched range. In fact, if the grayscale is not equally populated, one can stretch the contrast in the heavily populated part(s) while compressing it in the sparsely populated part(s); this has the effect of stretching contrast for most of the image, as illustrated in Fig. 9. The same effect is also achieved by remapping the image gray levels in such a way that each of the new levels occurs equally often. This "histogram flattening" (or equalization) technique is used both for image enhancement and for standardization of an image's gray level distribution; it is illustrated in Fig. 10. Such operations can also be done locally by modifying the gray level of each point based on the gray level distribution in a neighborhood of that point.

Since a person can distinguish many more colors than he can shades of gray, another useful method of "contrast enhancement" is to map the gray levels into colors; this technique is known as pseudocolor enhancement. Remapping of the grayscale (into itself) can also be used to correct arbitrary distortions to which an image may have been subjected.

Another image enhancement task that involves gray level modification is the correction of images that have been unevenly exposed (due to optical vignetting, unequal sensitivity of the sensor(s), etc.). If the nature of the uneven exposure is known, it can in principle be corrected by applying an appropriate gray level adjustment to each point of the image. The nature of the uneven exposure can be determined using images of known test objects such as grayscales.

FIG. 9. Contrast stretching. The middle third of the grayscale of (a) has been stretched by a factor of 2, while the upper and lower thirds have been compressed by a factor of 2.

FIG. 10. Histogram flattening. In Fig. 9a, the grayscale has been transformed so that each gray level occurs equally often.
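A minimal sketch of histogram flattening for an 8-bit image (assuming NumPy; the input image is a hypothetical underexposed one) remaps each gray level through the cumulative distribution of levels:

```python
import numpy as np

def equalize(image, levels=256):
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / image.size            # fraction of pixels at or below each level
    mapping = np.round(cdf * (levels - 1)).astype(image.dtype)
    return mapping[image]                         # remap every pixel through the table

rng = np.random.default_rng(2)
dark = np.clip(rng.normal(50, 15, (128, 128)), 0, 255).astype(np.uint8)  # underexposed
flat = equalize(dark)
print(dark.min(), dark.max(), flat.min(), flat.max())  # the output spans nearly the full grayscale
```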
4.2 Geometric Transformations

A geometric transformation is defined by a pair of functions x′ = φ(x, y), y′ = ψ(x, y) that map the old coordinates (x, y) into the new ones (x′, y′). If we want to perform such a transformation on a digital picture [so that the (x, y)s are points of a discrete grid], the difficulty arises that x′ and y′ are in general not grid points. We can remedy this by mapping each (x′, y′) onto the nearest grid point, but unfortunately the resulting mapping of grid points onto grid points is no longer one-to-one; some grid points in the output may have more than one input grid point mapped onto them, while others have none. To avoid this problem, we need only make use of the inverse transformation x = Φ(x′, y′), y = Ψ(x′, y′). We use this transformation to map each output grid point back into the input plane. The point can then be assigned, e.g., the gray level of the input point nearest to it, or, if desired, a weighted average of the gray levels of the input points that surround it.

If an image has been geometrically distorted, and the nature of the distortion is known, it can be corrected by applying the appropriate geometric transformation (the inverse of the distortion) to the distorted image. The nature of the distortion can be determined using images of known test objects such as grids. This process of distortion correction is illustrated in Fig. 11. Similarly, two images can be registered with each other by identifying pairs of pixels whose neighborhoods match (see Section 5.3 on matching), and then defining a geometric transformation that maps each of these reference pixels on the first image into the corresponding pixel on the second image.
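The inverse-mapping idea can be sketched as follows (assuming NumPy; the particular transformation, a small rotation about the image center, and the nearest-neighbor gray level assignment are illustrative choices):

```python
import numpy as np

def warp_nearest(image, inverse_map):
    """For each output grid point, map back into the input plane and take
    the gray level of the nearest input pixel (clipped at the borders)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x, src_y = inverse_map(xs, ys)
    src_x = np.clip(np.rint(src_x).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(src_y).astype(int), 0, h - 1)
    return image[src_y, src_x]

def rotation_inverse(angle, h, w):
    """Inverse of a rotation by `angle` about the image center."""
    c, s = np.cos(angle), np.sin(angle)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    def inverse_map(x, y):
        xr, yr = x - cx, y - cy
        return c * xr + s * yr + cx, -s * xr + c * yr + cy
    return inverse_map

image = np.random.default_rng(3).integers(0, 256, (64, 64))
rotated = warp_nearest(image, rotation_inverse(np.deg2rad(10), 64, 64))
print(rotated.shape)   # (64, 64): every output grid point received exactly one gray level
```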
FIG. 11. Correction for geometric distortion: (a) distorted; (b) corrected. From O'Handley and Green, 1972.

4.3 Noise Cleaning

Noise that is distinguishable from the information in an image is relatively easy to remove. For example, if the image is composed of large objects, and the noise consists of small specks ("salt and pepper"), we can detect the specks as pixels that are very different in gray level from most or all of their neighbors, and we can then remove the specks by replacing each such pixel by the average of its neighbors, as illustrated in Fig. 12. (In a binary image, specks, as well as thin curves, can be removed by a process of expanding and reshrinking; for an arbitrary image, this is analogous to performing a local MAX operation followed by a local MIN operation (or vice versa) at each pixel.) As another example, if the noise is a periodic pattern, then in the Fourier transform of the image it corresponds to a small set of isolated high values (i.e., specks); these can be detected and removed as just described, and the inverse Fourier transform can then be applied to reconstruct an image from which the periodic pattern has been deleted, as illustrated in Fig. 13. (This process is sometimes called notch filtering.) If the noise pattern is combined multiplicatively, rather than additively, with the image, we can take the log of the noisy image (so that the combination is now additive) before filtering out the noise; this approach is called homomorphic filtering.

FIG. 12. Removal of "salt and pepper" noise by selective local averaging: (a) original; (b) result of replacing each pixel by the average of its eight neighbors, provided it differed from at least six of its neighbors by at least three gray levels.

FIG. 13. Removal of periodic noise by notch filtering: (a) original; (b) Fourier power spectrum of (a); (c) image reconstructed after removing from (b) the two small spots above and below the center.

Image noisiness can also be reduced by various types of averaging operations. For example, if we have several copies of an image that are identical except for the noise, averaging the copies reduces the variability of the noise while preserving the image information, as shown in Fig. 14. Similarly, local averaging (of each pixel with its neighbors) will reduce noise in uniform regions of an image; but it will also blur edges. To avoid this, one can, e.g.:

(a) Detect edges first, and then average each pixel only with those of its neighbors that lie in the direction along the edge (Fig. 15b).
(b) Average each pixel only with those of its neighbors whose gray levels are closest to its own, since these are likely to lie on the same side of an edge as the given pixel (Fig. 15c).
(c) Compute the median, rather than the mean, of the gray levels of the pixel and its neighbors; the median will generally not be influenced by the gray levels of those neighbors that lie on the other side of an edge (Fig. 15d).

FIG. 14. Smoothing by averaging over different instances of the noise. (a)-(c) are averages of 2, 4, and 8 independently noisy versions of Fig. 12a.

FIG. 15. Smoothing by selective local averaging: (a) original; (b) result of averaging at each pixel in the direction along the edge, if any; (c) result of averaging each pixel with its six neighbors whose gray levels are closest to its own; (d) result of median filtering.

A sequential local averaging process known as Kalman filtering can also be used for noise cleaning; but the description of this process involves concepts that are beyond the scope of this chapter.
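Two of the local operations above can be sketched in a few lines (assuming NumPy; the outlier threshold, the test image, and the handling of border pixels are arbitrary simplifications):

```python
import numpy as np

def clean(image, outlier_threshold=50):
    """Speck removal by selective averaging, plus 3 x 3 median filtering.
    Border pixels are simply left unchanged in this sketch."""
    averaged = image.astype(float).copy()
    median = image.astype(float).copy()
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            window = image[i - 1:i + 2, j - 1:j + 2].astype(float)
            neighbors = np.delete(window.ravel(), 4)       # the 8 surrounding pixels
            if abs(image[i, j] - neighbors.mean()) > outlier_threshold:
                averaged[i, j] = neighbors.mean()          # replace isolated specks
            median[i, j] = np.median(window)               # median of the 3 x 3 window
    return averaged, median

noisy = np.full((32, 32), 100.0)
noisy[10, 10] = 255.0                                      # a single bright speck
averaged, median_filtered = clean(noisy)
print(averaged[10, 10], median_filtered[10, 10])           # both are back at 100.0
```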
4.4 Deblurring

A number of methods can be used to reduce blur in an image. Blurring weakens high spatial frequencies more than low ones; hence high-emphasis frequency filtering (strengthening the high frequencies relative to the low ones) should have a deblurring effect, as shown in Fig. 16. In particular, given an image f, suppose that we blur it and subtract the resulting f̄ from f; this "cancels out" the low frequencies (since they are essentially the same in f and f̄), but leaves the high frequencies relatively intact (since they are essentially absent from f̄). Thus adding the Laplacian f − f̄ to f boosts the high frequencies relative to the low ones, with a resultant crispening of f, as shown in Fig. 17. It should be pointed out that high-emphasis filtering will also enhance noise, since noise is generally strong at high spatial frequencies; thus such techniques are best applied in conjunction with noise cleaning, or to images that are relatively noise free.

FIG. 16. Deblurring by high-emphasis frequency filtering. From O'Handley and Green, 1972.

FIG. 17. Crispening by adding the Laplacian: (a) original images f; (b) f − f̄, where f̄ is a blurred version of f.

Suppose that an image f has been blurred by a known weighted local averaging process; this is equivalent to saying that f has been convolved with a "point spread function" h (namely, the pattern of weights), so that the blurred image g is equal to the convolution h * f. By the convolution theorem for Fourier transforms, this implies that the Fourier transform G of g is the product HF of the Fourier transforms of h and f. Since h is known, so is H, and we can (in principle) divide G = HF by H to obtain F, from which we can get the original f by inverse Fourier transforming. (To avoid overenhancing the noise that is present in the image, it is best to do the division only for relatively low spatial frequencies, where the noise is relatively weak.) This process is known as inverse filtering; it is illustrated in Fig. 18. A more general process known as Wiener filtering can be used even when noise is present, but it will not be described in this chapter.

In a blurred image, the observed gray levels are linear combinations of the original (unblurred) gray levels, so that in principle one can determine the original gray levels by solving a large system of linear equations. Such algebraic image restoration techniques have been extensively studied, but they will not be discussed further here. A wide variety of other image restoration techniques have also been developed.
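A small sketch of inverse filtering (assuming NumPy; the test image and the 0.1 threshold on |H| are arbitrary choices) blurs an image with a known 3 × 3 averaging point spread function and then divides in the Fourier domain only where H is not close to zero:

```python
import numpy as np

x = np.arange(64)
# A smooth test image whose energy sits at low spatial frequencies.
f = np.add.outer(np.sin(2 * np.pi * 3 * x / 64), np.cos(2 * np.pi * 5 * x / 64))

h = np.zeros((64, 64))
h[:3, :3] = 1.0 / 9.0                     # known 3 x 3 uniform point spread function

H = np.fft.fft2(h)
G = np.fft.fft2(f) * H                    # Fourier transform of the blurred image g = h * f

safe = np.abs(H) > 0.1                    # frequencies where division by H is safe
F_est = np.zeros_like(G)
F_est[safe] = G[safe] / H[safe]
restored = np.real(np.fft.ifft2(F_est))

print(np.allclose(restored, f))           # True: this image has energy only at "safe" frequencies
```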
4.5 Reconstruction from Projections

We obtain a projection of an image by summing its gray levels along a family of (say) parallel lines. Each of the sums is a linear combination of the original gray levels; thus if we have enough such sums (obtained by taking many different projections), we can in principle determine the original gray levels by solving a large system of linear equations. Such algebraic techniques for reconstructing an image from a set of its projections have been investigated extensively; they will not be described further here.

Another approach to reconstructing images from projections is based on the following property of Fourier transforms: Let f be an image, and let F be the two-dimensional Fourier transform of f. Let f_θ be the projection of f obtained by summing along the family of lines in direction θ, and let F_θ′ be the cross section of F along the line through the origin in direction θ + (π/2); then the one-dimensional Fourier transform of f_θ is just F_θ′. This means that if we have many projections of f, we can take their Fourier transforms to obtain many cross sections of F. This gives us an approximation to F from which we can reconstruct f by taking the inverse Fourier transform of F.

If f is a cross section of a solid object, we can obtain projections of f by taking x rays of the object from various positions. By methods such as those described in the preceding paragraphs, we can reconstruct f from these projections. (This process is called tomography.) An abdominal cross section reconstructed in this way is shown in Fig. 19.
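The Fourier property just described can be checked numerically for the simplest case, a projection along one coordinate direction (a sketch assuming NumPy; the image is random and illustrative only):

```python
import numpy as np

image = np.random.default_rng(5).random((64, 64))

projection = image.sum(axis=0)            # sum along one family of parallel lines
slice_1d = np.fft.fft(projection)         # one-dimensional Fourier transform of the projection

F = np.fft.fft2(image)                    # two-dimensional Fourier transform of the image
cross_section = F[0, :]                   # cross section of F through the origin

print(np.allclose(slice_1d, cross_section))   # True
```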
FIG. 18. Deblurring by inverse filtering: (a) unblurred dot; (b) blurred dot; (c) unblurred "5"; (d) blurred "5"; (e) result of dividing the Fourier transform of (d) by that of (b) for spatial frequencies below 2 cycles/mm; (f) result of doing the division for spatial frequencies up to 3 cycles/mm. From McGlamery, 1965.
4.6 Bibliographical Notes
Image enhancement is treated in Rosenfeld and Kak (1976, Chapter 6); in Gonzalez and Wintz (1977, Chapter 4); and in Pratt (1978, Chapter 12). Much work on image enhancement, including grayscale modification, geometric correction, noise cleaning, and deblurring, has been done at NASA's Jet Propulsion Laboratory (O'Handley and Green, 1972).
FIG. 19. Abdominal cross section, reconstructed from a set of x rays. Courtesy of Pfizer Medical Systems, Columbia, Maryland.
A classic paper on noise cleaning is Graham (1962); some recent work on simple noise cleaning techniques can be found in Davis and Rosenfeld (1978). Kalman filtering is treated in Rosenfeld and Kak (1976, pp. 235-249) and Pratt (1978, pp. 441-443). The use of the Laplacian for image sharpening is due to Kovasznay and Joseph (1955); and the application of homomorphic filtering to images has been extensively studied by Stockham (1972). Inverse and Wiener filtering, as well as algebraic and other image restoration techniques, are treated in Rosenfeld and Kak (1976, Chapter 7); Gonzalez and Wintz (1977, Chapter 5); Pratt (1978, Chapters 13-16); and Andrews and Hunt (1977). Reconstruction from projections is reviewed in Gordon and Herman (1974).

5. Segmentation
Picture descriptions nearly always refer to parts of the picture (i.e., objects or regions); thus segmentation of a picture into parts is a basic step in picture recognition and analysis. This section reviews a variety of segmentation techniques, such as

(a) Pixel classification and clustering on the basis of gray level, color, or local properties;
(b) Edge detection, for extraction of region or object outlines;
(c) Pattern matching, for detection of specific local patterns (e.g., lines); the general topic of picture matching, for applications such as registration, change detection, etc., is also briefly discussed;
(d) Sequential techniques, including curve tracking, region growing, and partitioning; and
(e) Fuzzy techniques (sometimes called "relaxation methods").

5.1 Pixel Classification
Pictures can be segmented by classifying their pixels on the basis of various properties, such as lightness/darkness, color, or local property values computed on the pixels' neighborhoods. If the desired classes are not known a priori, cluster detection techniques can be applied to the set of property values. A general treatment of pattern classification and clustering will not be given here.

To illustrate this idea, consider a class of pictures that contain dark objects on a light background (alphanumeric characters on paper, chromosomes on a microscope slide, etc.) or vice versa (clouds over the ocean, as seen by a satellite). Here we want to assign the pixels to "dark" and "light" classes on the basis of gray level. If the classes are known in advance (e.g., we might know the statistical properties of the brightness of ink and paper under a given illumination), we can use standard pattern classification techniques to assign each pixel to the most likely class. If not, we can analyze the frequencies of occurrence of the gray levels and look for peaks corresponding to light and dark subpopulations of pixels; we can then choose a gray level threshold between these peaks, so as to separate these subpopulations from each other. This is illustrated in Fig. 20. If the illumination varies from one part of the picture to another, we can do this analysis locally to find a set of thresholds, and then interpolate between these to obtain a smoothly varying threshold.

This thresholding method of classifying pixels into light and dark classes has many refinements. For example, suppose that for each pixel we measure both gray level and rate of change of gray level (see Section 5.2 on edge detection). If we look at frequencies of occurrence of gray levels only for those pixels at which the rate of change is low, we should be able to detect peaks more easily, since such pixels are probably in the interiors of the objects or background, rather than on object/background borders. Conversely, the pixels for which the rate of change is high often do lie on borders, so that their average gray level may be a good threshold. These ideas are illustrated in Fig. 21.
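One naive way to choose a threshold between two histogram peaks is sketched below (Python/NumPy; an editorial illustration rather than a method prescribed by the text, and it assumes the histogram is clearly bimodal).

    import numpy as np

    def valley_threshold(image, levels=256):
        # place the threshold at the deepest valley between the two highest
        # peaks of the (slightly smoothed) gray level histogram
        hist, _ = np.histogram(image, bins=levels, range=(0, levels))
        smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")
        peaks = [i for i in range(1, levels - 1)
                 if smooth[i] >= smooth[i - 1] and smooth[i] >= smooth[i + 1]]
        p1, p2 = sorted(sorted(peaks, key=lambda i: -smooth[i])[:2])
        return p1 + int(np.argmin(smooth[p1:p2 + 1]))

    # dark = image <= valley_threshold(image)   # "dark" versus "light" pixels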
FIG. 21. Use of rate of change of gray level to facilitate thresholding: (a) original picture; (b) histogram of (a); (c) scatter plot of gray level (across) versus its rate of change (down); (d) histogram of gray levels of those pixels in (a) at which the rate of change of gray level is zero; (e) histogram of gray levels of those pixels in (a) having rates of change of gray level in the highest 20%.
FIG. 22. Pixel classification in color space: (a) red, green, and blue components of a picture; (b) projections of color space on the red-green, green-blue, and blue-red planes, showing clusters; (c) pixels belonging to each of five major clusters, shown in white.
Thresholding (using more than one threshold) can also be applied in cases where there are more than two peaks, e.g., in microscope images of white blood cells, where there are peaks corresponding to nucleus, cytoplasm, and background.

In a color picture, the color of each pixel has red, green, and blue components, so that a pixel can be characterized by a point in a three-dimensional color space. (Analogous remarks apply to pictures obtained from multispectral scanners, where for each pixel we have a k-tuple of reflectivity values in various spectral bands.) If these color values form clusters, as they usually do, we can partition the color space (e.g., with planes) so as to separate the clusters, and thus classify the pixels. An example is shown in Fig. 22.

Pixels can also be classified based on properties computed over their neighborhoods. For example, edge detection (Section 5.2) can be regarded as pixel classification based on the rate of change of gray level; similarly, line detection (Section 5.3) is pixel classification based on degree of match to an appropriate local pattern. We can also segment a picture into "smooth" and "busy" regions by using some local measure of "busyness" as a pixel property. (In practice, the values of such measures are highly variable even within a busy region; but we can reduce this variability by locally averaging the values.)
FIG.23. Pixel classification in gray level-busyness space: (a) smoothed busyness values for Fig. 22; (b) gray level-busyness space, showing clusters; (c) pixels belonging to each of five major clusters, shown in white.
The use of local busyness, in conjunction with gray level, to segment a picture is illustrated in Fig. 23. Local averages of local property measures are generally useful in segmenting a picture into differently textured regions; some simple examples are given in Fig. 24.
5.2 Edge Detection

Abrupt transitions between regions in a picture, or between objects and their background, are characterized by a high rate of change of gray level (or color, or some other local property value). Thus we can detect these "edges" by using local differencing operations to measure the rate of change of gray level (or other property value) at each pixel. If we know the rates of change Δ1, Δ2 in two perpendicular directions, then the maximum rate of change (the gradient) in any direction is given by (Δ1² + Δ2²)^(1/2), and the direction of this maximum is tan⁻¹(Δ2/Δ1). [For computational simplicity, (Δ1² + Δ2²)^(1/2) is often approximated by |Δ1| + |Δ2| or by max(|Δ1|, |Δ2|).] Δ1 and Δ2 can be defined in a variety of ways; for example, the Roberts operator uses

Δ1(x, y) = f(x, y) - f(x + 1, y + 1)   and   Δ2(x, y) = f(x + 1, y) - f(x, y + 1)

(note that these are in the two diagonal directions) to estimate the gradient at (x + 1/2, y + 1/2), while the Sobel operator uses

Δ1(x, y) = [f(x - 1, y + 1) + 2f(x - 1, y) + f(x - 1, y - 1)] - [f(x + 1, y + 1) + 2f(x + 1, y) + f(x + 1, y - 1)]

and

Δ2(x, y) = [f(x - 1, y + 1) + 2f(x, y + 1) + f(x + 1, y + 1)] - [f(x - 1, y - 1) + 2f(x, y - 1) + f(x + 1, y - 1)]

to estimate the gradient at (x, y). Similar operators are obtained if we least-squares fit a surface (e.g., plane or quadric) to the gray levels in a neighborhood of (x, y), and compute the gradient of this surface.²
FIG. 24. Use of local averages of local property values for texture segmentation: (a) densely dotted region on a sparsely dotted background; (b) local average gray level in (a); (c) result of thresholding (b); (d) black-and-white noise on a background of grayscale noise; (e) gray level gradient values of (d); (f) result of locally averaging and thresholding (e).
FIG.25. Edge detection: (a) original pictures; (b) Roberts gradient values.
The results of applying the Roberts operator to several pictures are illustrated in Fig. 25. Color edges are detected by applying difference operators to each color component and suitably combining the outputs of these operators. A variety of "texture edges" can be detected by locally averaging the values of various local properties, and taking differences of these averages, as illustrated in Fig. 26. Other statistics of the local property values can be used instead of averages.

Many other approaches to edge detection have been investigated, based on such concepts as recursive filtering and maximum-likelihood decision. A variety of specialized difference operators have also been developed, e.g., involving comparisons of differences taken in various directions, or of the differences obtained after various amounts of averaging; however, details will not be given here.
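As a concrete illustration of these operators (a Python/NumPy sketch added here, not the chapter's own code), the Roberts and Sobel gradient magnitudes can be computed with array differences:

    import numpy as np

    def roberts_gradient(f):
        # differences along the two diagonal directions
        d1 = f[:-1, :-1] - f[1:, 1:]
        d2 = f[1:, :-1] - f[:-1, 1:]
        return np.hypot(d1, d2)                  # (d1**2 + d2**2) ** 0.5

    def sobel_gradient(f):
        # 1-2-1 weighted differences in two perpendicular directions,
        # evaluated on the interior pixels of f
        d1 = (f[:-2, :-2] + 2 * f[1:-1, :-2] + f[2:, :-2]) \
           - (f[:-2, 2:]  + 2 * f[1:-1, 2:]  + f[2:, 2:])
        d2 = (f[2:, :-2] + 2 * f[2:, 1:-1] + f[2:, 2:]) \
           - (f[:-2, :-2] + 2 * f[:-2, 1:-1] + f[:-2, 2:])
        return np.hypot(d1, d2)

    # e.g. edges = sobel_gradient(picture.astype(float)) > some_threshold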
5.3 Pattern Matching

Another form of segmentation is the detection of specified patterns of gray levels in a picture. This involves measuring the degree of match between the given pattern and the picture in every possible position. This process is sometimes called template matching. Note that the pattern must have a specific size and orientation.

The match between a picture f and a pattern g can be measured by computing their correlation Σfg/(Σf² Σg²)^(1/2); alternatively, the mismatch between them can be measured by, e.g., Σ(f - g)² or Σ|f - g|. These measures are maximized (or minimized) when f and g match exactly, but they often do not discriminate cleanly between matches and nonmatches, as illustrated in Fig. 27.
² Another approach to edge detection is to find a best-fitting step edge in a neighborhood of (x, y). The details of this Hueckel operator approach are rather complicated, and will not be given here.
FIG. 26. Texture edge detection: (a)-(b) differences of average gray levels for Figs. 24a and 24d. Thin edges can be obtained by suppressing nonmaxima.
Better discrimination can usually be achieved by matching outlines or difference values rather than "solid" patterns. For example, to detect a pattern of 1s on a background of 0s (or, in general, high values on a background of low values), we can represent the pattern by +1s just inside its border and -1s just outside; when this pattern of ±1s is correlated with the picture, we get high positive values only at positions of good match.³ Still better discrimination can be obtained by incorporating logical conditions into the match measurement, rather than simply computing the correlation; thus, in the example just considered, we might require that each point of the pattern have higher gray level than each of its (nonpattern) neighbors.

An important application of pattern matching is the detection of lines or curves in a picture. For example, to detect thin high-valued vertical lines on a low-valued background, we can correlate with the pattern

-1  1 -1
-1  1 -1
-1  1 -1
of ±1s, or we can require that each of the three pixels in the middle column have higher value than each of its neighbors in the two outer columns. The latter approach is preferable, since the correlation method also responds to patterns other than lines, e.g., points and edges. To detect lines in other orientations, analogous patterns (or sets of conditions) can be used, e.g.,

-1 -1 -1      1 -1 -1     -1 -1  1
 1  1  1     -1  1 -1     -1  1 -1
-1 -1 -1     -1 -1  1      1 -1 -1

in the horizontal and diagonal directions.
³ These observations can be justified mathematically by the matched filter theorem and its generalizations, which tell us what (derived) patterns should be correlated with a picture in order that specified detection criteria be maximized at the positions where a given pattern is present.
FIG.27. Matching by correlation: (a) picture; (b) template; (c) correlation of (a) and (b) (note the many near misses).
Arbitrary (smooth) curves can be detected by using a set of such patterns representing all possible local orientations of a line. To detect thick curves, an analogous set of patterns can be used, based on blocks of 1s and -1s rather than on single pixels. This process is illustrated in Fig. 28.

Pattern matching in every position is a computationally costly process if the pattern is large. This cost can be reduced by using inexpensive tests to eliminate positions from consideration; for example, matching with a distinctive piece of the pattern can be used as a test. Another way to reduce the cost of the correlation process is to perform the correlation by pointwise multiplication in the Fourier domain (by the convolution theorem, the correlation of f and g is the inverse Fourier transform of FG*, where F, G are the Fourier transforms of f, g and * denotes the complex conjugate). This may be more efficient than direct cross correlation of f and g (depending on how large g is) if a fast Fourier transform algorithm is used to compute the transforms.
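The Fourier-domain correlation just mentioned takes only a few lines (Python/NumPy; a sketch assuming cyclic correlation, not code from the chapter):

    import numpy as np

    def correlate_fft(picture, template):
        # cyclic cross correlation: inverse transform of F times conj(G)
        F = np.fft.fft2(picture)
        G = np.fft.fft2(template, s=picture.shape)   # zero-pad the template
        return np.real(np.fft.ifft2(F * np.conj(G)))

    # the best match position is where the correlation peaks:
    # r, c = np.unravel_index(np.argmax(correlate_fft(picture, template)),
    #                         picture.shape)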
FIG. 28. Curve detection: (a) input (output of an edge detector applied to a terrain picture); (b) result of curve detection using logical conditions based on 2 x 2 blocks of pixels.
Pattern matching is very sensitive to geometrical distortion. This problem can be somewhat alleviated by blurring the pattern (or image) prior to matching. Another possibility is to break the pattern into small pieces, find matches to the pieces, and then find combinations of these matches in (approximately) the correct relative positions. Still another general approach is to segment the image and attempt to identify regions in it whose properties correspond to (parts of) the pattern; this approach matches sets of property values rather than correlating subimages.

Certain types of (nonlocal) patterns can be more easily detected in an appropriate transform of the picture than in the picture itself. For example, a periodic pattern gives rise to a set of isolated high values in the picture's Fourier transform. Straight lines (possibly broken or dotted) in a given direction give rise to peaks when we compute the projection of the picture in that direction. Generally, to detect straight lines in all directions, one can use a point-line transformation that maps each point (a, b) into the line (e.g.) y = ax + b; this Hough transformation takes collinear sets of points into concurrent sets of lines, so that a line in the original picture will give rise to a high value where many lines meet in the transformed picture. Analogous methods can be defined for detecting other specific types of curves.

It should be pointed out that matching operations have other important uses in picture processing, in addition to their use for detecting specified patterns in a picture. Matching pictures (or parts of pictures) with one another is done for purposes of registration (e.g., pictures of the same scene obtained from different sensors), change detection or motion detection (using pairs of pictures taken at different times), or stereomapping (using pairs of pictures taken from slightly different positions).
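As an illustration of the point-line (Hough) idea for straight lines, the sketch below (Python/NumPy, added editorially) uses the angle-radius form rho = x cos(theta) + y sin(theta) of a line, which avoids the unbounded slope of y = ax + b; collinear edge points then vote for the same accumulator cell.

    import numpy as np

    def hough_lines(edge_points, shape, n_theta=180):
        # each (x, y) edge point votes for every (theta, rho) line through it;
        # peaks in the accumulator correspond to lines in the picture
        h, w = shape
        diag = int(np.ceil(np.hypot(h, w)))
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        acc = np.zeros((n_theta, 2 * diag + 1), dtype=int)
        for x, y in edge_points:
            rho = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
            acc[np.arange(n_theta), rho + diag] += 1
        return acc, thetas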
5.4 Sequential Segmentation

In the segmentation techniques discussed up to now, the decision about how to classify each pixel (as belonging to a given cluster, or as having a high gradient value or pattern match value) is independent of the decisions about all the other pixels. These techniques are thus "parallel," in the sense that they could in principle be done simultaneously, and do not depend on the order in which the pixels are examined. In this section we briefly discuss "sequential" techniques in which each decision does depend on the previous ones.

An important example of sequential segmentation is curve (or edge) tracking. Once a pixel belonging to a curve has been detected (see Section 5.3), we can examine nearby pixels for continuation(s) of the curve, and repeat this process to find further continuations. The pixels that are examined, and the criteria for accepting them as continuations, depend on what has previously been accepted. As we learn more about the curve (e.g., its gray level, slope, smoothness, etc.), these criteria can be progressively
refined; they can also depend on the types of curves that are expected to occur in the picture. Note that we may make errors, and backtracking may be necessary.

In general, we can extract homogeneous regions from a picture by a process of region growing, in which pixels (or blocks of pixels) that resemble the region are merged with it. Here again, the acceptance criteria can depend on the types of regions that are expected to occur, and can be refined as we become more certain about the properties of the region. Alternatively, we can start with a large region and split it until the pieces are all homogeneous; or we can start with an arbitrary partition of the picture and apply both merging and splitting until we obtain a piecewise homogeneous partition.⁴

⁴ This partitioning process is one example of the usefulness of recursive methods in segmentation. In general, whenever a picture has been segmented according to some criterion, we can attempt to segment the resulting parts according to other criteria. It is sometimes useful to first segment a reduced-resolution version of a picture, and then refine the segmentation at successively higher resolutions.
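A minimal region-growing sketch follows (Python; the fixed gray level tolerance is a hypothetical acceptance criterion chosen here for brevity, whereas the text envisions criteria that are refined as the region grows).

    from collections import deque

    def grow_region(picture, seed, tol=10):
        # grow from `seed` = (row, col), merging 4-neighbors whose gray level
        # differs from the seed's by at most `tol`
        rows, cols = len(picture), len(picture[0])
        base = picture[seed[0]][seed[1]]
        region, frontier = {seed}, deque([seed])
        while frontier:
            r, c = frontier.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region \
                        and abs(picture[nr][nc] - base) <= tol:
                    region.add((nr, nc))
                    frontier.append((nr, nc))
        return region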
5.5 Fuzzy Segmentation

We have assumed up to now that when we segment a picture, the pixels are classified definitely as light or dark, edge or nonedge, etc. A safer approach is to classify the pixels fuzzily, i.e., to assign them degrees of membership (ranging from 0 to 1) in the classes. These fuzzy classifications can then be sharpened by iteratively adjusting the degrees of membership according to their consistency with the memberships of neighboring pixels.

To illustrate this idea, consider the problem of detecting smooth curves (or edges) in a picture. We begin by matching the picture with a set of local patterns, representing lines of various slopes (say θ1, . . . , θn), as in Section 5.3. Our initial estimate of the degree of membership mi(x, y) of a pixel in the class "line at θi" is proportional to the strength of the corresponding match. Let (u, v) be a neighbor of (x, y), say in direction θ. If θi and θj are close to θ, mi(x, y) and mj(u, v) reinforce one another, since they are "consistent," i.e., they correspond to a smooth curve passing through (x, y) and (u, v). [The amount by which mi(x, y) is reinforced should depend on the strength of mj(u, v) and on the degree to which θi, θ, and θj are collinear; the details will not be given here.] On the other hand, if θi and θj are very different from θ, mi(x, y) and mj(u, v) weaken one another. This process of reinforcement and weakening is applied to all pairs of ms at all pairs of neighboring pixels; it results in a new set of membership estimates mi′(x, y). When this procedure is iterated, the ms at points on smooth curves that correspond to the slopes of the curves become stronger, while all other ms become weaker. This is illustrated in Fig. 29.
FIG. 29. Iterative curve detection: (a) input (terrain seen from a satellite); (b) edges of (a); (c) initial estimate of the strongest mi at each pixel, indicated by the brightness of a line segment at the appropriate orientation; (d)-(f) iterations 1, 3, and 6 of the reinforcement process.
Iterative methods of the type just described (sometimes called "relaxation methods") have many applications in segmentation. For example, pixels can be given memberships in the classes "light" and "dark" according to their gray levels, and these memberships can then be adjusted on the basis of neighborhood consistency; this yields less noisy results than would be obtained by immediate, nonfuzzy thresholding. As another example, matching can be used to detect tentatively parts of a pattern, and these detections can then be confirmed by the (fuzzy) presence of other parts of the pattern in the proper relative positions. A variety of applications of this approach have been successfully tested.
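A deliberately crude caricature of the light/dark relaxation example just mentioned (Python/NumPy, added as an illustration; the real reinforcement rules are more elaborate): memberships start as scaled gray levels and are repeatedly pulled toward the average membership of their neighbors before a final nonfuzzy decision.

    import numpy as np

    def relax_light_dark(picture, iterations=5, weight=0.5):
        m = picture.astype(float)
        m = (m - m.min()) / (m.max() - m.min() + 1e-9)   # memberships in [0, 1]
        for _ in range(iterations):
            # average membership of the four horizontal/vertical neighbors
            nbr = (np.roll(m, 1, 0) + np.roll(m, -1, 0) +
                   np.roll(m, 1, 1) + np.roll(m, -1, 1)) / 4.0
            m = (1 - weight) * m + weight * nbr          # neighborhood consistency
        return m > 0.5                                   # final nonfuzzy decision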
5.6 Bibliographical Notes

Picture segmentation and matching is the subject of Rosenfeld and Kak (1976, Chapter 8), Gonzalez and Wintz (1977, Chapter 7), and Pratt (1978, parts of Chapters 17-19). Thresholding techniques are surveyed in Weszka (1978). On refinements using rate of change of gray level see Panda and Rosenfeld (1978). Color clustering and local property value clustering are compared in Schachter et al. (1978). A survey of edge detection techniques is in Davis (1975). Region growing is reviewed in Zucker (1976); and "relaxation" methods of fuzzy segmentation are reviewed in Rosenfeld (1977).
6. Representation

The segmentation techniques of Section 5 extract distinguished sets of pixels from a picture, but are not concerned, for the most part, with how these sets define objects or regions. (The curve tracking and region growing methods mentioned in Section 5.4 are an exception.) This section deals with the decomposition of a picture subset into (connected) regions; with methods of representing regions to facilitate their further analysis; and with segmentation of regions into parts based on shape criteria. Section 7 will discuss the measurement of region properties and relationships among regions for purposes of picture description. In particular, the following topics will be covered in this section:

(a) Connectivity and connected component decomposition;
(b) Representation of regions by lists of runs; segmentation by run tracking;
(c) Representation of regions by border curves; border following, chain coding;
(d) Representation of regions by skeletons; thinning; and
(e) Segmentation of borders and curves.

6.1 Connectedness
Let S be any subset of a digital picture. We say that the pixels P, Q of S are connected (in S) if there is a sequence of pixels P = P0, P1, . . . , Pn = Q, all in S, such that Pi is a neighbor of Pi-1, 1 ≤ i ≤ n. [There are two versions of this definition, depending on whether or not we allow diagonal neighbors. We will refer to the two concepts as "4-connectedness" (diagonal moves not allowed) and "8-connectedness."] S is called connected if any two of its pixels are connected. More generally, a maximal set of mutually connected pixels of S is called a connected component of S.

Let S̄ be the complement of S; then S̄ too can be decomposed into connected components. If we regard the region B outside the picture as being in S̄, then exactly one of these components contains B; it is called the background of S. All other components, if any, are called holes in S. It turns out to be important to use opposite kinds of connectedness for S and S̄; thus if we treat S as consisting of 4-components, we should treat S̄ as consisting of 8-components, and vice versa.

When we speak of "objects" or "regions" in a picture, we usually imply that they are connected. Thus when we are given an arbitrary picture subset S, we often want to decompose it into its connected components. This decomposition can be represented by assigning labels to the pixels of S, such that all pixels in the same component have the same label, but no two pixels in different components have the same label. An algorithm that does this "component labeling" will be described in the next section.

We may sometimes want to consider a region as connected if its parts are separated by very small gaps, or to consider it as not connected if its parts are joined by thin "bridges." Suppose that we "expand" S by adjoining to it points that are adjacent to it, and then reshrink the expanded S by deleting its border points. As illustrated in Fig. 30, this process tends to eliminate small gaps. (Several successive expansion steps, followed by several successive shrinking steps, will be necessary if the gaps are several pixels wide.) Conversely, suppose we shrink S and then reexpand it; as Fig. 31 shows, this tends to erase thin bridges.⁵ Expanding and reshrinking can also be used to eliminate small holes from S (i.e., small components of S̄), while shrinking and reexpanding eliminates small components of S.

⁵ Shrinking algorithms that preserve connectedness (so that every connected object eventually shrinks to a point) can also be defined; the details will not be given here.
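The expand/reshrink and shrink/reexpand operations are easy to state for a binary array; the following Python/NumPy sketch (not from the chapter) uses 8-neighbor expansion, with cyclic boundary handling for brevity.

    import numpy as np

    def expand(s):
        # adjoin to S (a Boolean array) all pixels 8-adjacent to it
        out = s.copy()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                out |= np.roll(np.roll(s, dr, axis=0), dc, axis=1)
        return out

    def shrink(s):
        # delete the border points of S (pixels 8-adjacent to the complement)
        return ~expand(~s)

    # eliminating small gaps:   shrink(expand(s))
    # erasing thin bridges:     expand(shrink(s))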
42
AZRIEL ROSENFELD
P
P
P
P P P
P P P P P
P P P P P P
P P P P P P P P P P
P
P P P
FIG. 30. Elimination of small gaps by expanding and reshrinking: (a) original S; (b) result of expanding; (c) result of reshrinking.
6.2 Representation by Runs; Component Labeling and Counting
Given any picture subset S, each row of the picture consists, in general, of runs of pixels in S (for brevity: S-runs) separated by runs of pixels in S̄ (S̄-runs). Thus we can represent S by a list of the lengths (and positions) of these runs on each row. (Compare the discussion of run length coding in Section 3.1.) The following algorithm, based on the run representation, labels the components of S as described in Section 6.1: On the first row of the picture, each S-run (if any) is assigned a distinct label. On succeeding rows, for each S-run p,
(a) If p is not adjacent to any S-run on the preceding row, we give it a new label.
(b) If p is adjacent to just one S-run p′ on the preceding row, we give it the same label as p′.
(c) If p is adjacent to more than one S-run on the preceding row, we give it one of their labels, and we also note that all of their labels are equivalent (i.e., belong to the same component of S).

After the entire picture has been processed in this way, we sort the labels into equivalence classes, and then rescan S row by row, relabeling the runs (if necessary) so that all equivalent labels are replaced by a single label. This completes the component labeling process. A simple example is shown in Fig. 32.
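The row-by-row labeling algorithm can be sketched as follows (Python/NumPy with a simple union-find structure for the label equivalences; an editorial illustration, not the chapter's code).

    import numpy as np

    def label_components(s):
        # two-pass, run-based component labeling of a Boolean array s
        # (8-connectedness for S); returns an integer label array, 0 = background
        labels = np.zeros(s.shape, dtype=int)
        parent = {}

        def find(a):
            while parent[a] != a:
                a = parent[a]
            return a

        def runs(row):
            out, start = [], None
            for j, v in enumerate(row):
                if v and start is None:
                    start = j
                elif not v and start is not None:
                    out.append((start, j - 1)); start = None
            if start is not None:
                out.append((start, len(row) - 1))
            return out

        next_label, prev = 1, []
        for i in range(s.shape[0]):
            cur = []
            for (a, b) in runs(s[i]):
                # previous-row runs touching columns a-1 .. b+1 (8-adjacency)
                touching = [lab for (pa, pb, lab) in prev
                            if pa <= b + 1 and pb >= a - 1]
                if not touching:
                    lab = next_label; parent[lab] = lab; next_label += 1
                else:
                    lab = touching[0]
                    for other in touching[1:]:           # note equivalences
                        ra, rb = find(lab), find(other)
                        parent[max(ra, rb)] = min(ra, rb)
                labels[i, a:b + 1] = lab
                cur.append((a, b, lab))
            prev = cur

        # second scan: replace each label by its equivalence-class representative
        for i in range(s.shape[0]):
            for j in range(s.shape[1]):
                if labels[i, j]:
                    labels[i, j] = find(labels[i, j])
        return labels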
FIG. 31. Erasure of thin bridges by shrinking and reexpanding: (a) original S; (b) result of shrinking; (c) result of reexpanding.
The run representation can also serve as a basis for segmenting the components of S into parts in accordance with how the runs merge, split, or change significantly in size or position. Note, however, that this segmentation process is sensitive to the orientation of S; it should be used only when S has known orientation.

To count the objects (i.e., connected components of some set S) in a picture, we label the components as described above and count the number of inequivalent labels. A simplified counting algorithm that is often proposed is as follows: On the first row, count 1 for each S-run; on succeeding rows, for each S-run p, count 1 if p is not adjacent to any S-run on the previous row, but count -(k - 1) if p is adjacent to k S-runs on the preceding row. Unfortunately, this simplified algorithm works only if S has no holes; otherwise, what it counts is the number of components of S minus the number of its holes.

6.3 Representation by Border Curves; Border Following, Chain Coding
Let C be a connected subset of the picture; then the border of C (= the set of points of C that are adjacent to C̄) consists of a set of closed curves,
FIG. 32. Connected component labeling: (a) original S; (b) result of first row-by-row scan (using labels A, B, C, . . .); (c) result of second scan. Equivalences: A = B, C = D, A = E; equivalence classes: {A, B, E}, {C, D}.
FIG. 33. Border following. (a) Initial state. (b) Step 1. (c) Step 2: the algorithm does not stop, even though P2 = P0, since Q0 is not one of the Rs. (d) Step 3. (e) Step 4. (f) Step 5: the algorithm stops, since the new P equals P0 and Q0 is one of the Rs.
one for each component of C̄ that is adjacent to C. One of these curves, where C is adjacent to the background, is called the outer border of C; the others (if any) are called hole borders of C. Note that if C is thin, these curves may pass through a point more than once, and may not be disjoint from one another.

Let D be a component of C̄ that is adjacent to C, and let P, Q be a pair of adjacent points in C, D respectively. The following algorithm constructs the curve along which C is adjacent to D (the "D-border" of C): Set P0 = P, Q0 = Q. Given Pi and Qi, let the neighbors of Pi, in (say) clockwise order starting with Qi, be Ri1, Ri2, . . . , Ri8. Let Rij be the first of the Ris that lie in C; take Pi+1 = Rij and Qi+1 = Ri,j-1. When Pi+1 = P0, and Q0 is one of the Ris that were examined before finding Pi+1, stop. (This algorithm treats C as 8-connected and D as 4-connected; it must be slightly modified in the reverse case.) A simple example of the operation of this algorithm is given in Fig. 33.⁶

The successive Pis found by this algorithm are always neighbors of one another (with diagonal neighbors allowed); hence we can represent the
⁶ A more complicated algorithm exists that tracks all the borders of all the components of a given set S in a single row-by-row scan of the picture. The details will not be given here.
sequence of Pis by a sequence of 3-bit numbers, where each number represents a specific neighbor according to the following scheme:

3 2 1
4 * 0
5 6 7
(Mnemonic: The ith neighbor is at angle 45i° from the positive x axis.) Such a sequence is called a chain code. For example, the border in Fig. 33 has the chain code 73520. We can reconstruct C if we are given the chain codes of all its borders, together with the starting point P0 and its neighbor Q0 for each border. Specifically, for each D-border of C, we mark P0 with (say) 1 and Q0 with 0. Given that Pi and Qi have been marked, we find Pi+1 from the chain code and mark it 1, and we also mark all the neighbors of Pi between Qi and Pi+1 in clockwise order with 0s. When this process is complete, that entire border of C has been marked with 1s, and its neighbors in D have been marked with 0s. When all borders of C have been processed in this way, we can fill in the interior of C by allowing 1s to "propagate" to unmarked pixels (but not to 0s).
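The chain code is easy to manipulate directly; the short sketch below (Python, not from the chapter) recovers the border pixel coordinates from a starting point and a chain code, using the direction numbering above (x to the right, y upward).

    # displacement of neighbor i, which lies at angle 45i degrees from the +x axis
    STEP = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
            4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}

    def decode_chain(start, code):
        points = [start]
        x, y = start
        for symbol in code:
            dx, dy = STEP[symbol]
            x, y = x + dx, y + dy
            points.append((x, y))
        return points

    # decode_chain((0, 0), [7, 3, 5, 2, 0]) traces the closed border of Fig. 33
    # and ends back at its starting point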
6.4 Representation by Skeletons; Thinning
Given any pixel P in the set S, let A(P) be the largest "disk" centered at P that is entirely contained in S. (A need not be circular; it can be of any convenient shape, e.g., square.) We call A(P) maximal if it is not contained in any other A(Q). It is easy to see that S is the union of the maximal A(P)s. The set of centers of these A(P)s constitutes a sort of "skeleton" of S (they are pixels whose distances from the border of S are local maxima); these centers, together with their associated radii, are called the medial axis transform (MAT) of S. Algorithms exist that construct the MAT of a given S, and that reconstruct S from its MAT, in two row-by-row scans of the picture, one from top-to-bottom and the other from bottom-to-top; details will not be given here.

Even if S is connected, its MAT need not be connected; but we can define algorithms that always yield connected skeletons. For example, suppose that we delete each border pixel of S that has at least two neighbors in S and whose deletion does not disconnect the points of S in its neighborhood; this is done simultaneously for all border pixels on a given side of S (north, south, east, or west). When we do this repeatedly, S shrinks to a connected skeleton, consisting of a set of thin arcs and curves. This thinning process is illustrated in Fig. 34, which also shows the MAT of the same S.
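A minimal sketch of the MAT idea (Python/NumPy, added editorially): compute the distance of every pixel of S from the complement with the classical two-scan method, then keep the pixels whose distance is a local maximum, together with that distance as the radius.

    import numpy as np

    def distance_transform(s):
        # city block distance from each pixel of S (True) to the complement,
        # computed with a forward and a backward row-by-row scan; pixels
        # outside the array are treated as background
        big = s.shape[0] + s.shape[1]
        d = np.where(s, big, 0).astype(int)
        rows, cols = s.shape
        for i in range(rows):
            for j in range(cols):
                if s[i, j]:
                    up = d[i - 1, j] if i > 0 else 0
                    left = d[i, j - 1] if j > 0 else 0
                    d[i, j] = min(d[i, j], 1 + min(up, left))
        for i in range(rows - 1, -1, -1):
            for j in range(cols - 1, -1, -1):
                if s[i, j]:
                    down = d[i + 1, j] if i < rows - 1 else 0
                    right = d[i, j + 1] if j < cols - 1 else 0
                    d[i, j] = min(d[i, j], 1 + min(down, right))
        return d

    def medial_axis(s):
        # (center, radius) pairs: pixels whose distance is a local maximum
        d = distance_transform(s)
        pts = []
        for i in range(1, s.shape[0] - 1):
            for j in range(1, s.shape[1] - 1):
                if s[i, j] and d[i, j] == d[i - 1:i + 2, j - 1:j + 2].max():
                    pts.append(((i, j), int(d[i, j])))
        return pts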
FIG. 34. Thinning: (a) original S; (b) first step (removal of north border points); (c) second step (removal of south border points); (d) MAT of S (the numbers are the radii).
Thinning is most appropriately applied to Ss that are elongated. (We can define a set S of area A to be elongated if it disappears completely when we shrink it by a number of steps that is small compared to √A; e.g., the object in Fig. 34 has area 11 and disappears when we shrink it twice.) The branches of the skeleton correspond to "arms" or "lobes" of S; thus the skeleton provides a basis for segmenting S into such "arms."

Skeleton representations can also be defined by piecewise approximating S using symmetrical "strips" of varying width; the midlines of these strips correspond to skeleton branches. Analogously, three-dimensional objects can often be piecewise approximated by "generalized cones" of varying width. Methods of constructing such approximations will not be discussed here.

6.5 Segmentation of Curves

We have seen in Sections 6.3 and 6.4 that regions can be represented by their borders, which are closed curves, or by their skeletons, which are composed of arcs and curves. We have also seen that arcs and curves can be represented by chain codes that define sequences of moves from neighbor to neighbor. In this section we discuss methods of segmenting arcs or curves defined by chain codes.

Each link in a chain code corresponds to a slope, which is a multiple of 45°; if desired, we can obtain a more continuous set of slopes by local averaging. By analyzing the frequencies of occurrence of the slopes (i.e., the slope histogram or "directionality spectrum"), we can detect peaks corresponding to predominant directions; this is analogous to detecting gray level subpopulations in a picture (Section 5.1). By measuring the rate of change of slope (i.e., the curvature), we can detect "angles" or "corners" at which abrupt slope changes occur. These correspond to edges in a picture (see Section 5.2); they are useful in constructing polygonal approximations to the curve. (Such approximations can also be defined by piecewise approximating the curve with straight lines; compare the beginning of Section 3.)
FIG. 35. Curve segmentation. (a) Curve; asterisks and primes show maxima and zero crossings of curvature. (b) Chain code of (a):

55656 70211 21111 10110 10224 45455 42212 12345 55655 55555 55671 10100

(c) Slope histogram of (a):

Slope:     0   1   2   3   4   5   6   7
Frequency: 7  15   8   1   5  18   4   2
Points where the curvature changes sign, or "points of inflection," segment the curve into convex and concave parts. Figure 35 illustrates these concepts for a simple curve.

Given a pair of points on a curve, if the arc length between them is not much greater than the chord length (= the distance between them), or if the arc does not get far away from the chord, then the curve is relatively straight between them; otherwise, the curve has a "spur" or "bulb" between them. We can also detect matches between pieces of the curve and specified slope patterns (using, e.g., chain code correlation, in analogy with Section 5.3).⁷ Sequential and fuzzy methods can also be used for curve segmentation, as in Sections 5.4 and 5.5.

Curve analysis is sometimes easier if we represent the curve by some type of transform. The chain code gives the slope of the curve as a function of arc length; we can take the Fourier transform of this function and use it to detect, e.g., wiggliness or periodicity properties of the curve. The same can be done with other equations representing the curve, e.g., parametric or polar equations.

⁷ In particular, the slope patterns corresponding to straight lines can be characterized as follows: they involve at most two slopes, differing by 45°, one of which occurs in singletons and the other in runs of (approximately) equal length.
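Both the slope histogram and a crude curvature measure take only a few lines once the curve is held as a chain code (a Python sketch, not from the chapter; it uses the 0-7 direction numbering of Section 6.3).

    from collections import Counter

    def slope_histogram(code):
        # frequency of each chain code slope (multiples of 45 degrees)
        return Counter(code)

    def turns(code):
        # signed direction change between successive links, in -4..3 (x 45 deg);
        # large magnitudes mark "angles" or "corners"
        return [((b - a + 4) % 8) - 4 for a, b in zip(code, code[1:])]

    code = [5, 5, 6, 5, 6, 7, 0, 2, 1, 1]     # first links of the Fig. 35 code
    print(slope_histogram(code))
    print(turns(code))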
6.6 Bibliographical Notes

Representation of picture parts, and measurement of their geometrical properties, as well as segmentation of arcs and curves, is covered in Rosenfeld and Kak (1976, Chapter 9), Gonzalez and Wintz (1977, Section 7.2), and in Pratt (1978, especially in Chapter 18). Connectedness and borders in digital pictures were first studied in Rosenfeld (1970). Chain coding is reviewed in Freeman (1974). On the MAT, see Blum (1967); a characterization of connectedness-preserving thinning algorithms is given in Rosenfeld (1975).

7. Description
This section discusses the measurement of region properties and relationships for picture description purposes. Topics covered include

(a) Geometrical properties of a region (size, shape, etc.);
(b) Properties of the gray level distribution in a region or picture (moments, textural properties); and
(c) Spatial relationships among regions.
The role of models in the design of processing and analysis techniques is also briefly considered.

7.1 Geometrical Properties
The area of a region in a digital picture is simply the number of pixels in the region. If the region is represented by a list of runs, its area is computed by summing the run lengths. If it is represented by a set of border chain codes, its area can be computed by applying the trapezoidal rule to determine the area inside each chain, and then subtracting the hole areas from the area contained inside the outer border. There is no simple way to compute area (or perimeter) from a MAT representation of a region.

The perimeter of a region can be defined as the number of its border points, or as the total length of its border chain codes (counting diagonal links as √2, if desired). The height and width of a region are the distances between the highest and lowest rows, or leftmost and rightmost columns, that the region occupies; they can be computed from the chain code of the outer border by cumulatively summing to obtain the highest and lowest x and y values. The extent of a region in any given direction is obtained analogously.

The shape complexity of a region (without holes) is sometimes measured by P²/A, where P is perimeter and A is area; this is low for compact shapes and high for "dispersed" ones. An alternative measure is the sum of the absolute curvatures (summed around the outer border of the region), which is high for jagged shapes. The elongatedness of a region can be measured by A/W², where W is the number of shrinking steps required to annihilate the shape (this is easily determined from the MAT representation).

There are several essentially equivalent criteria for a region R to be convex:

(a) No straight line intersects R more than once.
(b) For any two points (x, y), (u, v) in R, their midpoint [(x + u)/2, (y + v)/2], rounded if necessary, is also in R.
(c) R has no holes, and the curvature of its outer border never changes sign.

The convex hull of any region R is the smallest convex region that contains R; it is basically the union of R with its holes and concavities. A variety of algorithms have been designed for concavity detection and convex hull construction.
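The trapezoidal-rule area computation from a border chain code can be sketched as follows (Python, an editorial illustration using the 0-7 direction numbering of Section 6.3; hole borders traced in the opposite sense contribute negative area and so are subtracted automatically).

    def chain_area(start, code):
        # signed area enclosed by a closed border chain code (trapezoidal rule);
        # a counterclockwise border gives positive area
        step = {0: (1, 0), 1: (1, 1), 2: (0, 1), 3: (-1, 1),
                4: (-1, 0), 5: (-1, -1), 6: (0, -1), 7: (1, -1)}
        x, y = start
        area = 0.0
        for s in code:
            dx, dy = step[s]
            area += (y + dy / 2.0) * dx      # accumulate the integral of y dx
            x, y = x + dx, y + dy
        return -area                          # minus sign: CCW -> positive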
The geometric properties mentioned above are all invariant with respect to translation (by an integer number of pixels), and should also be essentially invariant with respect to rotation (rotation by an angle that is not a multiple of 90° requires redigitization; see Section 4.2). Some of them are also essentially invariant with respect to magnification (the obvious exceptions are the size measures in the first two paragraphs). A general method of ensuring that properties will be invariant under geometric transformations is to normalize the input before the properties are computed; this converts the input into a form that is independent of the position (or orientation, scale, etc.) of the original. The following are three representative normalization techniques (see Fig. 36):

(a) The autocorrelation and Fourier power spectrum of a picture are the same no matter how the picture is translated; thus properties measured on these transforms are automatically translation-invariant. Analogous transforms can be devised that yield rotation and scale invariance.
(b) A picture or region can be normalized with respect to translation by putting its centroid at the origin of the coordinate system; with respect to rotation, by making its principal axis of inertia the x axis; and with respect to scale, by rescaling it so that its moment of inertia has some standard value. (See Section 7.2 on moments.)
(c) Alternatively, a region can be normalized with respect to translation, rotation, and scale by constructing its smallest circumscribing rectangle and putting it at the origin, oriented along the coordinate axes, and scaling to make it a standard size.
Rectangle dimensions

Rotated by (degrees)   Width   Height   Area
 0                      28      28      784
10                      27      29      783
20                      27      29      783
30                      30      30      900
40                      29      30      870
50                      27      31      837
60                      27      31      837
70                      28      31      868
80                      28      30      840
FIG. 36. Geometrical normalization: (a) object; (b) same, with background erased; (c) rotation of (a) to make principal axis of inertia vertical; (d) rotation of (a) to make long sides of smallest circumscribed rectangle vertical; (e) dimensions of circumscribed rectangles for various orientations.
7.2 Gray-Level-Dependent Properties
The geometrical properties discussed in Section 7.1 do not depend on gray level, but only on the set of pixels that constitute the given region. In this section we discuss properties that do depend on gray level. Such properties can be measured either for a region or for a (sub)picture; for simplicity, we consider the latter case.

An important class of gray-level-dependent properties are statistical properties that depend only on the population of gray levels in the picture, but not on their spatial arrangement. For example, the mean gray level is a measure of overall lightness/darkness, while the standard deviation of gray level is a measure of contrast.

Another class of (basically) statistical properties are those that measure various textural properties of a picture, such as its coarseness or busyness. This can be done in a number of ways:

(a) The autocorrelation of a busy picture falls off rapidly; conversely, the Fourier power spectrum of a busy picture falls off slowly. Thus the rate of falloff of these transforms can be used to measure busyness (Fig. 37).
(b) The second-order gray level distribution of a picture measures how often each possible pair of gray levels occurs at a given relative displacement. If these gray levels are often quite different even for small displacements, the picture is busy.
(c) Alternatively, suppose that we simply measure the mean value of the picture's gray level gradient; this will be higher for a busy picture than for a smooth one (Fig. 38).

In general, statistics of local property values (measured at each pixel) can be used to measure a variety of textural properties of pictures. A related idea is to analyze the frequency of occurrence of local maxima or minima of gray level. One can also measure statistics of the properties of small regions extracted from a picture, or second-order statistics of the properties of neighboring pairs of such regions. Other approaches to texture analysis involve concepts such as random fields or time series, and are beyond the scope of this paper.
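The second-order gray level distribution in (b), and the mean-difference busyness measure in (c), can be tabulated in a few lines (Python/NumPy, added as an illustration; the picture is assumed to hold integer gray levels in 0..levels-1 and the displacement components are taken nonnegative).

    import numpy as np

    def shifted_pairs(picture, d):
        # the two overlapping views of the picture separated by displacement d
        dr, dc = d
        a = picture[:picture.shape[0] - dr, :picture.shape[1] - dc]
        b = picture[dr:, dc:]
        return a, b

    def cooccurrence(picture, d=(0, 1), levels=256):
        # how often each pair of gray levels occurs at displacement d
        a, b = shifted_pairs(picture, d)
        counts = np.zeros((levels, levels), dtype=int)
        np.add.at(counts, (a.ravel(), b.ravel()), 1)
        return counts

    def busyness(picture, d=(0, 1)):
        # mean absolute gray level difference at displacement d
        a, b = shifted_pairs(picture, d)
        return np.abs(a.astype(int) - b.astype(int)).mean()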
FIG. 37. Measuring busyness by the rate of falloff of the Fourier power spectrum: (a) pictures; (b) Fourier power spectra (log scaled); (c) averages of (b) over rings centered at the origin.
Most of the busyness measures just described are sensitive to the contrast of the picture, as well as to its busyness. Thus some form of grayscale normalization should be performed before such measures are computed. For example, one can shift and stretch (or compress) the grayscale so that the mean and standard deviation of gray level have specified values; or one can transform the grayscale to make each gray level occur equally often, as in Section 4.1.

A class of gray-level-dependent properties that do depend on spatial arrangement are the moments of the picture. The (i, j) moment of the picture f is defined to be mij = Σ x^i y^j f(x, y), where the sum is taken over the entire picture. In particular, m10/m00 and m01/m00 are the coordinates of the centroid of the picture (= its center of gravity, if we regard gray level as mass). Moments of odd degree, such as m10 and m01, tell us about the balance of gray levels between the left and right, or upper and lower, half planes; while moments of even degree, such as m20 and m02, tell us about the spread of gray levels away from the y or x axis. The principal axis is the line through the centroid about which the spread of gray levels is least; its slope θ is a root of the equation

tan²θ + [(m20 - m02)/m11] tanθ - 1 = 0.

(The ratio of greatest spread to least spread can be used as a measure of elongatedness; but this measure is not sensitive to the elongatedness of, e.g., a coiled-up snake.)
FIG. 38. Measuring busyness by the average gray level difference: (a)-(b) x and y differences of the pictures in Fig. 37a; (c)-(d) histograms of the values in (a)-(b), log scaled.
We saw in Section 7.1 how the centroid and principal axis can be used for geometrical normalization. One can define combinations of moments that are invariant under geometrical transformations; for example, if we take the centroid at the origin, m20 + m02 is invariant under rotation.
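The moments, centroid, and principal axis direction follow directly from the definitions; the sketch below (Python/NumPy, not from the chapter) solves the quadratic in tan θ given above (it assumes m11 is nonzero).

    import numpy as np

    def moment(f, i, j):
        # (i, j) moment of the picture f, treating gray level as mass
        y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
        return float((x ** i * y ** j * f).sum())

    def centroid(f):
        m00 = moment(f, 0, 0)
        return moment(f, 1, 0) / m00, moment(f, 0, 1) / m00

    def principal_axis_slopes(f):
        # roots of tan^2(t) + [(m20 - m02)/m11] tan(t) - 1 = 0, with the
        # moments taken about the centroid; the two roots are the slopes of
        # the (perpendicular) axes of least and greatest spread
        xc, yc = centroid(f)
        y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
        x, y = x - xc, y - yc
        m11 = float((x * y * f).sum())
        m20 = float((x ** 2 * f).sum())
        m02 = float((y ** 2 * f).sum())
        return np.roots([1.0, (m20 - m02) / m11, -1.0])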
Variations in gray level or texture can provide important clues to the three-dimensional orientation of surfaces in a scene. Texture coarseness decreases with increasing range or obliquity; thus the direction in which it is changing most rapidly is an indicator of surface orientation. Under given conditions of illumination, information about the three-dimensional shape of a diffusely reflecting surface can be deduced from gray level variations. If two pictures taken from different positions are available, three-dimensional surface shape can in principle be derived from measurements of stereo (or motion) parallax. Occlusion of one region by another, as evidenced by the presence of "T-junctions" in an image, provides a cue to relative distance. There are a variety of other "depth cues," involving such factors as relative size and perspective; the details will not be given here.
Picture descriptions may involve not only properties of single regions, but also relationships among regions. Some of these are mathematically well defined-e.g., adjacency and surroundedness. A is adjacent to B if there exists a pair of pixels P, Q in A, B, respectively, such that P and Q are neighbors. A surrounds B if any path (= sequence of pixels, each a neighbor of the preceding) from B to the border of the picture must meet A. Other relations among regions, involving relative position, are inherently fuzzy. Examples of such relations are to the leftlright of; above/ below; neadfar; and between. For single pixels, or small regions, one can define the degree of membership in a relation such as “to the left of,” as being + 1 on the negative x axis and falling off to 0 on they axis. For large regions, however, it is much more complicated to define “to the left of,” since parts of one object may be to the left of the other object, while other parts are not. A picture can generally be described by a relational structure. This might take the form of a labeled graph in which the nodes represent objects or regions; each node is labeled with a list of its properties; and the nodes are connected by arcs representing relations. The structural description of a picture is often hierarchical: the picture consists of parts, each of which is in turn composed of subparts, and so on. This suggests the possibility of modeling classes of pictures by “grammars” whose rules specify, e.g., how a given part of a structure, in a given context, can be expanded into a particular substructure. Such grammars have been used successfully in the analysis of various classes of pictures, as well as in analyzing textures and shapes. Image models are playing an increasingly important role in the design of image processing and analysis techniques. A wide variety of models have been used; they range from simple statistical models for the image’s gray level population, to complex hierarchical models for region and subregion
configurations. The further development of such models will help to provide firmer mathematical foundations for the field of image processing and analysis.

7.4 Bibliographical Notes
Picture properties are covered in Rosenfeld and Kak (1976, Chapters 9-10), Gonzalez and Wintz (1977, Chapter 7), and Pratt (1978, Chapters 17-18). Convexity in digital pictures was first studied in Sklansky (1970). Textural properties are reviewed in Haralick (1979); and shape description is reviewed in Pavlidis (1978). On spatial relations see Freeman (1975). Syntactic methods in pattern recognition are discussed by Fu (1974, 1977) and Pavlidis (1977).

8. Concluding Remarks
Digital image processing and recognition techniques have a broad variety of applications. Image coding is used extensively to reduce the time or bandwidth required for image transmission. Image enhancement and restoration techniques are very helpful in improving the usefulness of images taken near the limits of resolution (astronomy, microscopy, satellite reconnaissance) or under adverse conditions (motion, turbulence). Pictorial pattern recognition has innumerable applications in document processing (character recognition), industrial automation (inspection; vision-controlled robot assembly), medicine (hematology, cytology, radiology), and remote sensing, to name only a few of the major areas. Many of these applications have led to the development of commercial image processing and recognition systems. It can be expected that there will be a continued growth in the scope and variety of such practical applications over the coming years.

ACKNOWLEDGMENT

The support of the National Science Foundation under Grant MCS-76-23763 is gratefully acknowledged, as is the help of Mrs. Virginia Kuykendall in preparing this paper.

SUGGESTIONS FOR FURTHER READING

Books*

Andrews, H. C. (1970). "Computer Techniques in Image Processing." Academic Press, New York.
" Proceedings of meetings and books on pattern recognition or artificial intelligence have not been listed here.
Andrews, H. C., and Hunt, B. R. (1977). "Digital Image Restoration." Prentice-Hall, Englewood Cliffs, New Jersey.
Duda, R. O., and Hart, P. E. (1973). "Pattern Classification and Scene Analysis." Wiley, New York.
Fu, K. S. (1974). "Syntactic Methods in Pattern Recognition." Academic Press, New York.
Fu, K. S. (1977). "Syntactic Pattern Recognition, Applications." Springer-Verlag, Berlin and New York.
Gonzalez, R. C., and Wintz, P. A. (1977). "Digital Image Processing." Addison-Wesley, Reading, Massachusetts.
Huang, T. S. (1975). "Picture Processing and Digital Filtering." Springer-Verlag, Berlin and New York.
Pavlidis, T. (1977). "Structural Pattern Recognition." Springer-Verlag, Berlin and New York.
Pratt, W. K. (1978). "Digital Image Processing." Wiley, New York.
Rosenfeld, A. (1969). "Picture Processing by Computer." Academic Press, New York.
Rosenfeld, A. (1976). "Digital Picture Analysis." Springer-Verlag, Berlin and New York.
Rosenfeld, A., and Kak, A. C. (1976). "Digital Picture Processing." Academic Press, New York.
Winston, P. M. (1975). "The Psychology of Computer Vision." McGraw-Hill, New York.

Bibliographies
Rosenfeld, A. (1969). Picture processing by computer. Comput. Surv. 1, 147-176.
Rosenfeld, A. (1972). Picture processing: 1972. Comput. Graphics Image Process. 1, 394-416.
Rosenfeld, A. (1973). Progress in picture processing: 1969-71. Comput. Surv. 5, 81-108.
Rosenfeld, A. (1974). Picture processing: 1973. Comput. Graphics Image Process. 3, 178-194.
Rosenfeld, A. (1975). Picture processing: 1974. Comput. Graphics Image Process. 4, 133-155.
Rosenfeld, A. (1976). Picture processing: 1975. Comput. Graphics Image Process. 5, 215-237.
Rosenfeld, A. (1977). Picture processing: 1976. Comput. Graphics Image Process. 6, 157-183.
Rosenfeld, A. (1978). Picture processing: 1977. Comput. Graphics Image Process. 7, 211-242.
Selected Papers
Blum, H. (1967). A transformation for extracting new descriptors of shape. In "Models for the Perception of Speech and Visual Form" (W. Wathen-Dunn, ed.), pp. 362-380. MIT Press, Cambridge, Massachusetts.
Davis, L. S. (1975). A survey of edge detection techniques. Comput. Graphics Image Process. 4, 248-270.
Davis, L. S., and Rosenfeld, A. (1978). Noise cleaning by iterated local averaging. IEEE Trans. Syst., Man Cybern. 8, 705-710.
Freeman, H. (1974). Computer processing of line-drawing images. ACM Comput. Surv. 6, 57-97.
Freeman, J. (1975). The modelling of spatial relations. Comput. Graphics Image Process. 4, 156-171.
Gordon, R., and Herman, G. T. (1974). Three-dimensional reconstruction from projections: A review of algorithms. Int. Rev. Cytol. 38, 111-151.
Graham, R. E. (1962). Snow removal - a noise-stripping process for picture signals. IRE Trans. Inf. Theory 8, 129-144.
Habibi, A. (1977). Survey of adaptive image coding techniques. IEEE Trans. Commun. 25, 1275-1284.
Haralick, R. M. (1979). Statistical and structural approaches to texture. Proc. IEEE 67.
Huang, T. S. (1977). Coding of two-tone images. IEEE Trans. Commun. 25, 1275-1284.
Kovasznay, L. S. G., and Joseph, H. M. (1955). Image processing. Proc. IRE 43, 560-570.
Legault, R. (1973). The aliasing problems in two-dimensional sampled imagery. In "Perception of Displayed Information" (L. M. Biberman, ed.). Plenum, New York.
Limb, J. O., Rubinstein, C. B., and Thompson, J. E. (1977). Digital coding of color video signals - a review. IEEE Trans. Commun. 25, 1349-1385.
Max, J. (1960). Quantizing for minimum distortion. IRE Trans. Inf. Theory 6, 7-12.
Mertz, P., and Gray, F. (1934). A theory of scanning and its relation to the characteristics of the transmitted signal in telephotography and television. Bell Syst. Tech. J. 13, 464-515.
O'Handley, D. A., and Green, W. B. (1972). Recent developments in digital image processing at the Image Processing Laboratory at the Jet Propulsion Laboratory. Proc. IEEE 60, 821-828.
Panda, D. P., and Rosenfeld, A. (1978). Image segmentation by pixel classification in (gray level, edge value) space. IEEE Trans. Comput. 27, 875-879.
Pavlidis, T. (1978). A review of algorithms for shape analysis. Comput. Graphics Image Process. 7, 243-258.
Peterson, D. P., and Middleton, D. (1962). Sampling and reconstruction of wave-number-limited functions in n-dimensional Euclidean spaces. Inf. Control 5, 279-323.
Rosenfeld, A. (1970). Connectivity in digital pictures. J. ACM 17, 146-156.
Rosenfeld, A. (1975). A characterization of parallel thinning algorithms. Inf. Control 29, 286-291.
Rosenfeld, A. (1977). Iterative methods in image analysis. Proc. IEEE Conf. Pattern Recog. Image Process., pp. 14-18.
Schachter, B. J., Davis, L. S., and Rosenfeld, A. (1978). Some experiments in image segmentation by clustering of local feature values. Pattern Recognition 11, 19-28.
Sklansky, J. (1970). Recognition of convex blobs. Pattern Recognition 2, 3-10.
Stockham, T. G., Jr. (1972). Image processing in the context of a visual model. Proc. IEEE 60, 828-842.
Weszka, J. S. (1978). A survey of threshold selection techniques. Comput. Graphics Image Process. 7, 259-265.
Wilkins, L. C., and Wintz, P. A. (1971). Bibliography on data compression, picture properties, and picture coding. IEEE Trans. Inf. Theory 17, 180-199.
Zucker, S. W. (1976). Region growing: childhood and adolescence. Comput. Graphics Image Process. 5, 382-399.
This Page Intentionally Left Blank
Recent Progress in Computer Chess

MONROE M. NEWBORN
School of Computer Science
McGill University
Montreal, Quebec

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . .  59
2. After Stockholm . . . . . . . . . . . . . . . . . . . . . . . . .  62
   2.1 1975: The Jet Age Arrives . . . . . . . . . . . . . . . . . .  63
   2.2 1976: The Paul Masson Chess Classic and Other Events . . . . .  70
   2.3 1977: CHESS 4.6 Plays Expert Level Chess . . . . . . . . . . .  73
   2.4 1978: More Accomplishments . . . . . . . . . . . . . . . . . .  84
3. Tree-Searching Techniques (Modifications to the Minimax Algorithm)  92
   3.1 The Horizon Effect . . . . . . . . . . . . . . . . . . . . . .  93
   3.2 Forward Pruning . . . . . . . . . . . . . . . . . . . . . . .  94
   3.3 The Alpha-Beta Algorithm . . . . . . . . . . . . . . . . . . .  94
   3.4 The Killer Heuristic . . . . . . . . . . . . . . . . . . . . .  97
   3.5 Iterative Deepening . . . . . . . . . . . . . . . . . . . . .  97
   3.6 The Alpha-Beta Window . . . . . . . . . . . . . . . . . . . .  97
   3.7 Transposition Tables . . . . . . . . . . . . . . . . . . . . .  98
   3.8 The Method of Analogies . . . . . . . . . . . . . . . . . . .  98
   3.9 Differential Updating of Chess Information . . . . . . . . . .  98
4. Chess-Specific Information in Chess Programs . . . . . . . . . . .  99
5. Endgame Play . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6. Speed Chess . . . . . . . . . . . . . . . . . . . . . . . . . . .  106
7. The Microcomputer Revolution . . . . . . . . . . . . . . . . . . . 110
8. Final Observations and the Future . . . . . . . . . . . . . . . .  113
   References . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

1. Introduction
For several hundred years, man has been fascinated by the idea of machines playing chess. When electronic computers arrived in the late 1940s, Wiener (1948), Shannon (1950a,b), and Turing (1953), all prominent names in the world of cybernetics, suggested how man's new creation might be so used. How well they might play could only be guessed. Through the years, many have expressed their opinions, including former world chess champions Mikhail Botvinnik (1970) and Max Euwe (1970).
Based on the ideas of Shannon and Turing, working programs came into existence in the late 1950s and early 1960s (Kister et al., 1957; Bernstein et al., 1958; Kotok, 1962). They all played rather poorly. To some people the reason was obvious: good players use an entirely different approach. It was argued that until we better understand the approach used by good human players and then write programs to perform in the same way, there would be little progress. Studies of good players suggested they search very small trees (less than 100 nodes) using powerful heuristics to guide the search (de Groot, 1965; Newell and Simon, 1972). The first attempt to program a computer to do the same was reported by Newell et al. (1963). Their program played weakly too, mainly because its heuristics were not sufficiently well developed. In fact, the major reason for the lack of success of all early programs was that not nearly enough effort went into any of them.

A little over ten years ago, two good programs emerged, first the ITEP (Institute of Theoretical and Experimental Physics) program in Moscow, and a year or two later, MAC HACK VI. It was the ITEP program, developed by George M. Adelson-Velskiy, Vladimir L. Arlazarov, A. G. Ushkov, A. Bitman, and A. Zhivatovsky, that in 1966-1967 proved itself superior to a US program developed at MIT by Alan Kotok and John McCarthy in a four-game match in which moves were telegraphed across the Atlantic. The ITEP program won twice and drew twice. In its two victories, the Soviet program carried out exhaustive searches five levels (or plies) deep, continuing deeper along select lines when necessary to arrive at quiescent positions. When searching three levels, instead of five, the program was only able to draw with the Kotok-McCarthy program (although even in the two drawn games, it had the better position). The latter searched to a depth of four plies in all four games, following capturing sequences somewhat further. Various heuristics were employed to forward prune supposedly bad moves from positions near the root of the tree, but as Botvinnik said, the program sometimes "threw the baby out with the bath water" (Botvinnik, 1970). Two lessons were vividly provided by the match. First, a brute force search of depth D plays significantly stronger chess than one of lesser depth. Second, if one wants to include forward pruning, it is necessary to have very good heuristics. Deeper brute force searches, of course, can be carried out by faster computers. However, some continued to argue that faster computers would never overcome the exponential growth of chess trees, that there might be some improvement in play but not a great deal, and that great progress would only come when programs are designed to play as man does. The second lesson was more acceptable, and during the next decade efforts were made to develop good heuristics.

MAC HACK VI, programmed by Richard Greenblatt with assistance
from Donald Eastlake and Stephen Crocker, emerged in 1967 (Greenblatt et al., 1967). It achieved a good balance between a program based on brute force and one that depended heavily on heuristics for move pruning. MAC HACK VI was the first program to successfully compete in human tournaments, receiving a USCF rating in the 1400s¹ for play in the Massachusetts State Championship in 1967.

¹ Most good chess players in America belong to the United States Chess Federation. This organization rates its members. FIDE, the Federation Internationale des Echecs, is the governing world organization and it too rates its members. FIDE also awards the titles of International Grandmaster and International Master. FIDE ratings for International GMs are about 2500+ and International Masters, 2400+. The USCF ratings are Senior Master, 2400+; Master, 2200+; Expert, 2000+; Class A, 1800+; Class B, 1600+; Class C, 1400+; Class D, 1200+. There are about 150 International Grandmasters (GMs) and 400 International Masters (IMs) in FIDE.

In the late 1960s, interest in computer chess started to mushroom. New programs appeared on university campuses across America. In 1970 the Association for Computing Machinery hosted the first major computer chess tournament and has continued to do so every year thereafter at its Annual Conferences. The first world championship was held in Stockholm in 1974. During the years leading up to Stockholm, CHESS 4.0, the work of David Slate and Larry Atkin (and Keith Gorlen, who left the team a few years ago) of Northwestern University, established itself as the best program in the West (Slate and Atkin, 1977). It went through a series of metamorphoses, starting as CHESS 2.0, becoming CHESS 4.0 at the time of the Stockholm tournament, and finally CHESS 4.7 as of September 1978. On each move, CHESS 4.0 carried out a sequence of exhaustive searches of two plies, three plies, and so on as time permitted, coupled with very thorough extended analysis. KAISSA, the prodigy of Mikhail Donskoy and Arlazarov, dominated efforts in the Soviet Union (see Adelson-Velskiy et al., 1975). KAISSA, too, carried out exhaustive searches, though not a sequence of them, to some fixed depth and forcing sequences still further. Both programs participated at Stockholm, with KAISSA capturing the championship (Hayes and Levy, 1976; Mittman, 1974; Newborn, 1975).

An era in computer chess history ended in Stockholm. It was an era in which gradual progress was made. There was progress in software technology, making programming, debugging, and testing chess programs much easier. Hardware technology continued to improve as well. MAC HACK VI and CHESS 4.0 had each played thousands of games. Computer centers around the world had copies of them. CHESS 4.0, when running on a CDC 6600, was playing B level chess (although Slate would have said C) and notably better than it played two years earlier when running on a CDC 6400. The era was also characterized by unfulfilled expectations and
growing pessimism over the future by those who were not sensitive to the progress being made and who felt the programmers were going in the wrong direction. The year 1975 marked the beginning of, as Berliner calls it, the Jet Age of computer chess (Berliner, 1978). Computer Chess (Newborn, 1975) covers events up to and through Stockholm. It is the purpose of Section 2 to survey events following Stockholm, including the tail end of the pre-Jet Age era. Over 15 games illustrate the rapid progress made, ending with CHESS 4.7's first victory over a Master. The games show that while CHESS 4.7 is currently the best program, there are several others that are quite strong and improving. Section 3 surveys tree-searching heuristics used in most programs. Chess-specific information in chess programs is considered in Section 4. Three special sections follow: the first discusses endgame play, the next examines speed chess by computers, and the third looks at chess on microcomputers. We conclude in Section 8 with a few comments on the future and discuss briefly what has been learned.

2. After Stockholm
Only three months separated the World Championship from the ACM's Fifth United States Computer Chess Championship in San Diego on November 10-12, 1974. Slate and Atkin came prepared to defend their title, a title which they had won for the first time in 1970 and had held ever since. But they were unable to find a CDC 6600 for the tournament (they used a CDC 6600 in Stockholm) and wound up playing on the slower CDC 6400 at Northwestern University. RIBBIT, the work of Ron Hansen, Russell Crook, and Jim Parry of Waterloo University, defeated CHESS 4.0 and won the tournament with a perfect 4-0 record. They used a Honeywell 6050 computer. RIBBIT had won the first Canadian Computer Chess Championship earlier in the year and had finished fourth in Stockholm, and they certainly figured to be contenders. But the consensus was that CHESS 4.0 was not at its best on the CDC 6400. RIBBIT's victory is presented here. Move times are denoted in parentheses. Moves from book are denoted by a "B."

WHITE: RIBBIT
BLACK: CHESS 4.0

1 P-K4(B)     P-QB4(B)       2 P-QB3(B)    P-Q4(B)
3 P x P(B)    Q x P(B)       4 P-Q4(B)     P x P(B)

RIBBIT is now out of book but CHESS 4.0 has two more moves to go.

5 P x P(74)   N-QB3(B)       6 N-KB3(84)   B-N5(B)
7 N-B3(67)    Q-Q3(76)
One move out of book and CHESS 4.0 is on the run!
 8 P-Q5(241)     N-N5(174)      12 N-K5+(26)     K-K1(118)
 9 B-QN5+(41)    B-Q2(163)      13 P-QR3(234)    Q-Q3(98)
10 B x B+(117)   K x B(86)      14 Q-R4+(41)     . . .
11 B-K3(317)     Q-QR3(61)
A low point in CHESS 4.0's career. RIBBIT has established an overwhelming position (see Fig. 1) and goes ahead by a Knight and Pawn during the next few moves. After the exchange of Queens on move 18, CHESS 4.6 finds itself in a hopeless position.

14 . . .            N-QB3(119)      38 R-Q5(161)        B-N7(185)
15 P x N(475)       P x P(60)       39 K-N3(91)         R x N(87)
16 N x P/6(53)      P-K4(44)        40 K x R(57)        B x P(121)
17 N x P/7+(65)     Q-Q2(107)       41 P-N6(6)          B-B8(97)
18 Q x Q+(32)       K x Q(1)        42 P-N7(18)         B-B5(203)
19 R-Q1+(92)        K-K3(51)        43 N-B6(79)         B-B2(241)
20 0-0(231)         N-B3(82)        44 R-Q7(258)        B-B5(302)
21 P-QN4(332)       B-K2(109)       45 R x P(214)       B-Q3(158)
22 P-R3(69)         P-R4(58)        46 P-N8 = Q(155)    B x Q(88)
23 R/B-K1(43)       P-R5(85)        47 N x B(103)       K-N3(2)
24 R-Q3(446)        P-K5(137)       48 R-B5(96)         K-R3(1)
25 B-Q4(49)         R/KR1-K1(64)    49 N-Q7(545)        P-N3(3)
26 B x N(57)        B x B(75)       50 R-B6(82)         K-N4(1)
27 R x P+(122)      K-B4(77)        51 K-N3(76)         K-R3(1)
28 R x R(92)        R x R(88)       52 N-K5(113)        K-N2(2)
29 P-N4+(130)       P x P e.p.(237) 53 P-N5(74)         K-N1(4)
30 P x P(92)        R-K8+(295)      54 N x P(67)        K-R2(1)
31 K-B2(92)         R-QB8(111)      55 P-R4(348)        K-N1(1)
32 P-N4+(158)       K-N3(78)        56 P-R5(66)         K-N2(2)
33 N-K4(213)        B-K4(187)       57 P-R6+(37)        K-R2(1)
34 P-QN5(89)        R-B7+(88)       58 N-K5(35)         K-R1(1)
35 K-B3(183)        R-KR7(247)      59 P-N6(22)         K-N1(1)
36 N-B2(160)        B-B3(140)       60 K-N4(23)         K-R1(1)
37 R-Q6(125)        K-R2(131)       61 R-B8 mate(2)
2.1 1975: The Jet Age Arrives
In 1975, Control Data Corporation came out with a new line of superpowerful computers, the CDC CYBER 170 series. David Cahlander of CDC arranged for Slate and Atkin’s newest version of their program, CHESS 4.4, to use a CYBER 175 at CDC’s corporate headquarters in Minneapolis for the 1975 ACM tournament, the Sixth North American Computer Chess Championship (the tournament’s name was changed in
FIG. 1. Position after 14 Q-R4+.
deference to the previous year's champion and because usually several Canadian programs compete). The tournament was set for Minneapolis on October 19-21. RIBBIT returned under the new name of TREEFROG. CHAOS, the consistently tough program of Ira Ruben, Fred Swartz, Joe Winograd, William Toikka, and Victor Berman, was back for the third time, having finished second, or in a tie for second, in every tournament that it had previously participated in. DUCHESS, the work of Tom Truscott, Bruce Wright, and Eric Jensen of Duke University, was returning for its second try. Eight other programs participated in the four-round event (Levy, 1975). CHESS 4.4 swept the tournament without facing any trouble along the way. Its fourth-round victory over TREEFROG established without a doubt a renewed superiority over the other programs. The strength of the CYBER 175 was the biggest reason for its success. The addition of transposition tables was also a big factor. CHESS 4.4 was searching trees having several hundred thousand nodes. CHESS 4.4's victory over TREEFROG is presented here. The program started on the CYBER 175, had machine problems, switched to Northwestern's CDC 6400 on move 4, back to the 175 on move 8, back again to the 6400 on move 14, again to the 175 on move 23 where the program really put the pressure on TREEFROG. From move 35 on, CHESS 4.4 coasted to victory on the 6400.

WHITE: CHESS 4.4
BLACK: TREEFROG

1 P-K4(B)     P-Q4         5 B-K2(116)   P-K3
2 P x P(B)    N-KB3        6 0-0(39)     N-QB3
3 P-Q4(B)     N x P        7 P-B4(118)   N-B3
4 N-KB3(B)    B-N5         8 N-B3(60)    B-N5
CHESS 4.4 changed to a CDC 6400 after move 4 and is now back on the CYBER 175.

 9 P-Q5(76)     B x N/QB     12 P x P(36)     R-QN1
10 P x N(83)    Q x Q        13 P-KR3(40)     B-KB4
11 R x Q(40)    B-N5
Back to the CDC 6400. The position is slightly in CHESS 4.4's favor but at the level the programs are playing, the game is wide open.

14 P-R3(40)     B-B4       19 R-Q2(115)    R x P
15 P-KN4(40)    B-B7       20 P-N5(122)    N-K5
16 R-Q2(32)     B-N6       21 R-Q5(79)     P-KB3
17 N-Q4(97)     B x N      22 B-R5+(89)    K-K2
18 R x B(81)    P-K4
For the last 9 moves CHESS 4.4 has been playing weakly but still has the advantage. White weakened his position in an attempt to push the loss of the advanced Pawn over his search horizon. The CDC CYBER 175 takes over for the next dozen moves and clinches the game.

23 B-B3(64)     P-B3         27 B x N(39)        P x B
24 R-R5(52)     B-B7         28 R x P/5+(55)     K-Q3
25 P-N4(25)     R/1-QN1?     29 R/1-K1(40)       . . .
26 B-K3(28)     N x P/4
CHESS 4.4 is playing for bigger game than the Black Pawns hanging around the board (see Fig. 2).

29 . . .           R-KB2      32 R/6-K6(86)    R-N3
30 R-K6+(45)       K-B2       33 R-K8+(96)     K-Q2
31 R x P+(78)      K-Q1       34 B-Q5(66)      R-B4
The end came on the CDC 6400.
FIG. 2. Position after 29 R/1-K1.
35 R/1-K7+(81)    K-Q3       37 B x R(131)    B x B
36 B-K6(88)       R-QB3      38 P-B5+(67)     Resigns
An error was made entering one of the last few moves into TREEFROG and when the authors realized it, they decided to resign the game. On October 19, after the third round was completed, Tournament Director David Levy played a simultaneous exhibition against the programs. In 1968, Levy made a $2000 wager with four leading computer scientists that no computer would defeat him in a match consisting of an even number of games within the next ten years. Since then, Levy has willingly taken on many chess programs and a few of his games appear in this section. The idea of a simultaneous exhibition originated with Richard Harbeck, who made local arrangements for the Minneapolis affair. One program passed up the exhibition giving Slate and Atkin the opportunity to enter two versions of CHESS 4.4, one running on a CDC 6400 and the other on a CDC CYBER 175. The programs were set to play at three minutes per move. Levy won ten games, drew two, and lost none. His draws, the first by a Master against a computer in any sort of public event, were against CHESS 4.4 running on the CDC CYBER 175, and TREEFROG running on the less awesome Honeywell 6080. The games ended at about 1 AM and perhaps Levy was not at his best by then. Levy’s game with CHESS 4.4 went as follows. WHITE : Levy 1 P-Q3
BLACK: CHESS 4.4
...
Designed to take CHESS 4.4 out of book. l 2 3 4
. . . P-KN3 B-N2 N-KB3
P- K4( 173) N-QB3(73) P-Q4(107) N-B3(51)
5 0-0
B-K2(86)
6 P-B3 7 P-QN4
P-KS(55)
0-O(50)
Levy attempts to keep the game quiet, hoping to build up a solid position. CHESS 4.4 has no patience and wants to get into tactical complications. Levy eagerly exchanges Queens when given the opportunity on move 9. 8 P x P P x P(81) 9 Q x Q R x QW)
10 N/3-Q2 11 N-B4
B-KB4(53) B-N5(60)
CHESS 4.4 will now cleverly fox Levy out of a Pawn and establish a clear advantage (see Fig. 3). 12 13 14 15
R-K1 N/l-Q2 N x B B-N2
B x P/7(127) B X N(116) R-Q6(98) P-QN4(111)
16 N-R3 17 WR-Ql 18RxR
R-Nl(138) N-K4(46)
...
RECENT PROGRESS IN COMPUTER CHESS
FIG. 3.
67
Position after 1 1 . . . B-NS.
Again, Levy wants t o simplify the position. 18 . . . 19 R-K2
N x R(34) N x B(36)
20 R x N 21 P x P
P-B4(96) P-QR3(100)
CHESS 4.6 would rather keep its Queenside Pawns together and leave Levy’s strewn about. 22 N-B2 23 N-N4
R-QBl(45) R x P(74)
24 N x P 25 N-N8?
R-B3(72)
...
Levy traps his own Knight! He receives some compensation in having two passed Queenside Pawns. 25 . . . 26 P-QR4 27 P x P 28 P-N6
R - N3(62) R x N(62) R-QBI(55) R-Nl(69)
29 B-R3 30 P-N7 31 B-B8
N-Q4(122) N x P(44)
...
Although CHESS 4.4 is up a Knight, the position is a bit too difficult for the program to come up with a winning procedure. The remaining 21 moves can be seen to be a standoff although CHESS 4.4 has an extra piece and better chances (Fig. 4). 31 32 33 34 35 36 37 38 39
... R-B2 K-N2 R-Q2 R-Q7+ R-Q6 R-QB6 R-B4 K-BI
K - B l(64) B - B3( 1 17) K - K2(48) P - N3(69) K- B l(67) B - K4( 120) P- B3(63) K - B2( 144) P- B4(2 14)
40 41 42 43 44 45 46 47 48
R-B5 R-B6+ R-B5 R-B4 R-B7 R-B6 R-B5 R-B7 R-B6
K-B3(98) K-N4(137) B-Q5(88) B-Rl(82) P-R3(272) B-K4(149) B-N2(96) B-B3(443) B-Rl(88)
68
MONROE M. NEWBORN
FIG.4.
49 50 51 52 53 54
R-B5 R-B6 R-B5 R-N5 P-R3 R-B5 55 R-B4 56 R-B5
K-N5(49) K - R4(50) N-R5(9) B-B3(10) N - B6(8) B -Q5( 1 1) B-Rl(7) K - N4(22)
Position after 31 B-BE.
57 P-N4 58 R-B4 59 R-B5 60 R-B6+ 61 R-B5 62 R-B7+ Drawn by
B-Q5(26) B-K4(10) K-B3(8) K-N2(35) B-Q5(52) K-B3(109) agreement
In Levy’s game against TREEFROG, he again tried to keep the position closed, was forced into a highly tactical battle and emerged with an advantage and forced mate in four on move 46 (see Fig. 5). The mating tree that Levy overlooked, perhaps because he was tired, is shown in Fig. 6. Levy was happy to settle for a Pawn thinking he was on the road to victory. The game continued: 46 . . . 47 P-B6 48 N-B4
B x PI5 B-K4 R-KR7+
49 K-N4 50 K-B3
P-R4+ R-R6?
Levy’s second error in five moves. The game ended in a draw on move 71 when it became clear that TREEFROG, with a Rook and Bishop, lacked the technique to defeat Levy, who had a lone Bishop and King. Two other events took place in 1975 indicating the growing international interest in computer chess. On August 2-3 at the University of Calgary, the Second Canadian Computer Chess Tournament was held. Tony Marsland was the organizer and he implemented his idea of handicapping programs based on the speed of the computers they were running on. Versions of five programs competed: CHESS 3.0, WITA (Marsland’s program), COKO 5 (Dennis Cooper and Ed Kozdrowicki), TREEFROG, and X , a program of unknown origin running on a PDP-I0 computer [most
RECENT PROGRESS IN COMPUTER CHESS
69
FIG.5. Position in TREEFROG (White) vs LEVY (Black) after White’s 46th move.
likely MAC HACK VI or TECH]. X won the tournament with a 21-3 score. Soule and Marsland (1975) conclude that “the results are difficult t o assess.” T h e First German Computer Chess Championship was held at Dortmund on October 8-10. Eight teams participated. The event was part of the Jahrestagung der Gesellschaft fur Informatik. Reinhard Zumkeller was the organizer and David Levy served a s Tournament Director. TELL, the work of Johann Joss from Zurich and DAJA, the work of Ludwig Zagler and Sigfried Jahn of the Institut fur Informatik in Munich, finished the three-round Swiss tournament tied for first place with 21 points apiece. T E L L won a playoff game for the championship. The level of play was clearly below the level of the North American tournaments. As Helmut Richter says, “Comparing Dortmond with the First United States Computer Chess Championship, you’ll find remarkable parallelisms” (Richter, 1976).
B-K5
B-K4
“ A
any move
-
,B-R7
mate
FIG.6. Mating tree: Black to mate in four in the position in Fig. 5.
mate
MONROE M. NEWBORN
70
2.2 1976: The Paul Masson Chess Classic and Other Events
At the initiative of Martin Morrison, currently Executive Director of the United States Chess Federation and at that time the organization's Technical Director, an invitation was extended to CHESS 4.5 to participate in the Paul Masson Chess Classic played on July 23-24. Slate felt that the program was stronger than its rating of 1579 and so he elected to enter it in the B section of the tournament. In the past, the USCF had not been very supportive of activities in computer chess, but in the last two years, articles and information on computer chess have frequently appeared in the USCF's publication, Chess Life and Review. In 1977, the USCF established a set of rules under which programs participate in USCF-rated tournaments (Morrison, 1977). In fairness, the earlier lack of interest may have been due to the weak play shown by the programs, although even the first World Computer Chess Championship in Stockholm went unmentioned.

Played in a vineyard on the hillside around Saratoga, California, 756 players gathered for the largest western tournament ever held. CHESS 4.5, to the astonishment of everybody including Slate and Cahlander (Cahlander was in California, Slate at CDC's new experimental CYBER 176² in Minneapolis), won the B section with a perfect 5-0 record (Morrison, 1976). In an article reporting on CHESS 4.5's performance, Berliner (1976) observes that "to produce a perfect score against that caliber of competition should require a rating in the neighborhood of 1950." CHESS 4.5 was entitled to a prize of $700 but turned it down, an understanding agreed upon before the tournament started. CHESS 4.6 moved very quickly, averaging 55, 54, 59, 25, and 71 sec in games 1 through 5, respectively. In Round 1, Neil Regan (1693) erred on move 19 in a somewhat complex position and resigned 12 moves later. The program's second-round opponent, Mark Arnold (1704), attempted an unsound sacrifice on move 8 and never recovered. He resigned on move 22. Against Irog Buljan (1751) in Round 3, CHESS 4.5 displayed its endgame talents. After move 48 R x P/2, the position shown in Fig. 7 was reached. Berliner doubts whether "any program other than CHESS 4.5 could win this game against a competent defense." He says "CHESS 4.5 makes it look easy." The win went as follows:

BLACK: CHESS 4.5
WHITE: Buljan 48 . . . 49 R-Q8
R- R5 K-R2
50 R-Q6 51 R-N6
N-B3 N-R4
² Cahlander says that the relative speeds of the CDC 6400, CYBER 74, CDC 6600, CYBER 175, and CYBER 176 are 1, 1, 2.75, 6, and 12, respectively.
RECENT PROGRESS IN COMPUTER CHESS
71
FIG. 7. Position after 48 R x P/2 in Buljan (White) vs CHESS 4.5 (Black).
52 R-QB6 53 K - N I 54 P-R4 55 K-RI 56 R-B3 57 K-R2 58 K-N1 59 K-B2 60 K-Bl 61 R-B4 62 K - N l 63 K-N2 64 R-B6+ 65 R-B3
R-R7+ N-B5 R-KN7-t R-N6 K-N3 R-R6+ R x PI5 R-R7+ P- R4 N-Q6 R-R6 R x P R-B3 N-B5+
66 K-R2 67 R-B5 68 R-QR5 69 R-N5 70 R-N8 71 K - N l 72 R - N l 73 R-Rl 74 R-Ql 75 K-RI 76 R-Q6 77 K-R2 7 8 K x P
P- R5 K-R3 P-N4 P-N5 P-N6+ R- B3 P-R6 R-B3 N-K7+ R-KN3 P-N7+ P-N8 = Q + Q-KR8 mate
In Round 4, Wesley White (1742) fell prey after only 3 1 moves. In the final round, Herbert Chu (17841, the program's highest rated opponent, got into a highly tactical battle. Although he had an early advantage, he fell behind on move 19 and resigned on move 30. The performance was by far the most impressive by a program t o date even if one assumes that the California wine may have had an inhibiting influence on the humans. This however was merely the beginning. N o one imagined what was just around the corner. The next major event was ACM's Seventh North American Computer Chess Championship in Houston, October 19-21 (Levy, 1976a). CHESS 4.5 (and the CDC CYBER 176) returned to defend its title. CHAOS was the only other program given any chance, but the authors of CHAOS had done little to improve their program since the previous ACM tournament. TREEFROG's group had lost interest after graduating from Waterloo and
MONROE M. NEWBORN
72
did not return. Except for CHESS 4.5, there was perhaps a lull in progress and enthusiasm at Houston. Some of the older programs had retiredTREEFROG, COKO, The Dartmouth Program, TECH and TECH 11, for example-and the newer ones such as DUCHESS, BLACK KNIGHT, AND BLITZ I11 were just rounding into form. This latter group would catch up quickly, using the experience of those that came earlier to their advantage. But for the present, CHESS 4.5's edge over the other programs was the largest ever. It was searching trees with 150,000-800,000 nodes per move! CHAOS, running on an Amdahl470 was searching trees with up to 100,000 nodes per move. CHESS 4.6 won the tournament defeating WITA in Round 1, DUCHESS in Round 2, and BLACK KNIGHT in Round 3. Its victory over CHAOS in Round 4 was the most lopsided performance played between these two rivals. It is presented here t o show the contrast. Cahlander provided this writer with CHESS 4.6's printout of the game and the data are presented here in digested form. After every White move, there is indicated (1) the time in seconds, (2) the maximum depth of the iterative exhaustive search, (3) the number of nodes examined x lo4, and (4) the number of plies that the game followed the continuation expected by the program. If the opponent made the expected reply, that counts for one move. WHITE: CHESS 4.5
BLACK: CHAOS Sicilian Defence
1 P-K4(B) 2 N-KB3(B) 3 P-Q4(B)
P-QB4 N-QB3 P x P
4 N x P(B) N-B3 5 N-QB3(B) P- K3 6 N x N(54,6,19,3) . . .
This guarantees that CHESS 4.5 will either give CHAOS an isolated Rook's Pawn or trade Queens preventing CHAOS from castling. Nothing else looks better. 6 7 8 9 10 11 12 13
. . . P- K5(85,6,27,4) N x N(255,7,82,1) B-Q3(77,6,24,0) 0-0(71,6,23,0) Q- K2(66,6,20,1) P-QN3(108,6,33,0) P- QR4(77,6,23,0)
P/N x N N-Q4 PIB x N P-N3 B-KN2 B-N2 Q-K2 Q-N5
14 15 16 17 18 19 20 21
B-R3(78,6,24,0) P-B4(110,6,33,0) B-Q6(88,6,28,0) P-R5(93,6,29,0) Q-B2(149,6,45,1) B x B(201,7,67,2) Q-R4(279,7,85,0) K-R1(169,8,58,0)
Q-B6 P-B3 P-B4 K-B2 B-KBl WKR x B Q-B4+ K-N2
In the position shown in Fig. 8, CHESS 4.5 cons CHAOS out of a Bishop for two Pawns and a victory two moves later.
RECENT PROGRESS IN COMPUTER CHESS
FIG. 8. Position after 21
22 P-QN4(164,7,57,0) Q x P/5 23 R/R-N1(113,7,45,2) Q x P/4
73
. . . K-N2
24 R x B(64,6,23,0) 25 B-N5(129,7,51,?)
Q-R5 Time forfeit
CHAOS calculated for over 40 min, carrying out deeper and deeper iterations looking for a way to proceed and finally lost on time. CHESS 4.5 saw the continuation 25 . . . Q x P/B7 26 R x P/Q R-B2 27 Q-KB6 K-R3 28 R x R. Once again, Tony Marsland organized a computer chess event in Canada, this time a workshop in Edmonton on June 2 6 2 7 (Marsland, 1976). Eight programs participated in a “handicapped” tournament with OSTRICH (Newborn and Arnold) coming out the champion, winning 34 out of 4 points. In Amsterdam on August 9-1 1, eight teams participated in the European Computer Chess Championship. The three-round Swiss tournament was won by MASTER with ORWELL and T E L L tying for second place (Bakker, 1976). 2.3 1977:CHESS 4.6 Plays Expert Level Chess
With continuing support from Dave Cahlander and the CDC CYBER 176, Slate and Atkin entered CHESS 4.5 in the 84th Minnesota Open (Minneapolis, Minnesota, February 18-20, 19771, a six-round Swiss tournament open to anyone who wanted t o participate. To the great surprise of everyone, the program trounced five players: Warren Stenberg (1969), Charles Fenner (2016), Gerald Ronning (1965), Rick Armagost (1947), and Robert Johnson (1954). Its only defeat was delivered by Walter Morns (2175) (see Cahlander, 1977a). CHESS 4.6 won the tournament and thereby qualified for the Minnesota State Championship that was played the following weekend in Minneapolis. This contest was a “closed tournament,” closed to all but the best players in the state, and it was a round robin, that is, everyone played everyone else. The competitors in this
MONROE M. NEWBORN
74
event studied games played the previous week by CHESS 4.5, looking for weaknesses. This marks the first time in chess history that computer games were studied seriously by good players concerned with the possibility of losing! Nels Truelson (2079), Peter Thompson (2142), and Lasloe Ficsor (2110) defeated the program in Rounds 1, 2, and 3, respectively. The program defeated Rick Linden (1850) in Round 4 and drew with John Greene (1899) in Round 5 . On the basis of the 16 games played in the Paul Masson Chess Classic, the Minnesota Open, and the Minnesota State Championship, CHESS 4.5 received a performance rating of 2099. We present the endgame of the CHESS 4.5 versus Stenberg game in Section 6 when specifically discussing endgame play. Berliner (1977a) comments on the CHESS 4.5 vs. Fenner game and the CHESS 4.5 versus Johnson game. Fenner has the dubious distinction of being the first Expert ever to have lost a tournament game to a computer. The game, with some of Berliner’s comments in quotations, follows (see page 72 for an explanation of the information associated with CHESS 4.5’s moves). WHITE: CHESS 4.5
BLACK: Fenner Sicilian Defense
1 2 3 4 5 6
P-K4(B) N-KB3(B) P-Q4(B) N x P(B) P-QB4(B) B-Q3(B)
P-QB4 P-K3 Px P P-QR3 N-KB3 Q-B2
7 8 9 10 11 12
0-0(110,6,34,0) N-N3(67,6,19,0) N-B3(64,4,18,2) B-N5(83,6,24,1) B x N(128,6,36,2) Q-K2(59,6,18,2)
B-B4 B-R2 N-B3 N-K4 P x B P-Q3
“It appears that Black has gotten the better of the opening having secured the Bishop pair but appearances are deceptive. White has a good game and now finds a way to assert his space advantage.” 13 K-R1(47,6,15,4) 14 P-B4(59,6,19,2)
B-Q2 N x B
15 Q x N(345,7,101,0) 0-0-0 16 WR-Q1(99,6,30,0) ...
“Despite the natural appearance of this move, the immediate P-KB5 is stronger, as it would force Black to move one of the Rooks to KBl to defend against the threat of P x P.” CHESS 4.5 saw 16 . . . B-Nl 17 P-KR4 P-QN3 18 P-KBS K-N2 19 P x P B x P. This is a tough position for a program to find a good continuation. CHESS 4.5 feels that White has a Pawn advantage.
+
1 6 . . . B-B3
17 P-KB5(95,6,28,0)
B-N1
“This move attempts to find counterchances, by mate threats along the diagonal. However, the simple 17 . . . Q-K2 is in order.”
RECENT PROGRESS IN COMPUTER CHESS
18 P-N3(99,5,25,0)
. .
75
.
“CHESS 4.5 shows fine judgment in avoiding 18 P x P P x P 19 R x P P-Q4 20 Q-R3 P x P/K5 after which Black’s pieces gain new life and his chances impr0v.e against the game continuation. Now Q-K2 appears in order for Black after which White can continue with 19 P x P 20 N-Q4 with the better game.” The iterative exhaustive search was limited to a depth of 5 levels the lowest in the game thus far, because, most likely, the alpha-beta window had to be modified on the last iteration (see Section 3.6). P-KR4?! 18 . . . 19 P x P(62,6,21,0) P-R5 20 R x P(71,6,22,3) P x P/6
21 Q x P/3(88,6,27,1) WQI-Nl 22 P x P(67,6,22,3) ...
“This is the fly in the ointment. After the incorrect 22 Q-B4, Black would play P-Q4!! with many threats and the better game. Now if 22 . . . R x Q 23 P-B8 = Q + R x Q 24 R x R+ K-Q2 25 R-B7+ wins.” CHESS 4.5 predicted the game would continue: 22 . . . Q x P 23 R x Q R x Q 24 N-R5 B-B2 25 N x B P x N with an evaluation indicating a two-Pawn advantage. 22 . . . Q x P 23 R x Q(161,7,53,2) R x Q
24 N-Q5(197,7,64,0)
...
“Here Black had the temerity to offer a draw, which was declined” (see Fig. 9). 24
...
B-Kl?!
“Setting a cute trap, which unfortunately for Black is not quite sound.” 25 N-N6+(67,7,26,2)
. . .
FIG.9. Position after 24 N-QS.
MONROE M. NEWBORN
76
CHESS 4.5 sees 25 . . . K-Ql R X P/7 26 B-B2 N-B5 27 R-N3 N x P, and feels White is ahead by the equivalent of three Pawns. 25 . . . K-Ql 26 R x P/7(266,8,96,0) B-B3
27 R X B+(67,8,24,6) K-B2 28 R-QB8+(1,3,3,3) ...
“A very clever move, which escaped me when I first saw the game. The point is that on 28 . . . K x N , 29 R x P!! and with pieces loose all over the place, Black is lost.”
28 . . . R x R 29 P x R(102,9,38,2) B x P+ “30
...
K x N,
30 K-N1(269,lO,112,0)
R- KRl?
31 R x P+ was better but hopeless.”
31 N-Q5+(178,8,70,0)
K-B3
32 N-R5+(204,8,90,?)
Resigns
“On 32 . . . K-B4,33 P-QN4 mate, or 32 . , . K-Q2,33 N-B6+ wins the Bishop. A crisp beautiful game; probably the best ever by a computer, and against a rated Expert.” CHESS 4.5 saw winning the Bishop through 32 . . . K-Q2 33 N-B6 K-B2 34 N X B R-R4 P-N4. CHESS 4.6 examined just under 400,000 positions per move; each nonbook move averaged 49 sec. One month later in New York at the Essex House just off Central Park, CHESS 4.5 gave a simultaneous exhibition itself. Dave Cahlander arranged it as a promotional event a t which CDC officially launched the CYBER 176. Taking on ten opponents including Eric Bone (2150), Burt Hochberg (editor of Chess Lijii and Review), Walter Goldwater (an A player and president of the Marshall Chess Club), and one other A player, the program won eight, drew one, and lost one (to Bone). Lasker, who played the Bernstein Program in 1958, said at that time that no program would reach Master status. He observed, after playing CHESS 4.5 that “ I am still not ready to admit that a computer will ever attain Master strength, though I will concede that the one made by Control Data Corporation plays better than I had expected anyone would be able to make it do” (Lasker, 1977). Goldwater ( 1977) wrote after losing that he was very upset for two days, but after finding out that CHESS 4.5’s rating was higher than his, he did not mind quite as much. He said he “phoned various friends-Robert Byrne, Nat Halper, Milton Hanauer-they were upset, too. It was not that they felt I had played worse than usual, but rather that they could see that this was now something to cope with.” As part of the CYBER’s coming out party, it also played four games of speed chess with Levy, winning two and losing two. Levy estimated its playing strength at the 2300 level but argued that “it has insufficient posi-
RECENT PROGRESS IN COMPUTER CHESS
77
tional understanding and strategic planning to enable it to perform at Master level under tournament conditions” (Levy, 1977a). On April I , Levy went to Carnegie Mellon University to play a twogame challenge match against CHESS 4.6. He won the first game, and because he could not lose the match, the second game was called off. Donald Michie was a Visiting Professor there and arranged the affair (Michie, 1977). With an audience of 400 watching, Levy steered CHESS 4.5 into a Sicilian opening in which he felt the program had played incorrectly in the past. After Levy’s ninth move, the position was as shown in Fig. 10. Surely enough, on the first move out the book, CHESS 4.5 continued as Levy had anticipated with 10 N x N (Levy, 1977a). Berliner, who served as commentator for the game, observes that by move 17, CHESS 4.5 had recovered and that in fact Levy was then “probably lost” (Berliner, 1977b), although Levy (1977a) does not agree. On move 26, Berliner says that CHESS 4.5 made the “fatal error.” The game went as follows. BLACK: Levy
WHITE: CHESS 4.5 P-K4(B) N-KB3(B) P-Q4(B) N x P(B) 5 N-QB3(B)
1 2 3 4
P-QB4 P-Q3 Px P N-KB3 P-KN3
10 N x N(70)? P x N 1 1 0-O(209) N-Q2 12 P-B4(193)
6 7 8 9
P-B3(B) B-K3(B) Q-Q2(B) B-QB4(B)
13 B-K2(80) 14 P-QN3(59)
B-N2 0-0 N-B3 P-QR3 B-K3
...
N-N3
Levy expected this, pointing out in his analysis that “programs are quite prone to weakening Pawn moves because they have little understanding of strong and weak squares.”
FIG. 10. Position after 9 . . . P-QR3.
78
MONROE M. NEWBORN
14..
.
N-Bl
15 P-QR3(163)
...
Berliner gives CHESS 4.6 credit for “a fine move which meets the threat of Q-R4.” Levy felt that 15 B-Q4 was better. 15 . . . 16 P-QN4(93)
Q-R4 Q-B2
17 P-B5!(121)
...
Opinions differ here. Berliner argues that Levy is probably lost. Levy contends that “White’s K5-square is weak and his King’s Pawn is in danger of becoming isolated.” 17 . . . B-Q2 18 B-R6(644) . . . Berliner says that CHESS 4.5 “misjudges the situation and allows the exchange of Queens. With 18 R-B3 this could have been prevented and the attack with B-R6 would be extremely strong.” Levy agrees that 18 B-R6 was not best. He suggests that “the correct plan begins with 18 K-Rl followed by R-B3, B-R6, and R-R3.” 18 19 20 21 22
... K-Rl(115) Q x Q(25) R-B3(118) B x B(143)
Q-N3+! Q-Q5 B x Q B-N2 K x B
23 24 25 26
R-QNl(107) W3-Bl(l53) WN-Ql(46) P-QR4?(129)
N-N3 WB-QNl P-B3
...
Both Berliner and Levy see this as CHESS 4.6’s fatal error. Berliner says that “Black’s Queenside Pawn structure collapses.” 26 27 28 29 30 31 32 33 34
... P-QR4 P-N5(68) P x P/N4 P x P/5(134) R-QBl R-Q3(143) R- B4 R-N3(243) WR-QB1 Wl-B3(118) P-RS P-R6 P-R4(54) P x P(391) PXP R-K3(249) B-K3
35 36 37 38 39 40 41 42 43
P-RS(245) N-Q5(75) R - QR3( 142) P x B(148) B-Ql(318) K-R2(137) B-N3(118) R x Q(63) Resigns
P-N4 P- R7 B x N R x PI7 R-Q7 R-B8 P-R8 = Q R x R
IFIP Congress 77 hosted the Second World Computer Chess Championship in Toronto on August 6-9. Sixteen programs participated in the four-round Swiss style event that was the biggest show yet for the computer chess world. KAISSA, running on an IBM 370/168 returned to defend its title. CHESS 4.6 came as the principal challenger. DUCHESS had been greatly improved as had BELLE, and both figured to be contenders. But at the end of the four days, there was no doubt that CHESS 4.6
RECENT PROGRESS IN COMPUTER CHESS
79
was best, defeating BCP in Round 1, MASTER in Round 2, DUCHESS in Round 3, and BELLE in Round 4 (and KAISSA in a friendly match following the tournament since the two had not met during regular play). Searching exhaustively t o a depth of six plies on most moves and deeper in endgame play was too much for any of its opponents. DUCHESS, improved in the months leading up to the tournament, upset KAISSA in the first round and, although the strongest challenger, was not a threat to the champion. The game between BELLE and CHESS 4.6 to decide the championship went as follows. Slate assisted with the analysis. As can be seen, BELLE played quite strongly but was in contention for no more than the first 24 moves. WHITE: BELLE 1 P-K4 2 N-KB3 3 P-Q4
N-QB3 P-K3 P-Q4
BLACK: CHESS 4.6 4 N-B3 5 P-K5
B-N5 N/Nl-K2
Slate was concerned with the weakness of Black’s Kingside. He knows that CHESS 4.6 has no algorithm specifically designed to be aware of the King’s vulnerability. 6 P-QR3
.
..
BELLE invites CHESS 4.6 to isolate a Pawn, something that CHESS 4.6 rarely passes up. 6 . . . BxN+ 7 P x B N-R4
8 B-N5+ 9B-Q3
B-Q2
...
This adds strength to White’s Kingside threats.
9 . . . 10 N-NS
R-QBl P-KR3
11 N-B3
.
.
,
Whew! Slate was worried about 11 Q-R5, threatening mate on KB7. 11
...
P-QB4
Threatening t o isolate more Pawns! 12PxP R x P 13 B-K3 RX P 14 B x P/7 N-B5
15 0-0 16RxR 17 B-QB5
This is necessary to avoid 17 . 17..
.
Q-R4
18 B-Q6
R X P/6 N x R
...
. . P-QN3. N-B5
MONROE M. NEWBORN
80
FIG.11.
Position after 18
. . . N-BS.
(See Fig. 11.) Slate was worried about 19 B x N/4. This would strengthen White’s Bishop’s hold on Q6. White instead rushes to the endgame with
19 Q-Rl N-B3 20 Q x Q N/3 x Q
21 R-R1
B-BI
This gives Black space to move and eliminate a back row mate.
22 P-B3?
N-B3
23 R-R4
...
Slate pointed out that 23 B x N is better, solidifying the position of White’s remaining Bishop.
23
. . . NxB!
Slate was concerned that CHESS 4.6 might not play 23 . . . N x B because 24 P x N gives White a passed Pawn that the program will believe is a serious threat. A search of seven plies is necessary to see how to proceed correctly.
24PxN 25 R-KN4 26B-B2
K-Q2 P-KN4 KXP
27 R-QR4 28 R-Rl
P-N4 P-QNS
CHESS 4.6 will now have a passed Queen’s Pawn, which is what it wants. 29PxP N x P 33 N-K2 34 N - N l 30 B-Nl B-Q2 31 K-Rl P-B4 35 R-R5 32 N-Q4 R-QBl! White cannot play 36 R x B because 3 7 . , . N x P mate! 36 P-B3 B-B8 38PxP 37 P-R4 R-N7 39 K-R2
B-N4 R-B8 R x B!
of the mate threat 36 BxP+ Px P
. . . N-Q6
RECENT PROGRESS IN COMPUTER CHESS
40 R-R4 41 K-N3
B x P+ B-R4
42 K-R3 43 R-R8
P-B5 B-N3
CHESS 4.6 threatens mate in one with 44
. . . B-B4
44 45 46 47 48
B-B6 N-Q6 R-N7+ N-B7mate
K-N4 K-R3 K-R2 R-Q8+ R-KN8
R-N7+ R x N R-N5 K-K4 B-K5
49 50 51 52
81
R-N7 R-KR7 R-R3 K-Rl
mate.
As exciting as the tournament itself was the presence of Dr. Mikhail Botvinnik. He was a guest of the tournament and his presence among the participants added immeasurably to the stature of the event. His own program PIONEER, is under test in the USSR and nearing the day when it will be ready to compete. Botvinnik discussed his ideas with the participants, his pleasure in seeing the progress made so far, and his concern that for programs to achieve significantly stronger levels of performance, much more selective search is necessary. When given the difficult position in Fig. 12 (credited to G. Nadareishvili), PIONEER produced the correct solution after considering only 200 moves. The computer time required was about 3f hr! The correct continuation is 1 P-N6
2 P-N7
K-B3 B-R2
3 P-K4!
...
According to PIONEER, not 3 K x B because 3 . . . N-B6 4 P-N8 = Q N-N5+ 5 Q x N K x Q 6 P-R6 P-B5 7 K-N7 P-B6 8 P-R7 P-B7 9 P-RS = Q P-B8 = Q 10 Q-R3+ K-N5. 3 . . . 4 P-K5+
N-B6 N x P
5 K x B N-B2 6 P-N8 = Q N-N4-t
FIG.12. Endgame position with White to move and win (G. Nadareishvili).
MONROE M. NEWBORN
82
7 Q x N 8 P-R6 9 K-N7
K x Q P-B5 P- B6
10 P-R7 P- B7 11 P-R8 = Q P-B8 12 Q-R6+!
=
Q
PIONEER’S tree, redrawn in a readable form from the computer’s printout by Botvinnik is shown in Fig. 13. The principal continuation was 25 plies long. On September 16-18, CHESS 4.6 traveled to London via telephone and with Cahlander’s help participated in the top section of the Aaronson Chess Tournament. Grandmasters Hort (Czechoslovakia) and Kotov (Soviet Union) were the class of the field. The Minnesota Chess Journul (1977) reports that “the electronic monster lost its first game, then won two, and drew the last three for a plus score of 3C24. The three opponents with which it drew in the last three rounds were very strong: Joppen (Switzerland) rated 2200; Seewald (Netherlands), 2 160; Flear (Britain), also 2160. The computer’s third-round victim was P. Lewin from Britain with a 1680 rating, while its adversaries in the first two rounds were unrated.” This is the first time a computer has drawn with a Master in tournament play. After having had a stunningly successful year, CHESS 4.6 went into the ACM’s Eighth North American Computer Chess Tournament in Seattle as a heavy favorite. Twelve programs participated in the four-round event held on October 13-15 (Bailey, 1977); eight of them had participated two months earlier in Toronto. Jack Perkins of NBC-TV covered the tournament and his feature story appeared on the TODAY Show on October 23, 1977. CHESS 4.6 had to be happy with a tie for first place honors with an improved DUCHESS. Undefeated going into the final round, the two met and played each other to a draw. Truscott attributes some of the improvement in DUCHESS to better tree-searching techniques. DUCHESS, as does CHESS 4.6, makes extensive use of transposition tables. Levy played a simultaneous exhibition against the programs, winning eight, drawing with CHAOS, and losing to CHESS 4.6. The tournament included the participation of the first microcomputer, an Intel 8080, programmed by Arnold Epstein. DUCHESS’S draw with CHESS 4.6 was an exciting game. It seemed for a good part of the game that DUCHESS was going to upset CHESS 4.6, but the champion hung on. We present here the first 32 moves. The remainder of the game appears in Section 5 where endgame play by programs is discussed. WHITE: DUCHESS 1 P-K4 2 P-Q4
P-QB4 P XP
BLACK: CHESS 4.6 3 P-QB3 4 N x P
P X P N-QB3
MONROE M. NEWBORN
84 5 N-B3 P-Q3 6 B-QB4 P-K3 70-0 N-B3 8 Q - K 2 B-K2
9 R-Ql 10 B-K3 11 B-Q2
P-K4 N-KN5 N-Q5
CHESS 4.6 does not search deeply enough to see that this loses a Pawn. 12NxN 13 N-N5 14 B-B4 15NxP/4
P x N Q-N3 N-K4 N x B
16QxN 17 Q-R4+ l8Q-N3 19QxQ
B-N5 B-Q2
0-0 P x Q
We see that CHESS 4.6 would rather castle as it did on move 18 than avoid doubled Pawns. DUCHESS has slightly better Pawns but CHESS 4.6 has a pair of Bishops. 20 21 22 23 24 25
P-QR3 WR-B1 R x R B-K3 P-B3 R-Ql
WB-B1 R x R R-R5 B-Q1 R-R4 B-KB3
26 R-Q2 27 N-K2 28 P-QR4 29 N-Q4 30K-B2
R-QB4 R-B3 B-K3 R-B8+
...
CHESS 4.6's check allows White's King to pick up an important tempo in its eventual move to the center to support its Pawns. 30.. . 31N-N5
B-Q2 BXN
32 P x B R-QN8
(See Fig. 14.) The game is completed on page 104 in Section 5. Bobby Fischer has been closely following developments in computer chess. Some time in the spring of 1977, he played an early version of CHESS CHALLENGER out of curiosity, commenting on its performance in Penrod's Computer Chess Newsletter (Penrod, 1977a). He also played three games against MAC HACK VI (Penrod, 1977b). It is not clear whatcomputerorwhatversionofMACHACKV1 was used. Thegamesmay be the only published games played by Fischer since winning the World Championship in Iceland. 2.4 1978: More Accomplishments
The pace of activities continued strongly into 1978. The first tournament exclusively for microcomputers was held on March 3-5 in California (see Section 7). CHESS 4.6 participated in the Minnesota Twin Cities Open on April 29-30. It entered as the highest rated player and lived up t o expectations
RECENT PROGRESS IN COMPUTER CHESS
FIG.14. Position after 32
85
. . . R-QN8.
by winning the tournament with a 5-0 record. Going into the event, the program had a rating of 1936 and Douglas indicates that its performance raised that number by about 35 points. In other recent tournaments, CHESS 4.6 received performance ratings over 2 100; these performances raised its previously established rating that until the last year or so was somewhere in the 1600s. Chestor, an electronic chess board controlled by a microprocessor was used for the first time by CHESS 4.6. It sensed moves made on a special chess board, transmitted them to the CYBER 176, and indicated the CYBER’s responses by blinking lights (lightemitting diodes) on a special board. On May 6, Grandmaster Walter Browne gave a simultaneous exhibition and had the misfortune of losing to CHESS 4.6 on one of the boards. The narration, in Chess Life and Review by John Douglas (1978), is delightful. The game lasted 63 moves. CHESS 4.6 predicted 35 of Browne’s 58 nonbook moves. In one position CHESS 4.6 searched a tree having 2,158,456 nodes. Douglas indicates that after move 32, Browne had used 22 min thinking at the board while CHESS 4.6 had consumed 2 hr 44 min, a ratio of almost 1 to 8. WHITE: W. Browne 1 P-Q4 2 P-QB4 3N-KB3
N-KB3 P-B4 P x P
BLACK: CHESS 4.6 4 N X P 5N-N5 6 N/l-B3
P-K4 B-B4 0-0
Douglas observes that “Browne’s Knight move brings CHESS 4.6 out of the opening book. At this point CHESS 4.6 has used 2 min and Walter Browne has hardly broken stride as he passed. But now the skid marks in front of the electronic chessboard are added to each time Browne passes by.”
MONROE M. NEWBORN
86
7 P-K3 8 B-K2 9N-R3 10 N-B2
P-Q3 P-QR3 N-B3 B-B4
11 0-0 12 P-QN3 13 B-N2
Q-Q2 K-Rl R-KNl
Douglas notes that “CHESS 4.6 is having trouble finding something to improve its position. It predicts 14 Q-Q2 P-R3 15 WRl-Q1 N-K5 1 6 N x N B x N.” 14 N-R4 15 B-R3
B-R2 P-R3
16R-Bl
...
The editor of Chess Life and Review points out that 16 B x P is met with 16.. . B x N ! 16 . . . 17N-N4 18BxN 19 Q-Kl 20 B-KB3
WR1-QI NXN Q-B2 B-B4 B-Q6
21BxB 22 B-K2 23 P-B3 24 P-B4 25 N-B3
P x B B-B4 P-K5 B-Q2 Q-R4
“Now the visiting Master stops, does a double-step, smiles: he’s got his thing now. The Queen is out of play. Browne begins a Kingside attack, smiles at the spectators, savors his move.” 26 Q-R4
...
“Thump, smile. It’s lucky that the electronic board can only sense the position of the piece and not the force with which it is moved, for Browne’s forceful play intimidates the spectators.” 26 . . . 27 R-QB2
B-B3 P-QN4
28 P-KN4 29 N-QI
P-N5 R-Q3
CHESS 4.6 expects 30 P-N5 N-R2 31 N-B2 Wl-Q132 B-N4 Q-N3 33 B-B5 but Browne played: 30 N-B2 31 R-Ql
Wl-Q1 R X R+
32BxR
...
“Small thump, walk away, stop, look back, frown.” 32.. . 33 Q-N3
R-Q3 Q-Q1
34 R-Bl
..,
“Now Browne is defending. Things are not going well at some of the other boards, either, but it is here that Browne spends most of his time” (see Fig. 15).
RECENT PROGRESS IN COMPUTER CHESS
87
FIG.15. Position after 34 R-BI
3 6 P X P N-R2 34. . . R-Q7! 37 P-N6 P X P 35 P-N5 P x P CHESS 4.6 is now a whole Pawn up! 38 Q x P 39 Q-B5 40Q-B4 41 P X Q 42 N-K4 43BxP 44NxP 45 R-Q1 46 P-QR3
Q-R5! B-Q2 Q x Q P-K6 P-K7 R x B B-Bl R-Kl P XP
47 R-Rl P-N4 48 P X P R-K4 49 P-N4 P-R4 50 N-Q3 R x P+ 51 K-B2 P XP 52 N X P R-QR4 53 K-K3 B-K3 54 K-Q4 N-N4 55N-B2 . . .
Browne offers a draw, but CHESS 4.6 decides to play on. 55.
..
56 N-N4 57 K-B5
P-R7 R-R5 N-KS+
58 K-N5 59 N-B6
B-Q2+ N-B6+
CHESS 4.6 hammers in “the final nail.” 60 K-B5 B x N 61KxB RxP+ 62 K-Q6 R-Q5+
63 K-K5 64 Resigns
R-Q8
The Jerusalem Conference on Information Technology hosted Israel’s first major computer chess event on August 7-9. CHESS 4.6, DUCHESS, CHAOS, OSTRICH, TELL, and BS ’66 ’76 participated. The programs ran on computers in Israel: CHESS 4.6 on a CDC CYBER 74 at the Hebrew University, DUCHESS on an IBM 370/158, CHAOS and BS ’66 ‘76 sharing an IBM 370/168, OSTRICH on a Nova 3 provided by Data
MONROE M. NEWBORN
88
General sales representatives, and TELL on a HP 2000. In general, each program played on a somewhat less powerful system than it was accustomed to. CHESS 4.6, on the CDC CYBER 74, was only able to search trees containing typically 40,000-150,000 nodes, rather than the gigantic trees they are able to handle on the CDC CYBER 176. CHESS 4.6 was unable to use its transpositions table due to a lack of sufficient memory space and was unable to think on its opponent’s time for various reasons. While the programs were, perhaps, not at their best, they were still quite strong. Shimon Kagan, the Tournament Director, and an International Master, played speed chess against CHESS 4.6 on Israeli national television. Due to a lack of time, the game was adjudicated a draw after about 20 moves. DUCHESS won the tournament with a perfect record much to the delight of Truscott. It defeated CHESS 4.6 in Round 2 and CHAOS in Round 3. Kagan commented to the audience halfway through the DUCHESS-CHAOS game that it could have been a game played between two grandmasters. Against CHESS 4.6, DUCHESS went ahead by a Pawn on move 12, an exchange somewhat later, and eventually into an endgame in which it had two passed Pawns on the Queenside. After playing very passively for a number of moves, DUCHESS finally got down to the business of pushing its passed Pawns forcing CHESS 4.6 to resign on move 60. Following Seattle, DUCHESS was improved in several ways. Its opening book was improved, searches along checking sequences were improved (DUCHESS follows all checking sequences of up to three moves, rather than just two as it did at Seattle), criteria for trading were made more sophisticated, the King safety algorithm was improved, and the “Knight-on-the-rim-is-dim” algorithm was added. It was also programmed to think on its opponent’s time. DUCHESS participated in a three-round B level human tournament at Chappel Hill, North Carolina in the Spring of 1978 and received a USCF performance rating of 1783. In that event, DUCHESS won two games and lost one. The DUCHESS-CHESS 4.6 game follows and shows the vincibility of the champion. WHITE: DUCHESS 1 P-K4 2 P-Q4 3 P-K5 4N-KB3 5 B-QN5
N-QB3 P-Q4 P-B3 B-B4 Q-Q2
BLACK: CHESS 4.6 60-0 7 B-Q3 8 P x P 9 P-B3 IONxP!
P-QR3 B-N5 NxP/3 P-K4 B x Q
10 . . . N x N is no better. White still is able to hold on to its one pawn advantage by continuing 11 P-B3.
RECENT PROGRESS IN COMPUTER CHESS
I I N x Q 12RXB 13 B-B5 14 B-K6 15 B-N4 16R-KI
N x N B-K2 R-Q1 N-Bl N-N3 0-0
17 P-QN3 18 P-QR4 19 R-K6 20 B-Ql 21RxR 22 P x P
89
P-N4 P-N5 N-R4 R-Q3 B x R N-B3
CHESS 4.6 seems to be floundering about using the smaller CYBER 74. 23 B-B3
. . .
(See Fig. 16.) In this position CHESS 4.6 is willing t o play 23 . . . R x B feeling that it gains sufficient compensation with 23 . . . R X B P X R 24 N x PIQ. This exchanges a Rook for a Bishop and Pawn and gives Black a passed Pawn and White isolated Pawns. The reader might compare this move with move 25 (and in particular the principal continuation found for that move) in the Stenberg versus CHESS 4.5 endgame on page 102.
FIG.
23 . . . 24 P x R 25 N-Q2 26 B-N2 2 7 N x N
R x B N x P/Q N-R5 N/Q x P/B+ N x N +
16. Position after 23
28 29 30 31 32
K-N2 B-B3 B-Kl R-BI R-B6?
B-B3.
N-Q7 N-K5 N-B3 N-R4
...
DUCHESS plays weakly for the next 20 moves giving C H E S S 4.6 every possible chance. White could have played 32. P-N5 leading to a big advantage in a few moves. 32 33 34 35 36 37
... K-BI B-Q2 R X P/6 K-K2? K-K3
N-B5+ N-Q6 B x PI7 P-R4 N-B5+ N-R6
38 K-Q4? 39 R-R8+ 40KXP 41 B-NS? 42 B-K7 43 R-KB8
N x P K-B2 N-N5 K-N3 B-N6 P-R5
MONROE M. NEWBORN
90
44 K-B6 45 R-BI 46 R-KRI 47 B-Q8 48 B x P 49 K-N7 50KxB
P-R6 N-K6 P-R7 N-B2 N-Q5+ BXB N x P
51 R X P 52 K-Q6 53 K-Q5 54 K-B4 55 R-R2 56KxN 57 R-KN2
K-B4 P-N4 P-N5 P-N6 K-K5 K-Q4 K-K4
DUCHESS has finally quenched the fires! 5 8 R x P K-Q4 59 P-R5 Resigns David Levy accepted two more challenges, the first from CHEOPS, the latest effort of Richard Greenblatt. and the second from CHESS 4.7. In the last yearor so, Greenblatt added ahardware packagetoMAC HACK VI. The package was designed to grow trees at nonquiescent terminal positions arrived at by the main PDP computer. Greenblatt told Levy the hardware is capable of evaluating 150,000 positions per second. Feeling that the program was considerably stronger than it had been, Greenblatt extended a challenge to Levy. On August 20, Levy defeated the program in 43 moves. The strength of CHEOPS is difficult to assess based on only one game but I do not believe it is up to the level of the other top programs. CHESS 4.7’s final attempt came in Toronto at the Canadian National Exhibition late in August. A six-game challenge match was played. The first game on August 26 ended in a draw with Levy narrowly escaping defeat. Levy had no difficulties in games two or three. To win his bet he merely had to draw the fourth game. He was quite confident that if he played his own conservative style game, the machine would eventually create its own weaknesses, defeating itself. But he wanted the match to end with him conquering the machine at its own game-in a highly tactical encounter. And thus, he managed to become the first Master to lose a game to a computer in a tournament environment. The fourth game with assistance in annotations by Leon Piasetski and Kevin Spraggett is presented here WHITE: CHESS 4.7
BLACK: Lery Latvian Gambil
1 P-K4(B)
2 3 4 5
N-KB3(B) P x P(B) N-KS(B) N-N4(B)
P-K4 P-KB4 P-K5 N-KB3 P-Q4
6 7 8 9 10
N x N+(B) Q-RS+(B) Q x Q+(B) N-B3(B) P-Q3(92)
Q x N Q-B2 K x Q P-B3 P x P
RECENT PROGRESS IN COMPUTER CHESS
91
The game is far from the highly tactical battle that Levy promised. But the blame is not his. 11 B x P(106) 12 B-KB4(198) 13 P-KN4(86)
14 P x B(88) 15 0-O(182)
N-Q2 N-B4 N x B+
B-B4
.
..
15 P-B3, and then if 15 . . . P-R4, 16 0-0-0 P x P 17 P-Q4 gives White a better position. The bias on castling is very high. 15
...
P-KR4!
Now White's Kingside is in serious trouble. If 16 P-KR3, Black plays 16 . . . P x P 17 P x P R-RS! White must play for time: 16 N-R4(125)
B-Q5
Levy preferred to avoid 17 . . the center of the board. 18 P-Q4(153) 19 P-KR3(106) 20 WB-Kl(236)
B-Q3 P-QN3 B-Q2
17 B-K3(200)
.B
B-K4
x B 18 P x B giving White strength in
21 N-B3(51) P x P 22 P x P(108) R-R5 23 P-B3(67) WI-Rl
It looks as though Levy is well on the way to winning his bet (see Fig. 17). 24 K-Bl(120)
B-N6
Levy overlooks 24 25 R-K2(84) 26 K-N2(103) 27 B-Nl(67)
. . . R-R6
B-BI B-Q3 R-R6
leading shortly t o a clearly won position. 28 WI-Kl(223) 29 K-B2(74) 30 R-K3(135)
R-N6+ R/1-R6 B-R3
Zvonko Vranesic, watching in the audience, felt this move is a mistake. Levy may have been concerned with White's pressure on the King's file
FIG. 17. Position after 23 . . . W I - R l .
MONROE M. NEWBORN
92
and may have skipped 30 . . . B-B5 for that reason. But if after 30 B-B5, White plays 31 R-K7+, then in Piasetski’s opinion, 31 K-B3 32 R/I-K3 R-R8 maintains the advantage.
31 N-K2(68) B X N 32 R/1 x B(65) P-B4
..
33 P-B4!(158)
... ...
.
This move, thought out on Levy’s time, finally takes pressure off White and turns the game around. Now Black faces great problems to even draw.
33.. . R x R 34 R x R(334) R-R5
35 K-N3(233) 36 B-B2(109)
R-R8 R-Q8
According to Piasetski and Spraggett, 36 . . . P-B5 is a better line of defense, offering Black better drawing chances.
37 R-R3!(237)
.
..
Levy is now in deep trouble.
37 38 39 40 41
... R X P+(143) R-Q7!(188) K-N2(337) R x P/5(172)
PXP K-Bl R-Q6+ B-B4 R-Q7
42 43 44 45 46
P-N4(126) R-Q8+(212) R-Q7+(498) R x P/4(77) K-B3!(165)
B X P K-B2 K-Bl R-N7
...
This is necessary to avoid the pin 46 . . . B-B4. CHESS 4.7 is on the road to victory. It is searching to a depth of 10 plies on every move!
46 47 48 49 50 51
... R-Q8+(234) B-R4+!(90) P-N5(156) R-Q7+(294) P x P(136)
B-B4 K-K2 K-B2 P-N3 K-Bl R x P
52 P-B5(283) 53 K-N4(473) 54 K-R5(247) 55 R-QB7(579) 56 P-B6(150)
R-R6+ R-R5+ R-Q5 B-K2 Resigns
However, Levy went on to win the fifth game and the match with a 3i-l; score as well as the $2000 bet. 3. Tree-Searching Techniques (Modifications to the Minimax Algorithm)
The minimax search ulgorithm specifies that, given a position in a twoperson game, a tree of all sequences of moves from that position to positions that are won, lost, or drawn be generated and then, assuming each
RECENT PROGRESS IN COMPUTER CHESS
93
player is attempting to do his best, the algorithm determines the sequence of moves that will optimize each player’s chances. The first move on this sequence is the move found best to make in the given position. The entire sequence is called the principal continuation. This strategy, as it stands, is too simple to use in chess programs although it works quite well in programs intended for more elementary games. In most chess positions, such a tree would be astronomical in size containing a completely unmanageable number of move sequences. In chess, it is impossible to follow all move sequences to positions that are won, lost, or drawn. It is possible only to search several moves ahead and assign the resulting positionscores based on how good they “seem to be.” Thus the initial papers on computer chess suggested using ajinite depth minimax search. All sequences of moves are generated out to some fixed depth, say D, the resulting positions constructed and scored, and then the minimax algorithm used to find the principal continuation. The score of each terminal position is determined by a scoringfunction, a function that examines the material on the board as well as other factors and arrives at a numerical value for the position. Shannon originally suggested a scoring function that considers ( 1 ) material, (2) pawn structure, and (3) mobility. Turing’s is only slightly different. More recent programs use more sophisticated functions, but as we see in Section 4, they are still very crude and reflect the chess knowledge of about class C players (if such an analogy is fair to make). 3.1 The Horizon Effect
Both Shannon and Turing were concerned with problems that would be caused by placing a fixed limit on the maximum depth of search. They suggested a variable depth search, one in which all moves are searched out to some fixed depth and certain ones further yet. Turing argued that unless a position is quiescent, that is, void of certain features (Kings in check, pieces en prise, certain types of threats, etc.) that make its evaluation unclear, further search is necessary along sequences leading beyond this position. All good programs carry out “extended searching” and some are able to search to depths of 15 to 20 plies along highly forced lines of play. A search based on the methods described by Shannon and Turing is vulnerable to the horizon effect. Berliner’s dissertation deals extensively with this subject (Berliner, 1974). Consider, for example, the position in Fig. 18 with White to move. Assume a program is carrying out an exhaustive three-ply search along with extended searching. The program will see in its extended searching that the three-ply sequence 1 B-N3 P-B5 2 anything loses a Bishop for at best a Pawn, but will delude itself into
94
MONROE M. NEWBORN
FIG. IS. Position illustrating the horizon effect. From Berliner (1974).
believing that 1 P-K5 P x P 2 B-N3 loses only a Pawn. The loss of the Bishop through 1 P-KS P x P 2 B-N3 P-B5 anything will not be seen because the first three moves of this sequence lead to a quiescent position and no extended searching will be carried out. The program will continue by playing P-KS, which loses not only the Pawn but fails to solve the problem with the Bishop. It will find this out on the next move. 3.2 Forward Pruning
Forwurdprrrning was introduced in the Bernstein chess program in 1956. Rather than searching all moves at each position in a tree, Bernstein’s program selected only seven. His program looked four plies deep into a tree with a maximum of 7‘ = 2401 terminal nodes. The Kotok-McCarthy Program, MAC HACK VI, and early versions of Slate and Atkin’s program used forward pruning. However, experience has shown that forward pruning near the root of the tree is very risky and all better current programs do very little of it with, perhaps, the exception of CHAOS. 3.3 The Alpha-Beta Algorithm
The alpha4wta algorithm, a modification t o the minimax algorithm is a fundamental part of all current chess programs. A program using the alpha-beta algorithm searches a small subtree of the tree searched by a program without it. The subtree always contains the principal continuation found by the minimax algorithm and further, the alpha-beta algorithm always picks the same one. In Knuth and Moore (19751, the history of the algorithm is reviewed. They give McCarthy credit for discovering it, saying that he discussed it with Bernstein as early as 1956. Newell, Shaw, and Simon used a weaker version of the algorithm in their chess program
RECENT PROGRESS IN COMPUTER CHESS
95
(Newell et al., 1963). Hart and Edwards (1961) published a memorandum describing it but, according to Knuth and Moore, they made no attempt “to indicate why the method worked, much less demonstrate its validity.” Brudno (1963) is credited by Knuth and Moore as being first to present a proof of why the algorithm works. The algorithm is described in a number of books and articles, in particular, Nilsson (1971), Knuth and Moore (1975), and Newborn (1975). The concept of one move refuting another is at the heart of the algorithm. The essence is illustrated by the position in Fig. 19a and the tree in Fig. 19b. Suppose the computer is playing White and White is to move. Suppose further that a four-ply search is being carried out. Figure 19b shows the following. The first move searched by White is 1 P x P, which the computer finds after looking four levels deep wins a Pawn. Later, suppose when searching 1 K-Rl as shown, the first four-ply continuation considered happens to be 1 K-Rl P x P 2 K-R2 P x N. After scoring position P””, the computer sees that 2 . . . P x N refutes 1 K-RI P x P 2 K-R2. That is 1 P x P looks better than 1 K-Rl P x P 2 K-R2 and searching more moves at P”’ will never change this situation. There is no need to consider any more replies in position P”’ because after finding one refutation there is no need to find another. Further, after checking the three other third-ply moves, the computer will realize that 1 K-Rl is refuted by 1 . . . P x P and that no other moves in position P’ need be searched. We see that the refutations reduce the number of continuations that are necessary to examine. Slagle and Dixon (1969) showed that the alpha-beta algorithm has the potential of reducing the number of D level terminal positions (the root is level 0) in a tree with fanout N at every node from N U , required without the alpha-beta algorithm, to as few as: 2 NU/2
-
Nw+i)/z+
D even
1
1
N U - I )-/ ~
D odd
For example, with N = 35 (typical of many chess positions) and D = 4, rather than scoring about 1,000,000 terminal positions, a program may get away with scoring no more than about 2000. In practice, several times this number is usually scored. Being intrigued over the number that is “usually scored,” Fuller et a/. (1973), Knuth and Moore (19751, Newborn (1977a), and Baudet (1978) carried out theoretical analyses of the expected number of terminal nodes scored by an alpha-beta search under various assumptions about the randomness of the terminal node scores. The essence of their results is that even with random terminal node scores, the alpha-beta algorithm cuts down tremendously on the number of terminal nodes scored. The models that were studied led to conservative results. Even better results
96
MONROE M. NEWBORN
P"'
Other moves eventually searched
(b)
FIG.19. (a) Position and (b) tree illustrating the alpha-beta algorithm.
would have been achieved if terminal node scores were assumed to depend on branch scores and each branch score was a number chosen randomly from the set {0,1,2, , . . ,k}. To model chess, the value ofk could be taken as 9. Based on this model, this writer feels that the number of terminal nodes scored would be found to be O(ND/2log N), where N and D are as previously defined. Gillogly (1972) and Griffith (1976) carried out empirical studies of the algorithm's efficiency. Their results are difficult to assess. Gillogly's results (1972) in fact led Fuller ef al. (1973) to an erroneous functional form for the expected number of terminal nodes scored based on a model in which random numbers from a uniform distribution are assigned to terminal node scores. The speedup that results by introducing the alpha-beta algorithm into the minimax algorithm depends on the order in which moves are examined at each position in the tree. If the move that turns out to be a refutation is searched first, the speedup that results is greater than if this move is searched after others have been tried. Thus most chess programs have
RECENT PROGRESS IN COMPUTER CHESS
97
extensive heuristic algorithms that weed out moves that look like they may be refutations: these moves are searched before trying others. 3.4 The Killer Heuristic The killer heuristic was described by Gillogly when detailing TECH (Gillogly, 1972). He observed that if move m, refutes move m,, then there is a good chance that m, might also refute another move, might be a “killer.” He suggested that when considering other positions in which m, can be made, it should be searched first. In Akl and Newborn (1977), it is shown that the benefits from the killer heuristic are closely related to the fanout at each node: For large fanouts, the killer heuristic seems to be more effective in speeding up the search than for small fanouts. Theoretical analyses have not been attempted and await an ambitious mathematician. 3.5 Iterative Deepening Iferarive deepening has been used with surprising success in CHESS 4.6 and DUCHESS. To search D + l plies requires anywhere from 2 to 10 times as much time as to searchD plies. If the best move at ply 1 turns out to be the first one searched, the alpha-beta algorithm generally has more cutoffs and the search goes more quickly than otherwise. The probability that the best move at ply 1 found by a D ply search will turn out to be the best move at ply 1 found by a D + 1 ply search is quite high. Statistically then, it is worthwhile to find the principal continuation for a D ply search and use this continuation as the “seed” for a D + 1 ply search. CHESS 4.6 carries out iteratively deepening searches to a depth of typically 6 plies, searching to greater depths along certain tactical paths. In the endgame, CHESS 4.6 often carries out iteratively deepening searches to depths of 10-12 plies and much deeper along selective paths. Programs vary in the amount of the tree saved at each iteration. DUCHESS saves the best move at the first level and the best reply at the second level to each first level move. Good (1968) discussed “progressive deepening” in his “five-year plan,” but he did not have exhaustive searches in mind. He argued that good players carry out progressively deepening searches aiong highly selective lines of play.
3.6 Alpha-Beta Window An alpha-beta window is used in several programs. The alpha-beta search can be speeded up by having a program begin the search assuming that it will not find a win or loss of more than, say, a Pawn in the given
98
MONROE M. NEWBORN
position. Such a search is said to have a one-Pawn window. If at some time during the search a loss of more than a Pawn is observed, the search is started anew with a wider window. CHAOS uses an alpha-beta window. Each time the program finishes one iteration of its iteratively deepening search, the window's width is adjusted for the next iteration. 3.7 Transposition Tables
Transposition rubles are used by a number of programs including MAC HACK VI, CHESS 4.6, DUCHESS, and CHAOS (Greenblattetal., 1967; Slate and Atkin, 1977). Positions and their scores are stored as the search progresses. When a new position is reached in the tree, it is checked to see whether it has been searched previously, whether a transposition of moves has lead to the same position. If so, a score can be given to the position by simply assigning it the score stored in the transposition table. However because of the nature of the alpha-beta search, sometimes this score is, in fact, only a bound on its true score. Sometimes, though, this is enough. Programs that use transposition tables require computers with large memories. CHESS 4.6 has an input parameter for deciding how large the table should be. Mittman claims CHESS 4.6 speeds up by a factor of about 50% when using maximum table size. Cahlander indicates factors of more than 14:1 have been observed. Truscott points out that improvement increases as search depth increases and also as fewer and fewer pieces remain on the board. DUCHESS is programmed to try to make moves that lead to positions that are in the transposition table and thus increase its benefits. Positions are stored in a hash table (Knuth, 1973) and are quickly inserted and retrieved. DUCHESS encodes each position into 22 bytes using a Huffman code (Knuth, 1968). 3.8 The Method of Analogies
Adelson-Velskiy et al. (1975) have developed what they call the method of unalogies to aid KAISSAs search. An attempt is made to see whether a newly reached position is analogous to, that is, has certain of the same characteristics as a position reached earlier. If so, it is not necessary to consider successors of this position; it is assigned the score given to the earlier found analogous position. In some sense this is a generalization of the technique of using transpositions tables. 3.9 Differential Updating of Chess Information
Diferential updating of chess information (Scott, 1969) is used to varying degrees in all programs. At each terminal node, a program can add up the
RECENT PROGRESS IN COMPUTER CHESS
99
material value of the pieces on the board to determine the material difference in that position. Alternatively, the program can calculate the material at the root of the tree and differentially update this value as the search progresses. If there is no change in material when going from one position to another, then there is nothing for the program to do. In general, differential updating of chess information results in a significant speedup in a program. Factors other than material can be differentially updated as well. Scott says that each move affects the moves of about three or four other pieces, rather than the 20 or 30 usually on the board. A program can simply update the information related to these few pieces thus saving considerable time. 4. Chess-Specific Information in Chess Programs
Most chess programs understand very little about the game that they are programmed to play! Slate would agree that CHESS 4.6 knows explicitly no more than a class C human player. Good programs know the rules, how much time they have to make their moves, the strength of their opponents, and whether or not to play for a draw in a given position. They also know that mate in one is better than mate in two. Arbitrary values are assigned to the pieces (K = Q = 9, R = 5 , B = N = 3, and P = I), although some programs vary these numbers during the course of a game. Many programs incorrectly exchange a Rook and Pawn for a Knight and Bishop because they believe they are making an even exchange. They accept sacrifices when they do not find within the search horizon that the opponent can recoup his losses. They refuse to make sacrifices unless the ultimate gains become clear before the search horizon is reached. They understand when to and when not to trade pieces. When behind, they avoid exchanging non-Pawns. The concept of King safety appears in most programs. The King is encouraged to castle and to stay on the side of the board until material drops to some arbitrary level: then the King is encouraged to move to the middle of the board. Pawn structure is also understood. Programs want to create and push passed Pawns, sometimes though not hard enough (as DUCHESS versus CHESS 4.6 shows on page 88) and they avoid isolated and doubled Pawns. They understand the concept of mobility (they can count the number of moves in a given position), but tactical concepts such as forks and pins are usually discovered by searching. But this is about all, except for books and some special endgame algorithms. Opening libraries, usually simply called books, are growing in size. BELLE has the largest, about 100,000 positions, CHESS 4.6, CHAOS, DUCHESS, KAISSA, and several others have books ranging in size from QJ,
100
MONROE M. NEWBORN
3000 to 30,000 positions. Positions, rather than trees of move sequences, are usually stored and thus it is possible for a program to leave its book and return later in the game. Book moves are made in less than a second giving the program more time for middle game moves. Before long programs will use optical scanners to generate their books and then the source will have to be debugged very carefully! 5. Endgame Play
Until recently, endgame play by programs was thought to be their greatest weakness. While the rest of their game was at best class C, their endgame play was atrocious. In the match between the KotokMcCarthy Program and the ITEP Program in 1966-1967, both sides agreed games would end at move 40 in order to avoid embrassing play. In the 1970 ACM Tournament, COKO(White) and J.BIIT(Black) waltzed to a draw in a game in which absolute human novices would have realized how to proceed. After move 81 they reached the position shown in Fig. 20. The game ended in a draw with: 82 K-K3 K-B4,83 K-B3 K-K4,84 K-K3 K-Q4,85 K-Q3 K-K4,86 K-K3 K-Q4,87 K-Q3 K-K4. Drawn by repetition! A seven-ply search would have shown COKO that its King can capture a Pawn if nothing more. COKO, however, was spending barely two or three seconds on each turn looking for moves with immediate tactical implications and none existed. COKO was simply not programmed to search deeper when time and material on the board permitted. In endgame play with so few pieces on the board, the better programs of today perform exhaustive searches to depths of 10-12 plies, and there is no doubt that they would win this game given COKO’s position.
FIG.20. Position reached by COKO (White) vs. J. BIIT (Black) at the 1970 United States Computer Chess Championship.
RECENT PROGRESS IN COMPUTER CHESS
101
The power of exhaustive search along with some very simple heuristics is enough to permit chess programs to play much more respectable endgames than was believed possible. Heuristics to encourage the King to move toward the center of the board, toward passed Pawns, or go into opposition, and heuristics to encourage certain Pawn structures go much further than was imagined. In the endgame, transposition tables become much more effective than in the middle game because a higher percentage of positions transpose. This allows CHESS 4.7 and DUCHESS to search considerably deeper than would otherwise be possible. Two examples of endgame play follow. The first is a victory by CHESS 4.6 over Warren Stenberg (USCF 1969), a professor of Mathematics at the University of Minnesota. CHESS 4.6 had a clear advantage entering the endgame, was helped by a blunder on White’s 26th move and by White being in serious time trouble. But Stenberg, the first class A player ever to lose to a computer in tournament play, could have recovered if CHESS 4.6 had played as weakly as computers were renowned for. The game was adjourned after move 45 and Stenberg went home and straight to sleep, knowing that he had no real chance. Cahlander provided this writer with CHESS 4.6’s printout and what follows is a condensed presentation of the important data. With each Black move by CHESS 4.6, there are indicated the time (in seconds), the search depth of the last iteration, the number of terminal nodes evaluated, and the continuation expected. It is a lot of data, but it is important to record for posterity. Underlined moves in the continuations correspond to moves actually played. Note that 17 out of 32 moves by its opponent were predicted by CHESS 4.6. On move 49, the program predicted its opponent’s reply, its own move, and the second reply by the opponent. The position shown in Fig. 21 was reached after Black’s move 24. CHESS 4.6 begins the endgame with a slight advantage and has greater
FIG.21. Stenberg (White) vs. CHESS 4.6 (Black) after Black’s move 24.
MONROE M. NEWBORN
102
freedom for its King and Rook throughout. It keeps White's King on the side of the board and is eventually helped to a premature victory when White errs on move 57. The game, from move 25 on, follows: White Stenberg
Black CHESS 4.6
25 P-K4
R-QBI
Time Search Terminal (sec) depth nodes 58
7
183,842
Continuation expected 26 P-B4 28 P-QN3
R-B5 R x P
27 N-B2 29 N X R
P-B4 P XN
CHESS 4.6 seems willing to trade a Rook for a Knight and Pawn and the positional advantages of an isolated White Queen's Pawn and a passed Pawn of its own. But this gives White the edge. White Stenberg
Black CHESS 4.6
Time (set)
Search Terminal depth nodes
26 N-K3
P-B4
197
8
669,233
27 P-QN3
PXP
97
7
342,981
83
7
267,663
P-B4
1 I7
7
375,313
30 P-Q6
P-K3
I67
8
567,651
31 N-N6+
K-B3
90
8
307,245
32 N-B4
R-QI
130
7
415,153
33 N-R5+
K-Q2
297
9
989,004
34 N-B4
B-B6
71
7
232,663
35 K-BI
R-KNI
69
7
222,322
36 N-Q2
P-RS
83
7
260,734
37 N-B4
P-B5
21 I
8
746,997
38 N-Q2
B x N
%
8
363,181
39RxB
P-B6
100
8
350,800
4OPxP
PXP
112
9
448.606
28 N-B4
R- KN 1
29 R-QI
Continuation expected 27NxP 29 N-K3 28 30 P-Q6 29 R-QI 31 K-R2 30 p-Q6 32 R-R2 31 R-Q2 33 R-Q6 32 N-B4 34 R-R2 33 R-Q2 35 R-R5 34 N-B4 36 K-N2 35 P-N3 37 R-QBI 36 P-N3 38 N-N6 37 N-B4 39 N x PI4 38 N-Q2 4oPxP 39 RxB 41PxP 40 Pxp 42 R-Q4 41 R-Q3 43 R x PIR 45 P-R4
B x P R-K7 R-KNI P-R5 B-B6 K-K3 P-K3 K-N4 K-B3 P-R4 K-B4 R-QRI K-B4 K-Q5 B-B6 K-K3 P- K4 R-KNI B-B3 K-B3 P-B5 P X P B x N P x P P-R4 P x P P XP P-R4 R-KBI K x P
28 30 29 31 30
R-Nl N-QB4
R-B7
R-Q1
B-B6 K x P P- K4
PX P P-Q6
31 R-Q2
K-B3
32 34 33 35 34
K-B2
P-Q7 K-BI R-Q2 R-B2 R-R2
B-B3 R-QRI
35 P-N3 37 R-QBI 36 K-N2
P-K4 R-KNI K-K3
37 K-K2
P-R5
38 N-Q2 40K-NI 3 9RxB 41 K-KI 40 R-Q4 42K-KI 41 R-Q3 43 K - K l 42 R-Q4 44R-Rl
P- B6 P- B6 P-B6 R-KBI P-R4 K-Q4
RECENT PROGRESS IN COMPUTER CHESS White Stenberg
Black CHESS Time Search Terminal 4.6 (sec) depth nodes
41 R-Q3
R-KBI
42 K-KI
R-B5
260
10
I,002,080
96
9
372.7%
43 K-Q2
R- 8 4
213
10
753,462
44 K-B2
P-B4+
264
8
923,960
45 K-N2 46 K-B2
R-KN4 R-KB4
133 175
9 9
464,549 613,76 I
47 R-Q4
R-B4+
I35
9
520,486
48 K-N2
P- R4
175
9
650,408
49 R-Q3
R-K4
185
9
656,255
50RxP
R-K7+
83
9
302,371
5 1 K-BI
K x P
51
8
184,078
52 K-QI
R-R7
170
10
647,082
53 R-Q3+ 54 K-KI
K-B4 P- K4
(data not available) 95 9 340,505
55 P-B4
PXP
99
9
395,263
56 R-KB3
K-Q5
II
8
48.988
57 R x P +
K-K6
44
10
197.344
103
Continuation expected 42 K - K I 44RxR 46KxP 43 K-Q2 45RxR 44 K-K3 46RxP 48 K-K4 45 K-N2 47 K-BI 46RxP 47 K-Q2 49RxR 48 K-QI 50 K-Q2 52KxP 49 R-Q3 51 K-Q2 50
RxP
52 R-Q3+ 51 K-BI 53 R-B7 52 R-Q3 54PxR 53 R-Q3+ 55KxR 57 K-K2 55 R-Q2
57 K-Q2 56 K-QI 58 K - Q I 60RxP 57 R-B2 59 K-K2 58 K-QI 60 K - N I 62 K-N2
R-B4 P x R
43 K-Q2 45 K-K3
R-Q4 K x P
R-B4 P x R R-B3 R x R
44 K-K3 46KxP 45 K-K4 47KxR
R-Q4 K x P R-92 K x P
R-K4 K x P R-N7 P-R4 P x R R-Q4 K x P
46RxP 48 R-B7 47 K-B2 48 K-K3 50KxP 49RxR 51 K-K3
R-K7 P- K4 K x F R-Q4 K x P P x R K-K4
R-B4 K-B3 R-K7+ K-K4 K x P R - R7" K-K4 K-K5 K-K4 K-Q5
50 K-B2 52 K-K3 51 K-BI 53 R-Q2 52 R-B8
R- B2 P- K4 K x P R-K5 K-K4
53 R-K3+ 55 K-Q2 54 R-Q2 56 P-B3
R x R P- K4 R x R P- K4
56 58 57 59
K-Q5 R-R8 R-KR7 K-Q4
R-R6 P-R5 R-KB7 K-B3 R x P R x R P- B6 K x R R-K6
R-N2 K-B2 K-KI R-KB3
58KxR 60 K-B2 59 K-BI 61 K-R2
K-K5 R-K7 R x PIR
58 Resigns An abbreviated continuation was printed out because the terminal position was found in the transpositions table.
104
MONROE M. NEWBORN
DUCHESS displayed strong endgame play when drawing with CHESS 4.6 at ACM’s Seattle tournament. This game gave DUCHESS a first-place tie with CHESS 4.6 and for most of their hardfought battle, it looked as though DUCHESS would upset the recently crowned world champion. After Black’s 32nd move, the game reached the position shown in Fig. 14 and repeated in Fig. 22. Leo Stefurak (2102 USCF) of the Seattle Chess Club, while observing the game, felt that 33 B-B4 R x P 34 R X R B x R 35 B x P gives DUCHESS (White) a winning advantage. In reply to 33 . . . B x P, he felt that 34 B x P gives DUCHESS a larger advantage. The endgame went as follows: WHITE: DUCHESS 33BxP R x P 34RxR B x R
BLACK: CHESS 4.6 35 K-K3
K-BI
White has a small advantage at this point. His King is slightly stronger and Black’s Pawn on Q3 is weak. 36B-Q4 B x B + 38 K-Q5 K-Q2 3 7 K x B K-KI White still has a small advantage at this point but probably not enough to win. White might keep his chances alive with 39 P-B4, but instead he makes a move that forces a draw. 39 40 41 42 43
P-N6 P-B4 P-R3 P-R4 P-N4 44 P-R5 45PxP
P-N3 K-K2 P-B3 K-Q2 P-R3 P XP K-K2
46 47 48 49 50 51 52
K-Q4 K-B4 P-B5 K-N4 K-N5 K-R4 K-N4
K-K3 K-K2 K-QI K-Q2 K-KI K-K2 K-QI
FIG.22. Position after Black’s 32nd move in DUCHESS (White) vs. CHESS 4.6 (Black) at Seattle.
RECENT PROGRESS IN COMPUTER CHESS
105
FIG.23. Endgame position from Fine: Problem #70: White to move and win.
53 K-B4 54 K-Q4 55 K-Q5
K-Kl K-Q2 K-K2
56 K-Q4 K-K1 57 K-B4 K-B2 Adjudicated a draw
Both sides were carrying out exhaustive 14-18 plies searches when the game ended. Three years ago, this writer studied the performance of a simple King and Pawns versus King and Pawns endgame program (Newborn, 1977b) on a set of positions from Fine (1941). The program was designed to carry out an exhaustive search to some fixed depth. One of the problems, shown in Fig. 23, was far beyond its capabilities. It was estimated that the program would require 25,000 hr of IBM 370/168 time to search deeply enough to find (and understand) the correct solution. Several months ago Cahlander asked CHESS 4.6 to try the problem. CHESS 4.6 carried out an iteratively deepening search to a depth of 26 plies and found the correct principal continuation! At 25 plies the continuation was wrong. How was CHESS 4.6 able to search so deeply? The answer-transposition tables! In this position a deep search runs across a large number of transpositions. CHESS 4.6 searched to a depth of 26 plies in 632 sec and examined 3,294,754 terminal positions! A number of special endgame programs have been developed during the last ten years. Barbara Huberman, a student of John McCarthy, considered KRK,' KBBK, and KBNK endgames (Huberman, 1968). For several types of endgames and based on certain features on the board, positions were divided up into several categories. The categories were then ordered from best to worst (for the winning side). Her strategy involved a breadth first search during which the program tried to find a move that lead to a better position. As soon as one was found, the program was satisfied and search terminated. This resulted in winning, but not necessarily optimal KRK = King and Rook versus King, KBBK
=
King and two Bishops vs. King.
106
MONROE M. NEWBORN
play. Endgames with only Pawns on the board were studied by Tan (19721, Piasetski (1976), Michalski and Negri (1976), Clark (1977), and even Bellman (1965). The most exciting work on special endgames, however, is the (unpublished) work of Ken Thompson. He has developed a large database for the KQKR endgame that plays best chess-wins as quickly and loses as slowly as possible. With each of about 5,000,000 positions, a number n is stored where n is the number of moves for White to move and win. When playing in a game, the program carries out either a one-ply or a two-ply search, depending upon which color it is playing (White must be on the move in terminal positions), and searches the database to assign scores to the terminal positions. The database was created by having the program first generate the set of positions with mate on the board. Then, working backward, the computer figured out the number of moves n necessary to reach the nearest of these positions. The program gave a simultaneous exhibition against six very strong players at the 1977 United States Open Chess Championship in Columbus, Ohio. Playing with the Rook, the program managed to draw four games against, among others, Eugene Meyer (USCF 2350) and John Peters (USCF 2400), while losing to John Meyer, a Life Master in the USCF and to a small horde of players who, having been defeated earlier, banded together on the sixth board. The humans were given about two minutes to make each move. Thompson has also built large databases for KPKP, KPPK, KRKN, and KRKB. His attempt to use the same approach on KRPKR came to a temporary halt a while ago when he realized that for perfect play, some positions required “Knighting” the Pawn when it reached the eight rank. Levy ( 1976) reports that a program developed by Arlazarov and Futer handles such endgames perfectly. Their work will appear in the next volume of Machine Intelligence (Arlazarov and Futer, 1979). Michie has been interested in endgame programs that ‘select moves by using advice stored in an advice table (Michie, 1976). By using 12 pieces of advice and a four-ply exhaustive search, Michie and Bratko (1978) designed a program to play KRKN endgames that allowed “drawn positions to be indefinitely defended (until terminated by the 50-move rule).” Michie’s technique is an excellent one for endgame play but may be very difficult to extend to middle game play. 6. Speed Chess
A pleasant surprise in the last two years has been the strong performances given by the better programs in “speed chess,” or “blitz chess”
RECENT PROGRESS IN COMPUTER CHESS
107
as it is sometimes called. A speed chess game is one in which each side normally is given a total of 5 min in which to make all his moves. Most chess games last between 40 and 60 moves and thus it is necessary to set a pace somewhere around 5 sec per move in order not to lose on time. This is not enough time to play one’s best but even at 5 sec per move, Masters and Grandmasters still play pretty good chess. Masters can compete on equal footing with class A and class B players in speed chess when given only 1 or 2 min versus 5 , 6 , or even 7 min for their opponents. Results from simultaneous exhibitions, where a Master plays 50 A, B, C, and D class players, along with a few beginners, indicate that even a 40-1 time differential still allows the Master to play better. When Slate and Atkin eliminated forward pruning and incorporated iterative deepening and transition tables, and when they obtained the services of the CDC CYBER 170 series computers, they found somewhat to their amazement that their program was able to compete in speed chess against Masters. Levy estimated in 1977 that CHESS 4.6 was playing speed chess at around the 2300 USCF level (Levy, 1977a). When two humans play speed chess, they each waste a few seconds in “overhead” during the course of a game. This is the time required to physically make moves on the board and stop the clock once a move has been decided upon. When a computer plays, it is necessary for an operator to provide assistance. He must observe the opponent’s moves, type them into the computer, and make the computer’s replies on the board. This “overhead” can amount to 2-4 sec per move. So the rules when playing Slate and Atkin’s program in speed chess are as follows. The human is given 5 min to make all his moves as usual. The computer has an internal clock that paces it to play at a rate of 5 sec per move. If the game lasts 60 moves, CHESS 4.5 loses unless on move 60 it announces that it sees mate in the next few moves. When playing speed chess, Slate and Atkin’s program searches trees having about 10,000-20,000 nodes on the CYBER 176. It carries out exhaustive searches to a depth of 4 or 5 plies in most middle game positions and deeper in the endgame. After making a move, CHESS 4.5 begins calculation of its next move assuming the opponent will respond with the move on the principal continuation. Thus the longer the opponent thinks, the longer the computer thinks in turn! This effect is quite dramatic in speed chess where the human works very hard to find a move and quite often the computer replies instantly. CHESS 4.5 and later versions of the program have established an impressive list of speed chess victories-against GM Michael Stean and David Levy of England, Hans Berliner, Lawrence Day, and Zvonko Vranesic of Canada, and GM Robert Hubner of West Germany. A game played against Berliner at Carnegie Mellon in March of 1977 follows. It
MONROE M. NEWBORN
108
should be pointed out that Berliner evened his speed chess score against the program by defeating it in a blitz game several months later at the Second World Computer Chess Championship. Berliner has considerable experience playing computers and one wonders to what extent this improves his chances. The game was a highly tactical battle and, although ahead a Bishop to a Pawn by the 16th move, Berliner finds himself outmaneuvered and lost by move 20. WHITE: CHESS 4.6
I P-KB4 2N-QB3 3 P-K4 4N-B3
P-KN3 B-N2 P-QB4 N-QB3
BLACK: Berliner 5 6 7 8
B-B4 0-0 P-KS N-K4
P-K3 N/l-K2 P-Q3
...
CHESS 4.6 does not see that this gives up a Pawn. It may have seen 8 N-K4 P x P 9 P x P N x P 10 B-N5+ N/K2-B3 I I N x P but not the continuation I I . . . Q-N3 12 B x N + N x B 13 P-Q4 B x P+ 14NxBQxN. 8 9 10 11
. . . P x P B-N5+ P-Q3
PXP N x P N/2-B3 0-0
12NxN N x N 13 B-K3 P-QR3 14 B-R4 P-QN4 15 B x P/N? . . .
White unnecessarily gives up a Bishop for two Pawns and an isolated Black Pawn. This evidently satisfiec CHESS 4.6. The Bishop can be saved by 15 B-N3 P-B5 16 P x P P x P 17 B-R4. P x B 15 . . . 1 6 B x P R-Kl
17 P-Q4 18 P-B3
N-B3 P-K4
Berliner wanted to gain space with 18 . . . P-K4 and evidently underestimated the strength of CHESS 4.6's next move. 19 N-Q6
...
(See Fig. 24.) From here on Berliner is in trouble; he resigns on move 28. 19
...
PxP
20 Q-B3!
N-K4
Berliner didn't see that this move hangs the Rook. The correct move was simply 2 0 . . . B-Q2. 21 Q X R 22NxR 23 P-QR4 24PxP
P-Q6 Q x N P-R4 N-N5
25 R-R7 26QxB+ 27 Q-B7 28 Q x P
Q-K4 K-R2 Q-K7 Resigns
RECENT PROGRESS IN COMPUTER CHESS
FIG.24.
109
Position after 19 N-Q6.
After a few games, strong players seem to pick up some of the program’s weaknesses and adapt to its style, generally doing much better than at first. But I doubt very much whether a player rated below 2000 USCF can adapt sufficiently to the style of CHESS 4.6 to become the better player. Tom Truscott has indicated that strong chess players adapt to DUCHESS after a number of games. DUCHESS seems to be playing speed chess at around the 1950 USCF level and has defeated several Masters. It is paradoxical that computers are playing relatively better speed chess than “slow” chess. Church and Church (1977) argue that in speed chess the level of play “deteriorates considerably.” Evidently, the play by Masters deteriorates perhaps more than we imagined, the only possible way to explain the rating differential between speed chess and slow chess. In a sense this seems to add support to the work of Simon and Chase (1973). They contend that a Master has stored in his memory some 50,000 chess patterns and has the ability to recognize any of these patterns or combinations of them and make the appropriate response quite quickly. In Newborn (1978), this writer argues that therefore, “just as two writers have much the same vocabulary, especially if the two are in close communication, two chess players store a large number of patterns with which they are both familiar. They often anticipate each other’s ideas. How many times has a chess player made a bad move, leaving a piece en prbe for example, only to have his opponent not notice the error. Computers are not programmed to think in terms of patterns on the board in the way people are. Thus when playing computers at speed chess, humans are often thrown off stride by the unusual patterns that arise on the board. The human is forced to revise his thinking between successive moves more frequently than when playing another human. This factor, I believe, accounts for the somewhat weaker play of humans vis-a-vis computers in speed chess.”
110
MONROE M. NEWBORN
Levy (1977a) argues the difference is because computers play relatively better tactical chess than positional chess, that they do not leave pieces en prise and they never miss one-move combinations, etc., and that in speed chess, tactics play a more important role than in slow chess. 7. The Microcomputer Revolution
For well under $10o0, it is now possible to have a computer in your own home, a computer as powerful in many respects as any in existence around I%O. It can be plugged in the wall, it makes no noise, and it weighs about 30 kg. Microcomputers, as they are called, are not as fast or as able to store as many numbers or as easy to work with as are most current minicomputers or full size computers, but the situation is changing fast. Even with their present capabilities, microcomputers can be programmed to play chess and play respectably indeed. There is a growing number of enthusiasts and programmers busy in their homes, in basements and studies, writing their own programs. Others, who have bought small computers and who are not interested in programming, have obtained one of several commercially available chess programs. Still others have purchased one of the special chess playing machines that are now available for under $500 and which are built around various microprocessors. The more notable of these are CHESS CHALLENGER, BORIS, and COMPU-CHESS. Several things distinguish microcomputers from larger computers when it comes to their ability to play chess (or to carry out any large numerical computation): (1) Most microcomputers use an 8-bit word size thus frequently making addressing problems more dficult and making handling large numbers (greater than 22’ = 2 128) more cumbersome. A gain of perhaps 50% in speed may be achieved by using a 16-bit word size instead of an 8-bit word size as is done by most minicomputers. Still further speedup results by using a 32-bit or larger word size as is the case for IBM 370 series computers and CDC CYBER 170 series computers, respectively. (2) Memory size is small, typically 4K-8K &bit words, as opposed to anywhere from a minimum of 128K 32-bit or 64-bit words in larger computers. Programs running on large computers such as CHESS 4.7, DUCHESS, and CHAOS store several hundred thousand bytes of information as the tree search progresses and this cannot be done using a small memory. Transposition tables are definitely out until bigger word sizes and bigger memory sizes are available.
RECENT PROGRESS IN COMPUTER CHESS
111
(3) Languages and compilers are not as sophisticated as they are on larger systems. Most chess programs for microcomputers are written in either assembly language or BASIC. Compilers for BASIC often yield code that executes very inefficiently. FORTRAN, the most popular high level language for chess programs, is becoming available and should improve the situation. (4) Operating systems are also not in the same league with those available on larger systems. They are slower and much less flexible. Instead of using high-speed disks for program development, small computers use diskettes or inexpensive magnetic tape recording equipment. High-speed line printers are only available for large systems. When developing a chess program, program listings and printouts of parts of the tree are often required. These tasks take considerable time on small systems.
Their major asset is that: ( 5 ) They are cheap to use, essentially free once they have been obtained. Developing chess programs takes vast amounts of computing time and in the past this has slowed the progress of many good programs. The development of CHAOS, for example, has been handicapped because of this.
1978 marked the first year that a microcomputer participated in an ACM tournament. The program, written by Arnold Epstein and called 8080 CHESS, ran on a Processor Technology system. This system uses an Intel 8080 as its central processing unit. The program was in over its head, outclassed by programs on larger computers. In a few years, however, the situation will be very different. With falling memory prices and faster hardware, there will be many good programs running on microprocessors and on all kinds of special purpose contraptions as well, and they will play very strong chess. About six months after Seattle, on March 3-5, 1978, San Jose became the site of the first microcomputer tournament. Eleven microcomputers battled for five rounds during the West Coast Computer Fair. Sargon, the program of Dan and Kathy Spracklen, won with a 5-0 record. Their program ran on a Jupiter I1 Wave Mate, a 2 MHz 2-80 based system with 8K of memory. Of the special purpose systems, CHESS CHALLENGER and BORIS each finished with a 3-2 record. Unlike the ACM tournaments, all 11 computers appeared in person for the three-day performance (First Microcomputer Chess Tournament, 1978). CHESS CHALLENGER was improved in the last few months. Its most current version played OSTRICH at McGill University on July 30, 1978 and an interesting game ensued. OSTRICH'S positional play was the
112
MONROE M. NEWBORN
major difference; neither side made a serious tactical error. OSTRICH has a provisional Quebec Chess Federation rating of just over 1500 (QCF ratings are slightly less than USCF ratings) and based on its game with CHESS CHALLENGER, the latter deserves a rating around the 1200 level. This is consistent with CHESS CHALLENGER’S performance in the C category of the 1978 Quebec Open where it won 7 of 9 points. Its game with OSTRICH went as follows:
WHITE: OSTRICH P-K4(B) N-KB3(B) B-NS+(B) N-B3(53) 5 Q-K2(60) 6 0-O(110) 7 P-Q3(140) 8 B x B+(79) 9 Q-Ql(141) 10 B-N5(106) 11 P-KR3( 160) 12 P-R3(146) 13 B x N(121) 1 2 3 4
BLACK: CHESS CHALLENGER
P- QB4(32) P -Q3(30) B -Q2(77) N-KB3(181) N-B3(182) P-K4(657) N-Q5( 199) Q x B(75) B -K2( 153) P-QR4(640) R-KNl(547) P - R3( 176) P x B(152)
14 15 16 17 18 19 20 21 22 23 24 25 26
P-R5(275) K-Rl(148) B -Q 1(261) N-QS(223) N-N6( 167) P-B3(231) R-QBl(392) R-R2(125) N-Q2!( 180) N x N(174) R-N3( 174) Q x N(92) P/B x P(160) P-Q4(141) Q -N4(80) P x P(151) Q -B5(238) R-Ql(342) R/2-R1(321) R -B 3(333) Q x P(273) N-K3(373) Q x Q(233) P x Q(131) R x P(112) . . .
Up to this point, CHESS CHALLENGER has been holding ground. The position is shown in Fig. 25.
26.. . 27 Wl-Ql(179) 28 N-B5(277) 29 R/4-Q2(151)
P-N4( 123) B-K2(146) R-B7(190) R x R(90)
30 31 32 33
R X R(98) R-QS(72) R x P/5(100) R-N8+( 191)
F I G . 25. Position after 26 R x P.
P-R4(107) R-Nl(161) R-N3(126) B-Ql(63)
RECENT PROGRESS IN COMPUTER CHESS
34 35 36 37 38 39 40 41
N x P+(146) N-B5+( 178) R-N4(153) R x P(243) P- QN3(328) P-B4(134) P-N3(125) P-QN4(86)
K -K2(47) K -Q2(77) B-B2(92) R-N 1( 110) B -N3( 125) B -B2(2 1 5) R-Nl(83) R-Kl(84)
42 43 44 45 46 47 48 49
P-N5(63) R-Q4+(37) N-N7+(115) R-B4(431) N-B5+( 103) R-B6(66) P-QR4(54) P x P(42)
113
R-QN l(93) K - K3(47) K - K2(73) B-Q3(90) K-K3(246) R-Ql( 197) P-R5(54) Resigns
8. Final Observations and the Future
Chess programs have increased remarkably in strength and are presently playing at levels unimagined only a few years ago. In 1970, the best programs played 1400 level chess; by 1974, 1600 level; now in 1978, over 2000. The increased strength can be attributed to a maturing of techniques and faster computers. Before long, the weakest programs participating in the ACM tournaments will be playing at the Expert level. With advances in both hardware and software continuing at the same rates as they have during the last ten years, it is highly probable that programs will be playing Master level chess by 1984, Grandmaster level chess by 1988, and better than any human by 1992. (These are conservative estimates!) This speculation is based on the data shown in Fig. 26. Improvement will continue to depend on the process of modifying existing programs, observing how they play, and then making further modifications. Stronger and stronger human players will be required to be involved in this feedback loop as deficiencies become more subtle. Tree-searching techniques and data structures will continue to improve. Rather than giving computers large amounts of chess-specific information, programs will continue to discover this information by searching. Strong programs based on extensive move pruning are still a long way off. Throughout history, man has regarded himself as a special creature. Once he believed his planet and then his solar system were at the center of the universe, only to discover later that this is not the case. From earliest times, he saw himself as being in a different class than the other inhabitants of the earth. Then one hundred years ago, Darwin showed him that he has many relatives. And he always imagined that his intellectual powers are supreme, perhaps uniquely God-given. This belief is being challenged in the twentieth century. With computers competing on an equal footing against all but the best of human players, one more piece of evidence to the contrary has been found. Man has always been slow in accepting new perspectives of his place in
MONROE M. NEWBORN
114 L."d
,
01 PI."
lUSCF ralingl ,2800
Grandmas1.r
Msslcr
I
2200-. Expert 2000..
At
1800..
I
i
i
i
i
I
i
i
&
s
c
Three m mmule move x 104
FIG. 26. Relationship between playing strength of chess programs and the number of nodes scored per move.
the universe. Old theories have been defended in the light of overwhelming scientific evidence to the contrary. Some will now defend the supremacy of man's intellect by claiming that chess, once thought to be a legitimate test, is, in retrospect, merely a problem of calculation. They will argue that although computers play good chess, they will never write great poetry or compose outstanding music until they are programmed to reason as man does. In time, however, computers may accomplish these feats as well using techniques quite different than man's. What computer chess has taught us is that the effort required will be tremendous. ACKNOWLEDGMENTS This writer would like to extend a special thanks to Dave Cahlander for the information he provided about the activities of CHESS 4.7. Also to be thanked is Ilan Vardi, formerly a student at McGill and now a graduate student at MIT. Vardi, one of Montreal's top chess players, assisted with the analysis of several games. Finally, the author would like to thank Mrs. Maura Crilly and Mrs. Diane Chan for their gracious help in preparing the manuscript.
REFERENC ES Adelson-Velskiy, G. M.,Arlazarov. V. L., and Donskoy, M. V. (1975). Some methods of controlling the tree search in chess programs. Artif. h i e / / . 6, 361-371.
RECENT PROGRESS IN COMPUTER CHESS
115
Akl, S., and Newborn, M. M. (1977). The principal continuation and the killer heuristic. Proc. Annu. Conf. Assoc. Comput. Much.. pp. 466473. Arlazarov, V. L., and Futer, A. V. (1979). Computer analysis of a Rook end-game. 1n “Machine Intelligence 9”. (J. E. Hayes, D. Michie, and L. 1. Mikulich, eds.). Chichester: Ellis Horwood, and New York: John Wiley. Bailey, D. (1977). The Eighth North American Computer Chess Championship, October 15-17, in Seattle. Northwest Chess, 8-10. Bakker, 1. (1976). “Eurbpean Computer Chess Championship Booklet.” Tournament Committee, KNSB Office, Passeerdersgracht 32, Amsterdam-C, The Netherlands. Baudet, G. M. (1978). On the branching factor of the alpha-beta pruning algorithm. Artif. Intell. 9, 177-199. Bellman, R. (1%5). On the application of dynamic programming to the determination of optimal play in chess and checkers. Proc. Narl. Acad. Sci. 53. Benko, P. (1978). The “Amateur” World Champion: An interview with Max Euwe. Chess Life Rev. 33, 410-413. Berliner, H. (1974). Chess as problem solving: The development of a tactics analyzer. Ph.D. Dissertation, Computer Science Dep., Carnegie-Mellon University, Pittsburgh. Berliner, H. ( 1976). Outstanding performances by CHESS 4.5 against human competition. Sigarr Newslett. (60), 12-13. Berliner, H. (1977a). Two games from the Minnesota Open. Sigrrrr Newslett. (62), 9-10. Berliner, H. (1977b). CHESS 4.5 vs. Levy. Sigart Newslerr. (62), I I . Berliner, H. (1978). A chronology of computer chess and its literature. Arrif. Intell. 10, 201-2 14. Bernstein, A., and Roberts, M. De V. (1958). Computer vs. chess player. Sci. A m . 198, 96-105. Bernstein, A., Roberts, M. De V., Arbuckle, T., and Belsky, M. S. (1958). A chess playing program for the IBM 704. Proc. West. Joint Cornput. Conf. 13, 157-159. Botvinnik, M. M.(1970). “Computers, Chess, and Long Range Planning.” Springer-Verlag, Berlin and New York. Botvinnik, M. M. (1975). Will computers get self-respect? Sov. Sport June 15. Brudno, A. L. (l%3). Bounds and valuations for shortening the scanning of variations. Probl. Kihem. 10, 141-150. Byrne, R. (1978). Fischer vs. the computer. The New York Times July 30, p. 30. Cahlander D. (l977a). The Computer is a fish, or is it? Sigarr Newslett. (62). 8-9. Cahlander, D. (1977b). Simultaneous play of CHESS 4.5 (unpublished). Church, R. M., and Church, K. W. (1977). Plans, goals and search strategies for the selection of moves in chess. In “Chess Skill in Man and Machine” (P. Frey, ed.), pp. 131-156. Springer-Verlag, New York. Clark, M. R. B. (1977). A quantitative study of King and Pawn against King. In “Advances in Computer Chess 1” (M. R. B. Clark, ed.), pp. 108-118. Univ. of Edinburgh Press, Edinburgh. de Groot, A. D. (1%3). “Thought and Choice in Chess.” Mouton, The Hague. Douglas, J. R. (1978). GM Walter Browne vs. CHESS 4.6. Chess Life Rev. 33, 363-364. Euwe, M. (1970). Computers and Chess. In “The Encyclopedia of Chess” (A. Sunnucks. ed.). St. Martins, New York. Fine, R. (1941). “Basic Chess Endings.” David McKay, New York. First Microcomputer Chess Tournament. (1978). Chess Life Rev. 33, 31 I . Fuller, S. H., Gaschnig, J. G., and Gillogly, J. J. (1973). An analysis of the alpha-beta pruning algorithm. Dept. of Computer Science Report, Carnegie-Mellon University, Pittsburgh, Pennsylvania.
116
MONROE M. NEWBORN
Gillogly, J. J. (1972). The technology chess program. Artif: Intell. 3, 145-164. Goldwater, W. (1977). My game and animadversions. Chess Life Rev. 32, 313-314. Good, 1. J . (1968). A five year plan for automatic chess. I n "Machine Intelligence, 2" (E. Dale and D. Michie. eds.), pp. 89-1 18. Univ. of Edinburgh Press, Edinburgh. Greenblatt, R. D., Eastlake, D. E., and Crocker, S. D. (1%7). "The Greenblatt chess program." (Proc. Fall Joint Computer Conf.)), pp. 801-810. AFlPS Press, Montvale, New Jersey. Griffith, A. K. (1976). Empirical exploration of the performance of the alpha-beta tree search heuristic. IEEE Trans. Comput. pp. 6-10. Hart, T. P., and Edwards, D. J . (1961). The tree prune (TP) algorithm. MIT Artificial Intelligence Project Memo No. 30, R.L.E. and Computation Center, Massachusetts Institute of Technology, Cambridge, Massachusetts. (Revised October 28, 1%8: Edwards, D. J . , and Hart, T. P. The a-p heuristic.) Hayes, J., and Levy, D., (1976). "The World Computer Chess Championship." Univ. of Edinburgh Press, Edinburgh. Hubermann, B. J. (I%@. A program to play chess end games. Technical Memo CS 106, Computer Science Dept., Stanford University, Stanford, California. Kaplan, J. (1977). Let's go, big beige machine! Sporrs Illus. Aug. 22, p. 42. Kister, J., Stein, P., Ulam, S., Walden, W., and Wells, M., (1957). Experiments in chess. J. Assoc. Compur. Mach. 4, 174-177. Kotok, A. (1%2). A chess playing program for the IBM 7090. B.Sc. Thesis. Artificial Intelligence Memo No 41, Massachusetts Institute of Technology, Cambridge, Massachusetts. Knuth, D. E. (1968). "Fundamental Algorithms: The Art of Computer Programming 1." Addison-Wesley , Reading, Massachusetts. Knuth, D. E. (1973). "Sorting and Searching: The Art of Computer Programming 3." Addison-Wesley, Reading, Massachusetts. Knuth, D. E., and Moore, R. N. (1975). An analysis of alpha-beta pruning. Arf$ Infell. 6 , 293-326. Lasker, Edward. (1977). But will it fly'? Chess Life Rev. 32, 314. Levy, D. (1976a). " 1975 U.S. Computer Chess Championship.'' Computer Science, Woodland Hills, California. Levy, D. (1976b). "Chess and Computers." Computer Science, Woodland Hills, California. Levy, D. (1977a). Invasion from cyberland. Chess Life Rev. 32, 312-313. Levy, D. (1977b). " 1976 U.S. Computer Chess Championship.'' Computer Science, Woodland Hills, California. Marsland, T. A. (1976). 1976 Canadian computer-chess workshop. Sigart Newsletr. (60), 22. Marsland, T. A. (1977). A comprehensive list of computer chess literature. Tech. Report TR77-4. Dept. of Computer Science, University of Alberta, Canada. Michalski, R., and Negri. P. (1975). "An experiment in inductive learning in chess endgames: the King-Pawn case." Machine Intelligence 8 (eds. G . W. Elcock and D. Michie). Chichester: Ellis Honvood, and New York: John Wiley. Michie, D. (1976). An advice-taking system for computer chess. Comput. Bull. December, 12-14. Michie, D. (1977). David Levy challenge game, 1 April 1977. Sigarr Newslefr. (62). 10-11. Michie, D., and Bratko, I., (1978). Advice tables representations of chess end-game knowledge. Proc. AISB Summer Conf. (D. Sleeman, ed.). Homburg. Mittman, B. (1974). First world computer chess championship at IFIP Congress 74 Stockholm, August 5-8. Commun. Assoc. Comput. Much. 17,604-605. Momson, M. E. (1976). 4th Annual Paul Masson American Class Championship, Chess Life Rev. 31, 553.
RECENT PROGRESS IN COMPUTER CHESS
117
Momson, M. E. (1977). “Official Rules of Chess” (M. E . Momson, ed.). David McKay, New York. Newborn, M. M. (1975). “Computer Chess.” Academic Press, New York. Newborn, M. M. (1977a). The efficiency of the alpha-beta search on trees with branchdependent terminal node scores. Art$. Infell. 8, 137-153. Newborn, M. M. (1977b). PEASANT An endgame program for Kings and Pawns. I n “Chess Skill in Man and Machine” (P. Frey, ed.), pp. 119-130. Springer-Verlag, New York. Newborn, M. M. (1978). Computer Chess: Recent progress and future expectations. Proc. Jerusalem Conf. lnf. Techno/. North-Holland, pp. 189-192. Newell, A., and Simon, H. (1972). “Human Problem Solving.” Prentice-Hall, New York. Newell, A., Shaw, J., and Simon, H. (1%3). Chess playing programs and the problem of complexity. IBM J . Res. Dev. 2, 320-335. Nilsson, N. (1971). “Problem Solving Methods in Artificial Intelligence.” McGraw-Hill, New York. Penrod, D. (1977a). Comput. Chess Newsleft. ( I ) , Santa Barbara, California. Penrod, D. (1977b). Comput. Chess Newsleft. (2). Santa Barbara, California. Piasetski, L. (1976). An evaluation function for simple King and Pawn endings. M.Sc. Thesis, McGill University, Montreal. Richter, H. (1976). The first German computer chess championship at Dortmund. Sigart Newsleft. (56), 2. Scott, J. J. (1%9). A chess playing program. I n “Machine Intelligence 4” (B. Meltzer and D. Michie, eds.), pp. 255-266. Univ. of Edinburgh Press, Edinburgh. Shannon, C. (1950a). Programming a computer for playing chess. Philos. Mag. 41,256-275. Shannon, C. (1950b). A chess playing machine. Sci. A m . 182, 48-51. Simon, H. A., and Chase, W. G. (1973). Skill in Chess. Am. Sci. 61, 394-403. Slagle, J. R.,and Dixon, J. K. (1969). Experiments with some programs that search trees. J . Assoc. Comput. Mach. IS, 85-99. Slate, D. J., and Atkin, L. R. (1977). CHESS 4.5-the Northwestern University chess program. I n “Chess Skill in Man and Machine” (P. Frey, ed.), pp. 92-118. SpringerVerlag, New York. Slate, D., and Mittman, B. (1978). CHESS 4.6-Where do we go from here? Proc. Jerusalem Conf. lnf. Techno/. North-Holland, pp. 184-188. Soule, S., and Marsland, T. A. (1975). Canadian computer chess tournament. Sigarf Newsleu. (54). 12-13. Tan, S. T. (1972). Representation of knowledge for very simple pawn endings. Technical Memorandum MIP-R-98, Edinburgh: Department of Machine Intelligence. Turing, A. M. (1953). Digital computers applied to games. I n “Faster than Thought” (B. V. Bowden, ed.), pp. 286-295. Pitman, London. Wiener, N. (1948). “Cybernetics.” Wiley, New York.
Advances in Software Science

M. H. HALSTEAD¹
Department of Computer Sciences
Purdue University
West Lafayette, Indiana
1. Introduction
2. Basic Metrics
3. Volume
4. Potential Volume
5. Implementation Level
6. Language Level
7. The Vocabulary-Length Equation
8. The Mental Effort Hypothesis
9. Extension to "Lines of Code"
10. Programming Rates versus Project Size
11. Clarity
12. Error Rates
13. Measurement Techniques
14. The Rank-Ordered Frequency of Operators
15. The Relation between η₁ and η₂
16. The Use of η₂* in Prediction
17. Grading Student Programs
18. Semantic Partitioning
19. Technical English
20. Learning and Mastery
21. Text File Compression
22. Top-Down Design in Prose
23. Conclusions
References
1. Introduction
Software Science is an intellectually exciting discipline currently undergoing rapid development. In this article we shall attempt to cover its present state, both with respect to its accomplishments and to those areas in which unanswered questions remain. For this purpose we shall define "software" not just as the collection of
¹ Deceased; the contribution was further edited by his colleague, S. Zweben, Department of Computer and Information Science, Ohio State University, Columbus, Ohio.
computer programs capable of producing a desired result when serving as a set of instructions in a machine, but much more generally as any communication that appears in symbolic form in conformance with the grammatical rules of any language. Consequently, in the following treatment the term software is intended to apply to anything from a bit pattern in machine language to this article itself (provided that in the latter case the editor succeeds in making it conform to English grammar). Similarly, we shall accept as a definition of science any discipline that contains the following five components:

(1) Sound metrics, which can be quantitatively applied in the measurement of any item in its domain.
(2) Reproducible experiments, which can be and are repeated with comparable results at independent laboratories by different scientists.
(3) Derivable relationships among the various properties of the items in its domain.
(4) Ability to explain observed phenomena in terms of more basic properties.
(5) Ability to predict the result of an experiment before it has been performed, rather than merely to explain it after it has been completed.
Thus defined, software science can be treated as a proper basis or foundation for the field of Software Engineering, but not as synonymous with it. This is not unlike other branches, in which the engineering usually preceded and indeed stimulated the development of the underlying science. It is interesting to note, however, that it was only after the development of thermodynamics for power engineering, electrodynamics for electrical engineering, or statics, dynamics, and strength-of-materials for mechanical engineering that those branches could be considered quasicomplete, highly competent, and dependable engineering disciplines. Such a goal for software engineering clearly motivates much of the work in software science.

2. Basic Metrics
Any attempt to find a universal set of metrics that could be applied to any computer program, be it the bit pattern in a computer, or the expression of an algorithm in FORTRAN, might at first glance appear destined to be unfruitful, if not merely difficult. But without universal, measurable parameters, we would be in the position of trying to develop the science of thermodynamics before the advent of a temperature scale. Actually, however, this problem is rather simple, once we recognize the fact that any
software must consist of an ordered string of operators and operands. When a program is translated from one language to another, as from FORTRAN to machine language for example, the actual operators and operands may indeed change, but both versions must still consist of combinations of operators and operands. No other category of entities need be present. This is perhaps easiest to see with respect to the actual machine language of a computer, in which each instruction contains an operation code in one segment and an operand or the address of an operand in the remaining bits of the word. Nothing else is required, and the two categories are mutually exclusive, provided no metalinguistic usage occurs. The same dichotomy exists in higher level languages, and we can define an operand as a variable or a constant. An operator is therefore an entity that can alter either the value of an operand or the order in which it is altered. From these simple definitions, it has been possible to obtain quantitative measures for many useful properties of programs or prose, such as program length, volume and level, potential volume, language level, clarity, implementation time, and error rates. Thus far it has not been found either necessary or desirable to differentiate among operators according to their power or function, nor to treat separately the variables or constants among operands. This would not be the case if software science were concerned with program execution time, but since it is not, all operators contribute equally. Since a program consists of an ordered string of operators and operands, and of nothing else, it can be characterized by four basic measures. These are the total number of occurrences of operators, called N₁, and the total number of occurrences of operands, denoted by N₂, as well as the number of different or unique operators, called η₁, and the count of unique operands in the program, denoted by η₂. It follows that the length N of any program is merely the sum of N₁ and N₂. A minor advantage of this definition of program length is its independence of language and of the number of characters in identifiers in any particular version. Similarly, the vocabulary of a program (or more precisely the count of items in the vocabulary) is simply the sum of η₁ and η₂ and is represented by η. Here again, the measure does not depend upon the language, but only upon the particular program being expressed in that language. In other words, the availability of a particular operator does not produce a contribution to η unless that operator is actually used in the program under analysis. While at first glance it might appear that the language itself sets an upper limit to the number of unique operators in a program, this is not actually the case. Provided a language permits the definition of new functions, procedures, or subroutines, then there is no upper limit. In fact, because a transfer of control to a specified point is a unique operator, this facility alone guarantees that η₁ is not limited by the language.
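For concreteness, the four basic counts can be obtained mechanically once a program has been reduced to a stream of tokens, each classified as an operator or an operand. The following minimal Python sketch assumes that this (language-dependent) tokenization has already been done; the sample token stream for the statement A = B + B * C is purely hypothetical.

```python
from collections import Counter

def basic_metrics(tokens):
    """tokens: iterable of (kind, symbol) pairs, where kind is 'operator' or 'operand'."""
    ops = Counter(sym for kind, sym in tokens if kind == "operator")
    opnds = Counter(sym for kind, sym in tokens if kind == "operand")
    N1, N2 = sum(ops.values()), sum(opnds.values())   # total occurrences
    eta1, eta2 = len(ops), len(opnds)                  # unique counts
    return {"N1": N1, "N2": N2, "eta1": eta1, "eta2": eta2,
            "N": N1 + N2, "eta": eta1 + eta2}

# A hypothetical token stream for the statement  A = B + B * C
tokens = [("operand", "A"), ("operator", "="), ("operand", "B"),
          ("operator", "+"), ("operand", "B"), ("operator", "*"),
          ("operand", "C")]
print(basic_metrics(tokens))   # N1=3, N2=4, eta1=3, eta2=3, N=7, eta=6
```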
3. Volume
The quantitative concept of the volume of a computer program can be arrived at in either of two equivalent ways, once its length and vocabulary have been defined. The simplest approach is to define the volume of a given implementation as the fewest number of binary digits or bits with which it could be represented. This approach has the essential advantage that it is independent of the number of letters or characters in an identifier. Because the number of different entities in any program is given by its vocabulary, it follows that the number of bits required to provide a unique pointer or designator for each entity must be given by log₂ η. Using this number of bits to specify each item in the string of length N gives the volume

    V = N log₂ η                                    (1)

or, in terms of the four basic metrics,

    V = (N₁ + N₂) log₂(η₁ + η₂)                     (2)

The reason for calling this property the volume, rather than the area, of a program will be discussed later, after the relation between N and η has been explored.
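A minimal sketch of Eq. (1), with purely illustrative numbers:

```python
import math

def volume(N, eta):
    """Program volume in bits, Eq. (1): V = N * log2(eta)."""
    return N * math.log2(eta)

# For a hypothetical program with length N = 7 and vocabulary eta = 6:
print(round(volume(7, 6), 2))   # about 18.09 bits
```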
4. Potential Volume
While the volume is defined as the minimum number of bits that would be required to represent a given implementation, it is apparent that if the same program were translated into some other language, the resulting version would have a different volume. For example, if the translation was to a more powerful language, then the resulting version would have a smaller volume. If it was translated to a less powerful, or lower-level, language, then its volume would increase. Consequently, the volume of a given version of a given algorithm must inversely reflect the level at which it is written, and the ratio of the volumes of two versions of the same algorithm must give the inverse of the ratio of the levels at which they have been implemented. Now if an algorithm could be translated into ever more powerful languages, its volume would continue to decrease. But, barring extrasensory perception, this decrease must reach a limit for any particular algorithm. To express that algorithm, even in the most powerful language conceivable, would still require operators and operands, hence its volume would be finite.
The concept of the most powerful conceivable, or potential, language is quite simple. It merely implies that the language already contains any function, procedure, or subroutine that might be required, ready to be invoked. Of course such a potential language could never exist, but the idea of one is most useful. Expressed in this language, any algorithm would require exactly two operators: the name of the required procedure, and a grouping operator. The number of operands required would depend upon the algorithm itself, and would equal the number of conceptually unique input and output operands that would require specification. Notationally, all properties expressed in terms of a potential language have asterisks. Using the previous definition of volume, the potential volume is given by

    V* = N* log₂ η*                                 (3)

but because no repetition would be required in a potential language, it must follow that N* = η*, hence

    V* = η* log₂ η*                                 (4)

Expanding to operators and operands gives

    V* = (η₁* + η₂*) log₂(η₁* + η₂*)                (5)

and since, as noted earlier, η₁* = 2, the potential volume is simply

    V* = (2 + η₂*) log₂(2 + η₂*)                    (6)
Consequently, the potential volume of a program is a single-valued function of the count of conceptually unique input and output operands that would be required to call the subroutine or invoke the procedure if the program itself had been written as a subroutine or procedure. It follows that the potential volume, unlike the actual volume, must be completely independent of the language or languages in which the algorithm is, or could be, expressed. Further, it is the minimum possible volume associated with any given algorithm, and in that sense represents an absolute value against which others can be compared.

5. Implementation Level

As mentioned earlier, if a given program is rewritten in a less powerful language, its volume can be expected to increase. Even if it were rewritten in the same language, its volume might still change. For example, a well-written program in some higher level language could be reexpressed more verbosely, and still achieve the same result when executed. It would still
convey the same amount of intelligence, even though it had been implemented at a lower level. This illustrates the reason why the "information content" or entropy of the well-known mathematical information theory of Claude Shannon (1935) is inadequate for the quantitative study of programs. It is strictly comparable to the implementation volume V, hence it must change for equivalent versions of the same message, depending upon the power or level of the language in which it is expressed. The potential volume, on the other hand, does not have this disadvantage. This in turn suggests that by the simple definition of implementation level L as

    L = V*/V                                        (7)

the product of volume times level will be completely language independent, because then

    V* = LV                                         (8)

While the program level is always defined by Eq. (7), it is possible to derive a close approximation to it without reference to the potential volume by considering only the components of the basic parameters, length, and vocabulary. Calling this approximation L̂ to differentiate it from the actual level, one can proceed in the following way. First, note that with respect to unique operators, the larger the number of them employed, the lower the level of implementation. Since the minimum number is η₁* = 2, we should expect that L will vary with the ratio

    L ~ η₁*/η₁

With respect to operands, it seems clear that every repetition of an operand name must be accompanied by a diminution of the level, so that

    L ~ η₂/N₂

Because only operators and operands occur in a program, other terms should not be required. The simplest combination of these two terms that will meet the condition that L = 1 for a potential language is their product, where the constant of proportionality is one. This gives the relation

    L̂ = (η₁*/η₁)(η₂/N₂)                             (9)

or

    L̂ = 2η₂/(η₁N₂)                                  (10)

Equation (10) has been found experimentally to give a sufficiently close approximation to the defined value of L that it is frequently used interchangeably.
In fact, before the concept of potential volume was developed, the discovery that the product LV was constant under translation was based on L̂, leaving L to be defined later. Actually, the form of Eq. (10) provides additional insight not obvious in Eq. (7). For example, since every transfer of control to a unique location contributes to η₁, it is clear that the use of GO-TOs in a program reduces the level, since η₁ appears in the denominator. On the other hand, if the need for a particular transfer could only be avoided by setting a flag (an additional operand) and testing its value at many points, this also would lower the level by reducing the ratio η₂/N₂. Consequently, this suggests that while GO-TOs are best avoided in general, this is not invariably true.
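The defining relation of Eq. (7) and the estimator of Eq. (10) can both be written down directly. The sketch below assumes η₁* = 2, as in the text; the sample counts are hypothetical.

```python
import math

def level(v_star, v):
    """Implementation level, Eq. (7): L = V*/V."""
    return v_star / v

def level_hat(eta1, eta2, N2):
    """Estimated level, Eq. (10): L-hat = 2*eta2 / (eta1*N2), taking eta1* = 2."""
    return (2.0 * eta2) / (eta1 * N2)

# Hypothetical counts for a small routine:
eta1, eta2, N1, N2 = 10, 8, 30, 25
V = (N1 + N2) * math.log2(eta1 + eta2)      # Eq. (2)
L_hat = level_hat(eta1, eta2, N2)
print(round(L_hat, 3))                       # 0.064
print(round(L_hat * V, 1))                   # an estimate of V* via Eq. (8)
```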
6. Language Level
When the same program is rewritten in different languages, the product LV remains constant. But when different programs are written in only one language, that product must increase as the potential volumes increase. Furthermore, it has been found experimentally that as the potential volume is increased, the implementation level decreases proportionately. Consequently, for programs written in the same language the product of implementation level times potential volume tends to remain nearly constant over a wide range of program sizes. This finding permits the definition of a language level λ as

    λ = LV*                                         (11)

which by invoking Eq. (7) can also be expressed as

    λ = (V*)²/V                                     (12)

For the various languages tested, the mean value of language level behaves as intuition would suggest, increasing from assembly language through FORTRAN, ALGOL 58, and PL/I to technical English prose and on to English outlines. And as intuition suggests, the variance about the mean is not only large, but increases as the mean increases, suggesting that the higher the level of the language employed, the more ways there are to use it. An interesting point that will be useful later is that even though the values of the mean language level increase properly from assembly language at 0.88 to PL/I at 1.53, general purpose procedure-oriented computer languages appear to have values fairly near one. Even English, at λ = 2.16, is not greatly above unity.
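Since λ is defined entirely in terms of V* and V, Eq. (12) translates directly into code; the numbers below are illustrative only and are not drawn from the measurements cited above.

```python
def language_level(v_star, v):
    """Language level, Eq. (12): lambda = (V*)**2 / V."""
    return v_star ** 2 / v

# Hypothetical example: a program with potential volume 20 bits written in 300 bits
print(round(language_level(20.0, 300.0), 2))   # 1.33, near the procedure-oriented range
```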
7. The Vocabulary-Length Equation
The first unexpected, and at that time counterintuitive, finding in the field now called software science was that the lengths of published programs are determined almost exclusively by the size of their vocabulary components η₁ and η₂. The relationship, which was observed before it was predicted, can now be obtained in the following way. First, note that because all of the items in the program's vocabulary must occur in the program, the length cannot be less than the vocabulary. The upper limit would at first appear to be unbounded, but if we merely postulate that no identical substrings of length greater than η appear, then there is also an upper bound. Since to replace a common subexpression in a program would require the addition of a new operand, hence an increase in η, this condition appears reasonable. The resulting upper bound is not too reasonable, because the number of different ways in which η things can be selected from a group of η things is η^η. This upper limit can be reduced somewhat by noting that operators and operands tend to alternate in computer programs, giving

    η ≤ N ≤ η₁^η₁ · η₂^η₂                           (13)

From the upper limit in Eq. (13), we can obtain the observed relation in either of two ways. First, we can note that unlike a program, which must be organized, the upper limit is completely disorganized. Simply assuming that organization will reduce the upper limit to its logarithm then yields the observed relation. Alternatively, we may note that in addition to the ordered set of N items in the program we seek, the upper limit must also contain all possible subsets of that ordered set. The family of all possible subsets of a set of N elements is called its power set, and itself has 2^N elements. Consequently we have

    2^N̂ = η₁^η₁ · η₂^η₂

or

    N̂ = log₂(η₁^η₁ · η₂^η₂)

which is equivalent to

    N̂ = log₂ η₁^η₁ + log₂ η₂^η₂

which is usually written as

    N̂ = η₁ log₂ η₁ + η₂ log₂ η₂                     (14)

where the caret is placed over N to indicate that it is a calculated, rather than an observed, value.
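Equation (14) is easily evaluated; a minimal sketch with hypothetical vocabulary counts:

```python
import math

def length_hat(eta1, eta2):
    """Calculated length, Eq. (14): N-hat = eta1*log2(eta1) + eta2*log2(eta2)."""
    return eta1 * math.log2(eta1) + eta2 * math.log2(eta2)

# Hypothetical vocabulary of a small program:
print(round(length_hat(12, 15), 1))   # about 101.6, to be compared with the observed N
```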
[Figure 1. Data for replotting the vocabulary-length equation: observed log₂ N versus calculated log₂ N̂ for samples of 14 ALGOL programs (Halstead, 1972a), 429 FORTRAN programs (Bulut et al., 1974b), 15 FORTRAN programs (Bohrer, 1975), 50 ALGOL programs (Magidin and Viso, 1976), 121 PL/I programs (Elshoff, 1976), 17 mixed-language programs (Laemmel and Shooman, 1977), and 12 English passages and translations (Halstead, 1972b).]
Before considering the accuracy, the implications, or the uses of Eq. (14), a frequently useful approximation to it should be given. Whenever η₁ = η₂, the relation reduces exactly to

    N̂ = η log₂(η/2)                                 (15)

Even for large values of η₂, the error must always be smaller than 1/log₂ η, hence Eq. (15) can serve as a good first approximation to Eq. (14). Returning to the vocabulary-length equation itself, it has been tested by a number of investigators at several institutions over a wide range of program sizes. Figure 1 shows the general agreement observed. But even though correlation coefficients between N and N̂ of over 0.98 have been reported for large groups of commercially implemented programs (Elshoff, 1978a), it is clear that equivalent programs could have been written such that the relation would not be true. Consequently, the fact that any relation between vocabulary and length can be observed suggests that software (including technical English, for the relation has been found there as well) must be generated in a much more constrained way than had previously been suspected. This in turn suggested two avenues for further investigation. First, a study or classification of ways in which programs could be altered so that they remained equivalent but did not conform to the vocabulary-length relation. Six such ways have been identified and called Impurity Classes. They include (1) complementary operations, such as adding and immediately subtracting one operand from another, (2) ambiguous operands, the use of one operand name to serve different purposes, usually in different parts of a program, (3) synonymous operands, or the use of two different variables whose values are always the same, (4) common subexpressions, or the failure to assign a name to the result of a frequently used calculation, (5) unwarranted assignment, or the assignment of an operand name to the result of a calculation used only once, and (6) unfactored expression, or the failure to factor a factorable expression. It is interesting to note that the optimization pass of a good compiler is designed to remove just such impurities. It is perhaps more significant, however, to observe that programs written by beginning students usually contain many of them, while programs published in the computing literature rarely do. Furthermore, not only are impurities quite rare in published programs, but few of them are found in samples of commercial programs written by professional programmers for use in their own shop. From this observation, one might suspect that the inclusion of impurities in a program would somehow increase the work or mental effort required to implement it. While we shall see later that this is indeed the case, the important idea at this point is the possibility that the basic software metrics might be related to mental effort in a quantifiable way.
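As a hypothetical illustration of impurity class (4), not taken from the original text, consider two Python versions of the quadratic formula; the first recomputes a common subexpression, the second names it.

```python
import math

# Impure version: the discriminant b*b - 4*a*c is written out twice, inflating the
# operator and operand occurrence counts (N1, N2) without enlarging the vocabulary.
def roots_impure(a, b, c):
    r1 = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    r2 = (-b - math.sqrt(b * b - 4 * a * c)) / (2 * a)
    return r1, r2

# Purer version: naming the discriminant adds one unique operand (eta2 grows by one)
# but removes the repeated substring, in the spirit of the postulate behind Eq. (13).
def roots_pure(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)
```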
8. The Mental Effort Hypothesis
For a programmer fluent in a language and working from a problem statement he understands, the generation of a computer program must consist of the judicious selection of N symbols from a list of η symbols. If each selection requires a search of the list, then the most efficient process generally available is a binary search. On the average, each binary search must require log₂ η mental comparisons. The total number of mental comparisons required to generate a program in this way is therefore the product of length times the logarithm of the vocabulary. But this is merely the previously defined volume

    V = N log₂ η
Now if we express the difficulty D as the number of elementary mental discriminations (e.m.d.) required to make one average mental comparison, the total number of e.m.d. required to generate the complete program is simply
    E = VD                                          (16)
Now if we reexamine the concept of the implementation level L, we note that it was intended to be the inverse of difficulty. Substituting the reciprocal of the level for difficulty in the equation above gives the effort E in units of elementary mental discriminations.
    E = V/L                                         (17)
In these units, Eq. (17) can be converted directly to units of time, merely by knowing the rate at which the brain makes elementary mental discriminations. According to psychology, this rate is nearly constant, and does not vary significantly with intelligence. In honor of John Stroud (1966), who first reported this rate, we designate it by S or the Stroud number. For time in seconds, S is taken as 18. It should be pointed out, of course, that if a programmer is “time-sharing,” only a fraction of the available e.m.d. per second will be applied to the programming task. But, provided he is concentrating, perhaps with the mental equivalent of the hardware “inhibit-all-interrupts” instruction, the relation between time and effort
    T = E/S                                         (18)
can be used to convert Eq. (17) to the timing equation
    T = V/(LS)                                      (19)
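Equations (17)-(19) involve nothing more than the quantities already defined; a minimal sketch with illustrative values:

```python
S = 18  # Stroud number: elementary mental discriminations per second

def effort(V, L):
    """Programming effort in e.m.d., Eq. (17): E = V / L."""
    return V / L

def time_seconds(V, L):
    """Programming time in seconds, Eq. (19): T = V / (L * S)."""
    return V / (L * S)

# Hypothetical program: V = 1000 bits at level L = 0.05
print(round(effort(1000, 0.05)))                 # 20000 e.m.d.
print(round(time_seconds(1000, 0.05) / 60, 1))   # about 18.5 minutes
```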
Despite its remarkable simplicity, Eq. (19) has demonstrated a remarkable ability to predict observed programming times ranging from 5 min to 11,700 man-months. Some of these results will be shown later, but first it
is worth noting that, strictly speaking, it should be applicable only to programs as they are originally written. Any effort devoted to rewriting to improve efficiency or clarity will not be measured. But the fact that this limitation has not produced any marked effect in tests of the hypothesis suggests that, over large samples, significant amounts of effort have not been devoted to this aim. This point will be examined in more detail later, when program clarity is discussed. By using the known relations between properties and appropriate algebraic manipulation, it is possible to express the fundamental effort and timing equations in a number of alternate forms. For example, in terms of the four basic parameters, we have

    E = (η₁N₂/2η₂)(N₁ + N₂) log₂(η₁ + η₂)           (18a)

By using the vocabulary-length equation and assuming that N₁ = N₂, this can also be expressed in terms of the vocabulary components, as

    E = (η₁/4η₂)(η₁ log₂ η₁ + η₂ log₂ η₂)² log₂(η₁ + η₂)    (18b)

In terms of program volume and language level, Eq. (18) reduces to

    E = V^1.5 λ^−0.5                                (18c)

and the timing equation becomes

    T = S^−1 V^1.5 λ^−0.5                           (19a)
This form of the expression is convenient in those cases in which the components of the vocabulary are not known directly, but an estimate of the language level is available. In terms of language level and potential volume, two properties that at least in principle should be available from the job specifications or problem statement, the effort becomes

    E = (V*)³/λ²                                    (18d)
From this formulation, it is easy to observe that for any given problem, the effect of the language level follows an inverse square law. Consequently, the effect of any deviation of λ from its expected value will be magnified. Since the programming languages for which data are available typically show a variance of about two-thirds of their means, Eq. (18d) should be used with caution for any single program.

9. Extension to "Lines of Code"
Provided that one knows the level of the language in which a program is written, and the number of executable statements it contains, it is possible
[Figure 2. The programming effort relationship: predicted versus observed programming times (log₂ T, from a few minutes up to roughly 2²⁸ min) for 6 machine language programs (Halstead, 1972a), 36 FORTRAN, PL/I, and APL programs (Halstead, 1975a), 11 FORTRAN programs (Gordon and Halstead, 1976), a regression line for 16 COBOL programs (Halstead, 1977a), a regression line for 60 IBM projects (Walston and Felix, 1977), and a regression line for 60 DOD projects (Shen and Halstead, 1978).]
to apply these equations to it, even though none of the basic software metrics are directly available. For example, with a large enough sample of programs in a given language, the total count of operators and operands in an average executable statement will be found to remain reasonably constant. Consequently, the length N can be closely approximated from the number of executable statements P. For machine language on a one-address machine with index registers, the relation is

    N = (8/3)P                                      (a)
while for FORTRAN

    N = 7.5P                                        (b)

is used. (If total statements, rather than executable statements, is the only length statistic available, then it must be reduced appropriately.) With an estimate of N, the vocabulary-length relation expressed as in Eq. (15) can be used iteratively to obtain the expected vocabulary η. The expected volume corresponding to the given number of source statements can then be calculated from N and η with Eq. (1). Finally, using a mean value of λ, say 1.16 for FORTRAN, the expected effort can be calculated with Eq. (18c), and the expected programming time with (19a). Figure 2 shows the degree of agreement between the hypothesis and all of the observational data currently available.
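The estimation chain just described (statements to N, N to η, then V, E, and T) can be sketched as follows. The constants 7.5 and λ = 1.16 are the FORTRAN figures quoted in the text; the fixed-point iteration used to invert Eq. (15) is simply one convenient way to obtain η from N and is not prescribed by the original treatment.

```python
import math

S = 18  # Stroud number, e.m.d. per second

def eta_from_length(N, tol=1e-6):
    """Invert Eq. (15), N = eta * log2(eta/2), by fixed-point iteration."""
    eta = 4.0
    for _ in range(200):
        new = N / math.log2(eta / 2)
        if abs(new - eta) < tol:
            break
        eta = new
    return eta

def fortran_estimates(P, lam=1.16):
    """Estimate N, eta, V, E, and T for P executable FORTRAN statements."""
    N = 7.5 * P                        # Eq. (b)
    eta = eta_from_length(N)
    V = N * math.log2(eta)             # Eq. (1)
    E = V ** 1.5 * lam ** -0.5         # Eq. (18c)
    T = E / S                          # Eq. (18), in seconds
    return {"N": N, "eta": round(eta, 1), "V": round(V), "E": round(E),
            "T_hours": round(T / 3600, 1)}

print(fortran_estimates(100))          # a hypothetical 100-statement program
```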
10. Programming Rates versus Project Size

In what is certainly one of the most conclusive studies to date, Claude Walston (1977) of IBM compared the timing hypothesis with the observations in his IBM large-project database. Some 60 programming systems ranging from 3 to 11,700 man-months in duration were available to him. Using the accepted engineering approach, he reduced the transcendental equations of software science to a power curve in P and T, and compared his result with the comparable expression obtained by a statistical least-squares fit to his data. He reported that from the hypothesis he obtained

    P = 1.27 T^(2/3)                                (20)

and from the observations he found

    P = 0.925 T^0.70                                (21)

where P is in thousands of lines of source code, and T is in man-months of effort. Walston concluded that the two results were "interestingly close," as indeed they are.
But an even more interesting result can be obtained from his data, because they cover such a wide range, extending nearly to a 1000 man-year project. Here, perhaps for the first time, the effect of job size on programming rates is definitively present. By using Eqs. (20) and (21) for hypothesis and observation, respectively, and solving for P/T over the range of T represented in Walston's database, we obtain the following comparative figures (Table I).

TABLE I
P/T (source lines per man-month)

    T (man-months)   Theory   Observation
            12         555        439
           120         257        220
         1,200         120        110
        12,000          55         55

In light of these results, it now appears that the hypothesis, as confirmed by observation, gives an adequate indication not only of average programming times, but also of the effect of total project size upon rates of programming.

11. Clarity

As mentioned during the discussion of the effort hypothesis, any mental effort diverted from the straightforward generation of a program, and applied instead to the task of making it more easily understood by others, cannot be measured as a function of the final value of V/L. But that is still only half of the effect. To the extent to which the effort to polish the program was successful, it will also have reduced the volume, increased the level, or both. As a consequence, applying the effort equation to the final, polished version would be expected to indicate how much time it should have taken to write the program properly in the first place. In considering this problem, both Gordon (1977) and Fitzsimmons and Love (1977) recognized that the processes active in the generation of a new program, and those employed in understanding a previously written program, might be sufficiently similar that they could both be quantified by the same metric. With this in mind, Gordon first examined separately each of the six impurity classes mentioned earlier. They had originally been identified experimentally as programming practices that violated the vocabulary-
length relation. When present in a program, three of them increase the length, and the other three reduce it. But Gordon was able to demonstrate mathematically that each of these impurities moved the effort E in the same direction, increasing it. He then made a search of the literature on programming practices, and obtained 46 examples in which various authorities had illustrated a “brute force” and a “clearer” way of programming a problem. For each of the 46 pairs of programs or program segments, he measured the basic software parameters and calculated E. In 90% of the cases he found that E decreased with the improvement, and for the exceptional cases he pointed out convincing arguments for doubting the opinion expressed by the expert. On the average, the effort evaluated for the improved, more easily understood version was 66% of the effort calculated for the original. In an interesting experiment conducted by Fitzsimmons and Love (1978), each of three different programs was written in nine different versions, where each version was intended to represent a difference in ease of understanding, either in terms of mnemonic complexity or of control complexity. For each problem, eight of the nine versions were issued to a different programmer, who was asked to first memorize it, and then functionally to reproduce it. The investigators then measured the percentage of statements correctly recalled, and found that as one would expect, the percentage tended to decrease as the number of statements increased, with a correlation of -0.70 (indicating that variation in the number of statements accounted for 49% of the observed variance). They then replaced the number of statements with the calculated number of e.m.d. E, and found that the coefficient of correlation improved to -0.81. (Indicating the E accounted for 65% of the variance, for a 32% improvement over number of statements.) Summarizing this experiment, they reported “Once again, this experiment supports the hypothesis that E is a strong indicator of the comprehensibility of a program.” This result may prove useful in a number of ways. For example, by providing a quantitative measure of what appears to be the most important variable related to program clarity, it should assist in any attempt to identify and evaluate others. In the past, studies of the effect of different styles of indentation, of the effect of mnemonic operand names, of the value of comments, and even of the presence of flowcharts have all been inconclusive. This should not be taken as evidence that these things have no effect on program understanding, but only that their individual effects are small. But because their cumulative effect, as estimated from the correlation coefficient of -0.81, may contribute about half as much as the E measure, their possible importance is still worth investigation. In any experimental science, the most efficient method of determining the effect
of one variable requires holding other variables constant, and varying the studied variable over as wide a range as possible. This becomes increasingly important if the effect of the variable under study is small with respect to the others. Otherwise, it may be hidden in the noise and undetectable. But until a “second-order’’ variable has been indentified, isolated, and understood, we have no way of knowing whether or not there might be conditions under which its effect could become dominant. The identification of E as a dominant measure of clarity should consequently permit the design of more sensitive experiments aimed at evaluating the other factors that must also contribute to clarity. A more immediate application of such a measure might be found in the area of large program maintenance. It has long been recognized that virtually any useful program has a dynamic life cycle, requiring occasional or even frequent updating and modification to keep it abreast of changing conditions. In addition to those modifications required by changing environments, the original definition of a “completely debugged program” as a program that will never be used again implies that error detection may also be required during the maintenance phase of a large project. But in the usual programming branch, as soon as a program has passed an acceptance test, the original programmers are assigned to new tasks, and less experienced programmers are assigned to maintain the project. At this point, by measuring E (and T ) for the program, there is available an estimate of the time each newly assigned programmer will require before he can be expected to understand completely the program. While such an estimate may be valuable by itself, it also suggests a more interesting possibility. For a program that is expected to have a long life and frequent changes in maintenance personnel, additional work by the original programmer to reduce the E value of the program may be easily and completely justified. In any particular case, of course, this will depend upon how much improvement in E can be accomplished with a given amount of additional effort. None of the relations of software science provides a direct answer to this question, but important guidance is available. It was mentioned earlier that for any particular language, the variation around the mean value of its language level A is quite large for individual programs. Consequently, comparing the actual value of A for a completed program with the average value for the language should establish whether or not the program is above or below the average clarity of programs in that language. Then for those jobs markedly below average, and the large variance guarantees that an appreciable fraction will be, added effort devoted to reducing the volume and increasing the implementation level should be quite productive. On the other hand, those programs that were well and smoothly written the first time, as evidenced by
an above-average λ, should remain in their original form, and additional time should not be wasted trying to improve them. A longer range application of the availability of a quantitative measure of effort and clarity might lie in the design of higher level preprocessors and problem-oriented languages. Returning to the concept of a potential language, which, in essence, would have a named function already available that would itself accomplish any task one might wish to specify, it is obviously unattainable. But in light of that fact, we might ask how the problem is handled in natural language, where again an infinite number of words would be impractical. In natural language, in order to reduce the effort of communication, jargons are developed with respect to any particular area in which many people are engaged. This is as true of any particular sport, hobby, or art as it is of any profession, or subfield within a profession. By means of a jargon, in which words may have distinct meanings that are different from normal usage and in which new words may be readily coined, the effort of communication is minimized within that specific field. Similarly, in computer languages, we must expect to reduce the range of applicability if we would raise the level. Consequently, we should expect that the most fruitful direction available is in the design of preprocessors or special purpose languages. In this regard, it is interesting to note that the input language Ellpack 77 for elliptic problems on rectangular domains shows a λ = 2.38 ± 0.92 for a small number of the programs analyzed by Rice (1978). This level is well above any mean figures reported for procedure-oriented languages, and even slightly above technical English. Since the bulk of the work involved in implementing any problem-oriented language is virtually independent of the input language design for it, sample measurements of λ for alternative designs being considered might be quite helpful.

12. Error Rates
The nature of errors or bugs in software is rather different from those in hardware. In hardware, a sudden malfunction can seldom be attributed to a problem in initial design or fabrication, but is usually the result of a component failure. But in software, the Occurrence of incorrect results cannot be attributed to, say, a divide instruction that finally quit dividing. Instead, and sometimes grudgingly, we are forced to admit that its source was “human error,” whether in specifications, design, or implementation. Conceptually, then, because every software error can only be attributed to human error, and because the count of e.m.d. required to implement a
program has already been shown to be reasonably well approximated by E = V/L, it should follow that E represents an upper limit to the possible number of bugs of all sorts in a program. No matter how errors are defined, and there is already a considerable and growing literature on the classification and counting of bugs, this offers an elementary starting point for the development of any hypothesis relating the expected number of errors to a computer program. Initial work in this area merely tested the simple hypothesis that the number of bugs B reported in each of the nine modules of a large system described by Akiyama (1971) should be more highly correlated with the values of E of the modules than with the number of statements P in each of them. As reported in Funami and Halstead (1976), the correlation coefficient (r = 0.98) between E and B was indeed higher than the corresponding value (r = 0.83) between P and B for the Akiyama data. But a high correlation for one set of data, encouraging though it may be, leaves many questions unanswered, and these questions are essential. Primarily, a high linear correlation implies only that a somewhat linear relationship was observed, but says nothing about either the slope or the intercept involved. Consequently, it leaves unanswered the questions regarding what fraction of e.m.d. were initially erroneous, and how that fraction might be expected to vary with different definitions of errors. Using the observed correlation solely as a starting point, however, Cornell et al. (1977) and Ottenstein (1978) have derived and tested a reasonably complete hypothesis that provides two or three new insights into error rate phenomena. Their model will be described briefly at this point. First, they differentiate between total bugs and validation bugs B_v, where the latter are similar to "delivered bugs," in the sense that validation bugs are those identified only during the validation phase of a project when its various modules are being integrated. Errors detected at that stage are usually treated more formally, with written problem reports or other records, hence they are more nearly comparable from one shop to another. Cornell et al. and Ottenstein then divide the problem into two parts, first the estimation of the number of errors per elementary mental discrimination to be expected on completely new material, and second, the effect of increasing familiarity as a programming task progresses. For the first they invoke the "chunking" concept of psychology in order to obtain the number of correct e.m.d. to be expected before the first erroneous one is reached, called E₀. The method is ingenious and produces a value that is demonstrably correct. (Since their assumptions are far from axiomatic, it is thus reassuring to note that a similar value of E₀ could have been obtained from any of various sets of data.)
Cornell et al. and Ottenstein begin by accepting the psychological evidence that the short-term memory of the human brain can handle five chunks simultaneously, equate these to conceptually unique inputs (η₂*), and reason that the ability to handle inputs implies an output, hence η₂* = 6. Further, they conjecture that the number of e.m.d. required to handle these five chunks is precisely the number of correct e.m.d. to be expected before the first error. They then calculate E₀ by assuming that the average language level employed is the same as that used in natural language communication, or λ = 2.16. With these assumptions,

    E₀ = (V*)³/λ² = [(2 + η₂*) log₂(2 + η₂*)]³/λ²   (22a)
       = [(2 + 6) log₂(2 + 6)]³/2.16²               (22b)
       ≈ 3000 e.m.d.                                (22c)

The second phase of the problem might be likened to the following analogy. Suppose one had a new apartment and a new office location a given number of city blocks apart and intended to walk between them. On the first trip, every corner encountered would be a point for a possible wrong turn. But as familiarity was gained, the actual number of wrong turns would approach zero, even though the number of corners did not change. Extending the analogy to many trips to many destinations, all within the same city, it should follow that the expected number of wrong turns would decrease as familiarity with the city increased, even though the upper limit for any new trip would still be the number of blocks to be traversed. For computer programs, the total possible number of e.m.d. that could be in error, or E, should similarly be reduced by the extent to which they represent repetition. Cornell et al. and Ottenstein noted that a quantitative measure of this repetition was already available in software science, in the form of the implementation level L, and that as a consequence,
    B̂_v = LE/E₀                                     (23a)

Algebraically, LE must be equal to volume, hence

    B̂_v = V/E₀ = V/3000                             (23b)

Cornell et al. and Ottenstein tested this hypothesis against data of Akiyama (1971), Shooman and Bolsky (1975), and Bell and Sullivan (1974), with consistent and satisfactory results, but their analysis of the excellent data of Lipow and Thayer (1977) and Thayer et al. (1976) provides the greatest insight. Their data, gathered during the validation phase of a 115,000-statement command and control system, were presented in two different ways, and this proved to be significant. In one, values for each of some 250 individual procedures were given, and in the second they were grouped into the 25 mutually exclusive functional packages of
the system. In both cases, figures were given for the number of problem reports B_v and the number of executable statements P. The value of V required by Eq. (23b) could be obtained directly from P via Eqs. (b) and (15).
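Under the hypothesis of Eq. (23b), the expected number of validation-phase problem reports can thus be estimated from nothing more than an executable-statement count. The sketch below chains the approximations of Section 9 with Eq. (23b); the 7.5 tokens-per-statement figure is the FORTRAN value quoted there and should be replaced for other languages.

```python
import math

def eta_from_length(N):
    """Invert Eq. (15), N = eta * log2(eta/2), by fixed-point iteration."""
    eta = 4.0
    for _ in range(200):
        eta = N / math.log2(eta / 2)
    return eta

def expected_validation_bugs(P, tokens_per_statement=7.5):
    """B-hat = V / 3000, Eq. (23b), estimated from P executable statements."""
    N = tokens_per_statement * P
    eta = eta_from_length(N)
    V = N * math.log2(eta)             # Eq. (1)
    return V / 3000.0

print(round(expected_validation_bugs(1000), 1))   # a hypothetical 1000-statement module
```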
According to the error hypothesis, these two cases should differ statistically. When the procedures are grouped into the 25 functional entities, there should be only one independent variable, the observed volume, hence one would expect a coefficient of correlation of unity. In the other case, however, the number of errors in any individual one of the 250 procedures should depend not only on its own volume, but also upon the order in which it was implemented within its complete package. If both effects are comparable, but only one is available, then the one that is should account for only half of the variance, or r² = 0.5. The expected value of the correlation between B̂_v and B_v for the 250 separate procedures should be (0.5)^1/2, or r = 0.71, rather than one. The actual values were r = 0.96 for the 25 functions, and r = 0.76 for the 250 procedures, lending support to the hypothesis. (Ottenstein also eliminated the possibility that the much higher correlation for the functionally grouped data was due instead to the averaging effect of larger samples by repeating the process with randomly grouped procedures.) In order to see how well, and in some cases how poorly, the observed number of problem reports during validation agreed with the hypothesis, the results for the 25 functions, ordered according to observed errors, were as shown in Table II.

TABLE II
Observed (B_v) and calculated (B̂_v) problem reports for the 25 functional packages

    B_v   B̂_v        B_v   B̂_v        B_v   B̂_v
      1     7          30    58          87   180
      4    10          41    47          95    78
      8    24          48    52         105    66
      8    28          50    70         144   136
     13    20          54    62         238   241
     22    29          55    33         239   221
     26    45          67    63         466   406
     27    20          69   105
     30    32          79   121

    Sum of observed: 2006; sum of calculated: 2154.

If one had been concerned only with the total, or even the four largest sources of error, the error hypothesis performed remarkably well. This in turn suggests again that having a dependable primary metric should allow for more fruitful future investigations into other sources of error that have heretofore been unidentifiable. As an interesting footnote, Ottenstein (1978) points out that, since
detecting and correcting each error requires an average of two computer runs, it is possible to express the required number of runs per programmer per day R as

    R = 48SL                                        (24)

where S is the Stroud number and L is the implementation level of the program being debugged. If Eq. (24) is recast in terms of the more conservative properties, potential volume and language level, it becomes

    R = 48Sλ/V*                                     (24a)

which implies that in any one-language shop, programmers working on small jobs can profit from far more validation runs per day than their colleagues engaged on large or very large jobs. This may well lie at the root of an apparent paradox of long standing, on-line versus batch-mode debugging, in which the on-line mode is demonstrably superior, but large jobs are slow to move away from batch mode. Equation (24a) now suggests that there may actually be a valid reason for this phenomenon, and that the superiority of on-line debugging may indeed decrease as the size of the program being validated increases. Before leaving the related subjects of software effort, clarity, and error rates, an important aspect of the hypotheses involved is worth considerable attention. This is the absolute nature of the quantities being defined and measured, where the term absolute has the meaning that the property is measured from a true zero, and increases linearly with respect to its effect on the reader or the writer. Consequently, items can be compared in an absolute sense and one is not limited to merely ranking them relative to one another. The advantage of an absolute scale over a relative one is readily demonstrated by an example from thermodynamics, in which both the centigrade and absolute temperature scales exist. Suppose, for example, that an automobile tire with a constant volume had been inflated to a pressure of 30 lb/in.² at a temperature of 20°C, and the temperature was then raised to 40°C. It is clear from the relation PV = RT that the pressure at 40° will be greater than it was at 20°, but of course it will not be 40/20 or twice as great. Instead, to find the amount, we must use the absolute temperature scale, on which the freezing point of water (0°C) is 273° above absolute zero. The final pressure is then 30 × (273 + 40)/(273 + 20), or 32 lb. The example would have been even messier, but still solvable, if the temperatures had been given in Fahrenheit or Réaumur, but the essential point is that it could not be solved at all if there was no absolute scale. It follows as a consequence that any other scale of program complexity
must be less useful, because in order to be different, it must either start at some point other than the true zero of effort, or deviate from linearity with respect to e.m.d. It does not follow that all less general, or nonabsolute, measures must be completely useless, of course. Just as the Fahrenheit scale is convenient for many purposes, any complexity metric that can be easily measured, and is relatable to the absolute scale, may prove useful, but not superior. The McCabe cyclomatic number (McCabe, 1977) is an interesting example. It equates complexity with the count of arcs on a graph of a program. Because each arc must, in terms of software science, contribute to η₁, and because E increases more than linearly with η₁, it follows that a high correlation should exist between the effort and the cyclomatic number. It does not follow, however, that they should be linearly related, nor that a change in effort or clarity resulting only from a change in the language used to express an algorithm would produce any change at all in the cyclomatic number.

13. Measurement Techniques
The definitions of operators and operands in software science were originally thought to be completely axiomatic, with no reasonable margin of ambiguity. For operands, defined as any variable or constant, this is obviously the case, and in precise agreement with the intuitive approach of virtually anyone attempting to count, or measure, the operand content of any program. The same is nearly, but not identically, true of operators, defined as any explicit or implicit symbol or group of symbols that can either alter the value of an operand, or alter the order in which the value of an operand is altered. During early work in the area, it was found from experience that any group of a dozen or more programmers with no prior experience and working only from their own intuitive notion of operators and operands would obtain quite comparable values if asked to analyze the same algorithm. Further, these results were most frequently used in tests of the vocabulary-length relation, and that relation is now known to be sufficiently robust that it will not detect minor discrepancies. For example, failure to identify any particular operator results in a lowered observation of both η₁ and N₁, thus producing no major effect on the relative error (N − N̂)/N. Working only from intuition, and without the guidance provided by the definitions above, some programmers would include declaration statements, but most would exclude them. Again, the effect on the relative
error of the vocabulary-length relation was minor. By accepting the definition, it follows that, for example, INTEGER I, REAL X, serve to denote attributes of I and X, but not to alter their values. Consequently, they are not germane to the algorithm being analyzed, which consists only of executable statements. Similarly, because simple labels are neither variables nor constants, the expression "GO TO LABEL A" must be one operator, and not an operator-operand pair. In resolving this possible ambiguity, however, the vocabulary-length relation was quite adequate, provided algorithms in languages requiring many GO TOs were compared with the same algorithms expressed in more sophisticated languages. In this particular case, I personally remember being forced to accept Dr. Necdet Bulut's experimental evidence demonstrating that his intuitive grasp of the problem was superior to my own. (Occasionally it is only comforting in the abstract or the long term to realize that experimental science can be as deaf to expert opinion as to opinionated experts.) As experimental work has continued, the more important ambiguities have been resolved, but until quite recently there has been no generally acceptable method or technique for resolving the more subtle problems. This is well documented by an extensive investigation of operator identification methods for PL/I, reported by Elshoff (1977). In that study, Elshoff took an earlier method, improved upon it as well as could be done intuitively, and then intentionally perturbed it as much as possible in either direction. In one direction, he sought to reduce the number of unique operators to the greatest extent possible, combining all even slightly similar operators into one, and eliminating others. In the other direction he expanded the count of unique operators as much as possible. At that point he had a total of eight different, well-defined operator counting algorithms, covering a range considerably wider than intuition alone would suggest. Applying each of them in turn to the 34 commercial PL/I programs in his data set, he found that in all cases, the correlation coefficient between measured and calculated lengths, or N versus N̂, was greater than 0.98. Only when the basic parameters were used to calculate additional properties, such as potential volume or effort, would the eight alternative operator definition packages yield significantly different results. Since Elshoff's database did not include observed values of either η₂* or T, he was forced to conclude that the counting method would definitely affect the results, but that since all of them confirmed the vocabulary-length relation, that relation could not be used to discriminate among them. Even more recently, however, this situation has been altered significantly by an important breakthrough in the mathematical modeling of the
rank-ordered frequency distribution of operators that will be summarized in the next section.

14. The Rank-Ordered Frequency of Operators
If the number of occurrences of the most frequent operator in any program is plotted as a vertical line at the origin, and the number of occurrences of the next most frequent operator is plotted as a second vertical line one unit to the right of the first, and if this process is continued until all of the operators have been treated in the same way, the resulting rank-ordered frequency distribution will exhibit a relatively smooth pattern. It must have a maximum value on the left, fall quickly at first, and then more slowly approach a height of only one unit on the right. There will, of course, be η₁ vertical lines, and the area under the curve formed by their end points must be equal to the operator component of the program's length, N₁. Several investigators have been sufficiently intrigued by the observed stability of this pattern over a wide range of languages and problem sizes that they have attempted to derive the expected pattern mathematically. [See, for example, Zweben (1974, 1977), Elshoff (1977).] These early attempts gave reasonable approximations over one part of the range or the other, or for samples made up of medium- to small-sized algorithms, but a model of sufficient generality for use in the investigation of operator identification methods was elusive. The problem has apparently been solved with a rather novel technique. Because the technique itself may be useful in other cases in which unknown but relatively simple relations among absolute metrics are being sought, its description (Halstead et al., 1967) will be summarized here. Called an Algorithm Generator, it is essentially just a computer program that accepts as input a set of parameters X₁, X₂, X₃, . . . and a required answer or result X. Using all of the operations in its repertoire in all combinations with the input values, it exhaustively generates and tests programs, stopping only when it produces a program that yields the result X from the inputs given. If it is given more than one set of inputs and desired result, it will test the first program that succeeds on set one on all subsequent sets. If any fail, it discards the first program, and continues to generate more complicated programs until it succeeds or exceeds the established time limit. The accuracy required to meet the criteria for success must be supplied. The value of any constant it is allowed to use must also be supplied, entering as an input parameter. This latter requirement is both an advantage and a disadvantage. On the positive side, it
guarantees that the Algorithm Generator is in no sense a curve fitter. On the negative side, because the time required is proportional to the product of the number of parameters times the number of instructions in the repertoire raised to the power "number of steps," an additional constant will increase the time as much as an additional input variable. While the design of the Algorithm Generator is essentially as outlined above, by consulting the cited reference one would find that several heuristics, such as dimensional analysis and the elimination of complementary operations, improved its performance without removing the restriction that it deliver the shortest program yielding the desired result. That it is capable of ignoring an unessential parameter was once demonstrated by feeding it the constant 2π, the acceleration of gravity, and the lengths, masses, and periods of several pendulums. It generated the program for P = 2π(L/g)^(1/2), correctly ignoring the masses. Uber once claimed that by using the answer book, it had successfully derived all but two of the equations required by the problems in a freshman physics text, and that the two failures resulted from errors in the answer book. This approach was applied to the problem of rank-ordered frequency distributions of operators in the following way. Program number 1 in the data set of Elshoff described earlier was arbitrarily selected. (His counting method 2 was used, after the correction of two rather obvious errors.) Ten data sets were then taken from the sample program, one for each of the ten most frequent operators. The available operators for the generator were specified as addition, subtraction, multiplication, division, and logarithm to the base two. Two inputs, f1,0 and f1,i, and the desired result i were then entered for each of the 10 sets. Note that, most fortunately as it turned out, the index for the most frequent operator was taken as zero, rather than one. Also, again fortunately, the allowable error in the calculation of i from f1,0 and f1,i was set at ±1. Given the available input, an Algorithm Generator must produce the five-step program equivalent to

log2 x0 → x0
log2 x1 → x1
x0/x1 → x2
log2 x2 → x3
x0 × x3 → X = i (±1)

By deciphering this program we have the deceptively simple expression

i = X0 log2(X0/Xi)    (25)

where the Xi are given by

Xi = log2 f1,i    (26)
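A minimal sketch, in Python, of the exhaustive search strategy described above; it is an illustration of the idea, not the 1967 implementation, and the operation repertoire, tolerance rule, and example cases are assumptions chosen for this example.

    # Brute-force search for the shortest straight-line program that maps the
    # given inputs to the given result, in the spirit of the Algorithm Generator.
    import itertools
    import math

    OPS = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / b if b != 0 else float("inf"),
        "log2": lambda a, b: math.log2(a) if a > 0 else float("inf"),  # unary; b ignored
    }

    def run(program, inputs):
        """Execute a straight-line program; each step appends one new value."""
        vals = list(inputs)
        for op, i, j in program:
            vals.append(OPS[op](vals[i], vals[j]))
        return vals[-1]

    def generate(cases, max_steps=3, tol=0.01):
        """cases: list of (inputs, result). Returns the shortest program found."""
        n_in = len(cases[0][0])
        for steps in range(1, max_steps + 1):
            slots = []
            for s in range(steps):
                avail = n_in + s          # values available before step s
                slots.append([(op, i, j) for op in OPS
                              for i in range(avail) for j in range(avail)])
            for program in itertools.product(*slots):
                if all(abs(run(program, x) - y) <= tol * max(1.0, abs(y))
                       for x, y in cases):
                    return program
        return None

    # Example: recover multiplication from three input/output sets.
    cases = [((2.0, 3.0), 6.0), ((4.0, 5.0), 20.0), ((7.0, 2.0), 14.0)]
    print(generate(cases))   # (('mul', 0, 1),)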
TABLE III

Rank-ordered operator frequencies for program number 2 of Elshoff's data set. For each rank i from 0 through 42 the table lists the operator (the semicolon, assignment, parentheses for arguments and subscripts, parenthesized expressions, DO, IF, the comparison and arithmetic operators, the I/O statements, the individual procedure calls, and so on, in order of decreasing frequency), the observed frequency f1,i, and the frequency calculated from Eq. (29). The observed frequencies begin 453, 326, 259, 98, 77, 72, 53, 35, 27, 25, 21, 18, 15, 13, . . . and fall to 1 at the highest ranks; the corresponding calculated frequencies begin 488, 307, 200, 135, 94, 67, 49, 36, 28, 22, 17, 14, . . . and likewise fall to 1.
As mentioned earlier, the area under the curve of f1,i versus i must equal the operator component of the length, giving the additional relation

N1 = Σ_{i=0}^{η1-1} f1,i    (27)
Equations (25) and (26) can be solved for f1,i, giving first the simple expression

Xi = X0 / 2^(i/X0)    (28)

or, equivalently,

f1,i = 2^(X0 · 2^(-i/X0))    (29)

and substitution of Eq. (29) into Eq. (27) gives

N1 = Σ_{i=0}^{η1-1} 2^(X0 · 2^(-i/X0))    (30)
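A minimal sketch, in Python, of how Eqs. (29) and (30) can be applied: f1,0 is found by bisection so that the modeled frequencies sum to N1, after which every f1,i follows from Eq. (29). The counts used below are round illustrative values of the same order as those of Table III, not data taken from the study.

    import math

    def modeled_frequencies(f0, eta1):
        """Eq. (29): f1,i = 2 ** (X0 * 2 ** (-i / X0)), with X0 = log2 f1,0."""
        x0 = math.log2(f0)
        return [2 ** (x0 * 2 ** (-i / x0)) for i in range(eta1)]

    def solve_f0(eta1, n1, iterations=60):
        """Eq. (30): choose f1,0 so that the modeled frequencies sum to N1."""
        lo, hi = 2.0, float(n1)            # f1,0 must lie between these bounds
        for _ in range(iterations):
            mid = (lo + hi) / 2.0
            if sum(modeled_frequencies(mid, eta1)) < n1:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    eta1, n1 = 43, 1570                    # illustrative counts only
    f0 = solve_f0(eta1, n1)
    freqs = modeled_frequencies(f0, eta1)
    print(round(f0), [round(f) for f in freqs[:6]])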
By using iterative techniques, Eq. (30) permits the determination of f1,0 for any pair of η1, N1 values, after which all of the calculated values of the f1,i are available via Eq. (29). When tested against each of the 34 programs in Elshoff's database, Eqs. (29) and (30) gave uniformly good agreement, with an average correlation coefficient between the observed and calculated f1,i of 0.96 and an average slope, as determined by a least-squares statistical fit, of 1.08. The program that had the median value of the correlation coefficient was program number 2, and its analysis is shown in the accompanying Table III. The mathematical model represented by Eq. (29) should not be confused with a relationship that has been derived, in the usual sense of that term, from more basic relations. Nevertheless, it appears to reflect reality with sufficient fidelity that it might serve to resolve ambiguities in operator counting techniques whenever a new language is to be analyzed.

15. The Relation between η1 and η2

An interesting and occasionally useful relation between the number of unique operators and operands has been experimentally derived and verified. The derivation makes use of the concept of the boundary volume, denoted by V**, so it will be defined first. It may be recalled that when the potential volume V* was being defined, it, like the program volume, was taken as the product of length times the logarithm of its vocabulary, but that, unlike the case of program volume, the fact that there would be no repetition made the length equal to the vocabulary.
It follows, therefore, that if a particular algorithm is being expressed at constantly increasing values of implementation level, there must be a discontinuity in the volume as the level reaches unity, because for all but the potential language the vocabulary-length relation holds. The extent of this discontinuity can readily be evaluated simply by noting the effect that the vocabulary-length equation would have had on the potential volume if it had been applicable. This is achieved by calculating the boundary volume, defined as

V** = (η1* log2 η1* + η2* log2 η2*) log2(η1* + η2*)    (31)

or, since η1* = 2, as

V** = (2 + η2* log2 η2*) log2(2 + η2*)    (32)

In order to derive the relation between η1 and η2, we first note that the rate of change of the total vocabulary with respect to its operator component should depend only upon the ratio of boundary volume to potential volume. Setting

dη/dη1 = V**/V*    (33)

and differentiating

η = η1 + η2    (34)

gives

dη2/dη1 = V**/V* - 1    (35)

which may be integrated between limits,

∫_{η2*}^{η2} dη2 = [(V** - V*)/V*] ∫_{η1*}^{η1} dη1    (36)

to give, where η1* = 2,

η2 - η2* = [(V** - V*)/V*](η1 - η1*)    (37)

Now if we let two single-valued functions of η2* be defined as the auxiliary variables A and B, where

A = (V** - V*)/V*    (38)

and

B = η2* - 2A    (39)

Eq. (37) can be expressed simply as

η2 = Aη1 + B    (40)

It is sometimes convenient to express Eq. (40) in terms of the total vocabulary, or

η1 = (η - B)/(A + 1)    (41)

and to simplify Eq. (38) to

A = η2* log2(η2*/2) / (2 + η2*)    (38a)
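A minimal sketch, in Python, of Eqs. (32) and (38)-(40): for a given η2*, compute the potential and boundary volumes, form the auxiliary variables A and B, and predict η2 from an observed η1. The sample values of η1 are assumptions for the illustration.

    import math

    def relation_coefficients(eta2_star):
        eta_star = 2 + eta2_star                                   # eta1* = 2
        v_star = eta_star * math.log2(eta_star)                    # potential volume
        v_2star = (2 + eta2_star * math.log2(eta2_star)) * math.log2(eta_star)  # Eq. (32)
        a = (v_2star - v_star) / v_star                            # Eq. (38)
        b = eta2_star - 2 * a                                      # Eq. (39)
        return a, b

    a, b = relation_coefficients(6)          # a module with six I/O parameters
    for eta1 in (8, 12, 16):
        print(eta1, round(a * eta1 + b, 1))  # Eq. (40): predicted eta2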
16. The Use of η2* in Prediction
One of the real but seldom-mentioned pleasures of the scientific community is participation in the small drama that surrounds the demonstration of any counterintuitive scientific experiment to the uninitiated. Any first-year physics or chemistry course is replete with such laboratory demonstrations. While the objective is to teach the student, rather than to impress him, a properly designed experiment may occasionally do both without loss of value. In this section we shall first discuss such an experiment in software science as it was reported by Woodfield (1978). It was counterintuitive to the 18 computer science graduate students who each performed the experiment because it demonstrated that more than half a dozen measurable parameters of their first program were determined by the simple fact that it required four input-output parameters. Further, when the first program was expanded to include one more dimension on input, all of the changes in vocabulary components, length, volume, program level, and potential volume were similarly determined. As Woodfield describes the experiment, the initial assignment consisted of consulting the Algol version of a permutation algorithm, number 286 in the Communications of the Association for Computing Machinery, implementing a similar version in FORTRAN, and then expanding that version in such a way that it could handle one added dimension on input. The initial assignment was then completed by manually tabulating the values of η1, η2, N1, and N2 for both FORTRAN programs. At this point the four basic metrics for each of the 36 FORTRAN versions were tabulated, and a copy issued to each of the participants. They were instructed to take η2* = 4 for the first program, η2* = 5 for the second, and λ = 1 for both, and to calculate from η2* and λ all of the program properties that could be compared with the observed values. The 36 sets of basic metrics are shown in Table IV. Obtaining the various properties from the given values of η2* and λ required the following calculations.
TABLE IV

Basic metrics for the 18 students: for each student, the values of η1, η2, N1, and N2 for the original FORTRAN program and for the expanded version. The column means (± S.D.) were: original program, η1 = 9.89 ± 1.81, η2 = 8.72 ± 1.18, N1 = 30.39 ± 6.33, N2 = 31.44 ± 2.48; expanded program, η1 = 10.61 ± 1.75, η2 = 12.44 ± 2.83, N1 = 44.78 ± 9.75, N2 = 51.11 ± 12.08.
V* was first obtained directly from Eq. (6). L was then obtained from Eq. (11), allowing V to be obtained from Eq. (7). A and B were obtained from Eqs. (38a) and (39). From V, A, and B, and Eqs. (1) and (14), iterative techniques then provided η1, η2, η, and N. Their results for these properties are shown in Table V.
TABLE V

                            Original program                 Expanded program
                       From η2*, λ     Observed        From η2*, λ     Observed
η1                        9.27       9.89 ± 1.81          10.58       10.61 ± 1.75
η2                        8.84       8.72 ± 1.18          13.09       12.44 ± 2.83
η                        18.11      18.61 ± 1.69          23.67       23.06 ± 2.98
N                        57.57      61.83 ± 7.72          84.58       95.89 ± 20.95
V                          241        261 ± 39              386         436 ± 108
L                       0.0645     0.0585 ± 0.0146        0.0509      0.0481 ± 0.0117
V* (observed as LV)      15.51      14.95 ± 3.02           19.65       20.77 ± 6.85
η2* (observed from V*)       4       3.84 ± 0.76               5        5.21 ± 1.58
Woodfield noted that in all cases the mean values observed agreed with the calculated values within the probable errors. This is a fairly convincing demonstration of the internal consistency of the software equations, and of the central role played by η2* in the programming process. A word of caution is still in order, however, concerning the η2* concept itself. The two algorithms analyzed in the experiment were basically straightforward and simple, and contained no ambiguities. Consequently, the concept of η2* was directly applicable and required no further interpretation. On the other hand, algorithms exist that are not as readily understood. It is known that, for example, a short program containing within itself a large number of information-packed constants will exhibit a value of LV, hence of η2*, considerably greater than would be expected from a simple count of input/output parameters. Further, there are other areas in which the practice of equating potential volume to the volume of a procedure call becomes suspect, and where the proper interpretation of η2* is murky. For example, it is difficult to specify what is meant by a subroutine call on a compiler by an operating system. It is clear, however, that the potential volume of the compiler is not even slightly related to the information supplied to it by the call from the operating system. On the positive side, however, typical application programs do not fall in that class, and for them the equating of potential volume to the volume of a procedure call is apparently sound.
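The chain of calculations the students followed can be sketched as follows, using the equation forms listed elsewhere in this chapter (V* = (2 + η2*) log2(2 + η2*), λ = LV*, L = V*/V, and Eqs. (38a)-(40)); the bisection on η1 below stands in for the iterative techniques mentioned in the text. With η2* = 4 and λ = 1 it reproduces, to rounding, the first column of Table V.

    import math

    def predict(eta2_star, lam):
        eta_star = 2 + eta2_star
        v_star = eta_star * math.log2(eta_star)                # potential volume
        level = lam / v_star                                   # from lambda = L * V*
        volume = v_star / level                                # from L = V* / V
        a = eta2_star * math.log2(eta2_star / 2) / eta_star    # Eq. (38a)
        b = eta2_star - 2 * a                                  # Eq. (39)

        def implied_volume(eta1):
            eta2 = a * eta1 + b                                # Eq. (40)
            n_hat = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)
            return n_hat * math.log2(eta1 + eta2)

        lo, hi = 2.0, 1000.0                                   # bisection on eta1
        for _ in range(80):
            mid = (lo + hi) / 2.0
            if implied_volume(mid) < volume:
                lo = mid
            else:
                hi = mid
        eta1 = (lo + hi) / 2.0
        eta2 = a * eta1 + b
        n_hat = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)
        return dict(V_star=v_star, V=volume, L=level,
                    eta1=eta1, eta2=eta2, eta=eta1 + eta2, N=n_hat)

    # Original permutation program: eta2* = 4, lambda = 1 (compare Table V).
    for key, val in predict(4, 1.0).items():
        print(key, round(val, 2))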
17. Grading Student Programs

Although the first book on software science was not published until 1977 (Halstead, 1977a), and it was written as a research monograph rather than a text, elements from this field have already started to trickle downward into the teaching of beginning programming. This is shown, for example, by student exercise 11.7 in Henry Ruston's recent "Programming with PL/I" (1978). This development is perhaps to be expected, since there has been a rapidly increasing awareness that programming style has an economic importance in the life cycle of large systems. This has placed a requirement
for teaching "style" on those who teach first and second year courses in programming. In earlier years, as soon as a student's assigned program compiled correctly and produced perfect results from test inputs, it was worth the maximum points for that assignment. But this is no longer the case. The program must now also be graded for style, and at this point a problem arises. Grading programming style has much in common with grading English essays, but the typical professor of computer science, by the choice of his own career, has indicated more personal interest in science or mathematics than in the humanities. While even in the hard sciences the professor prefers a well-written laboratory report or problem set, "scientific objectivity" often suggests that it be graded solely on accuracy of results. Even scientific papers are judged on content, not on style, and to describe a colleague's paper as "well written" may suggest a lack of important content. Consequently, anything that offers an objective, even automatable, method of grading the programming style of students is of immediate interest. If this technique also provides, as a side effect, an automatic method for the detection of copying, then that interest is certainly strengthened. But even though, and perhaps especially because, interest in such a tool can be expected to be high, it becomes important to have some assurance that it would in fact measure those attributes of student programs in such a way that its results are in general agreement with those of a human grader, not only in principle, but in practice as well. This obviously calls for a parallel study, in which the grades assigned by manual methods can be compared with the corresponding software metrics. It would be too late, however, to apply such a test to a class that was already being taught specifically about the six impurity classes and how to avoid them, how to measure and lower the volume, or how to measure and raise the implementation level of their programs. For such a class, it is obvious that an automatic software grader would measure only the extent to which the class was absorbing the criteria already built into the grader. It would leave unanswered the question of whether or not those criteria were adequate. This problem was nicely avoided in a study reported by Shen (1978), and his results provide some encouragement. Professor Shen's study covered the second course in computer programming at Purdue University (CS320), in which the main emphasis is on good style. To quote from his report, "Our approach is to teach the concept of structured program development using D-charts [1,2], reinforced by examples taken from a book on programming style [3]. Students are asked to work out problems in FORTRAN and PASCAL, which are then graded on both correctness
and style. We hope that the students receiving better grades in CS320 are better programmers." After the course was completed and the manually determined numerical grades had been posted, the final program written during the course was available in machine-readable form for 31 of the students. These 31 versions, each designed and implemented by the student to accomplish the same task, were then processed by a software analyzer, which obtained the four basic metrics and various calculable properties. It was expected that certain of these properties should show general agreement with the course grades, while others should not, and these expectations were confirmed. Because the presence of impurities will cause the actual length to deviate from the length predicted by the vocabulary-length equation in either direction, Shen calculated the absolute relative error |N̂ - N|/N and found that it decreased as grades increased. The coefficient of correlation was -0.46, which was significant at the 1% level. He also calculated the language level λ for this one program from each student, and found that its correlation of 0.4 was significant at the 2.5% level. Using both relative error in length and language level and performing a multiple correlation with the student grades, he found a correlation coefficient of 0.56, which was significant at the 4% level. Shen also made the interesting observation that there was one outstanding student, whose grade of 99 was 4% above the next highest (the low was 48%), and that his program was also ranked best by the software analysis. Combining these findings provides some confidence that an automatic system of student programmer evaluation would not introduce a danger that good programming style as evaluated by human graders would go unrewarded. Perhaps an even more important corollary, however, lies in the evidence that the rather ad hoc methods used to teach programming style in the course studied were also successful. If it were actually true that the software analyzer "knows" what good programming is, then the students might learn the same thing more easily if they were permitted to study the analyzer instead of the ad hoc examples. But considering a software analyzer only as a grading device and not as a teaching tool, Karl Ottenstein (1976) has demonstrated that it has an intriguing capability to measure the similarity of implementation of student assignments. In fact, given machine-readable versions of a single assigned problem for a large class, it can determine the statistical likelihood that any two are independent. For all four of the basic metrics of two versions to be identical is quite unlikely, even if each of the metrics is near its mean for the sample. But if all four measures are near the tails of the distribution, yet identical for both programs, the probability that they represent independent samples becomes vanishingly small.
Naturally, the data it obtains remain the same even when some or even all of the operand names are altered, a device that frequently escapes the human grader.

18. Semantic Partitioning
In an interesting article in Acta Informatica, Paul Zislis (1975) offered a most ingenious method for preparing large programs for testing. Essentially, the Zislis method consists of the automatic separation of a program into its meaningful parts. It uses the standard control flow analysis of graph theory and the determination of which operands are live in each block as a point of departure, and then combines blocks into segments or partitions. The criterion for a partition is that it completely contain at least one temporary variable. Because such a temporary variable is never used in any other partition, its value does not need to be tested at the entry or exit of the block during program validation runs. During the development of the method, Zislis found that in their use of temporary operands many programs contained impurities of class II (ambiguous operands). Loop counters, for example, frequently were given the same operand name in completely unrelated loops. This initially prevented the isolation of many smaller partitions. The problem was overcome and the impurity removed by automatic renaming of temporary variables when they were redefined. The results were highly successful in isolating the test points (partition exits) at which the fewest operands needed to be examined, but they were even more striking in several other, perhaps unexpected ways. First, Zislis quickly noted that the groups of blocks into which a program was partitioned each appeared to have a logical coherence or a rational unity that was much deeper than could possibly have been expected from control flow analysis alone. Subjectively, it was obvious that the method was actually isolating the meaningful or semantic partitions of the programs it was analyzing. Now, in essence, semantic partitioning implies automatic abstracting to a higher level, where each partition represents what would have been a single statement, had the source language contained the one desired. At that time it seemed counterintuitive that this result could be achieved without manual intervention. Consequently, it could not be accepted on subjective evidence alone, but required objective support as well. Zislis was able to provide this in an ingenious and straightforward way. He reasoned that among the FORTRAN programs in the computer center library some would be well documented, and that a well-documented program should have its significant comments preceding logical entities. He defined a significant comment as two adjacent comment
card images. Using a large number of programs, he then tabulated the number of times that a programmer's significant comment was followed immediately by the start of a new partition. The objective test was convincing, with 87% of the significant comments marking partition boundaries. When the requirement for a significant comment was raised from two cards to three, the percentage increased to 93%. Consequently, partitions obtained with Zislis's method were rightfully called semantic. Considering the process further, from the point of view of the volumes of the modules it produces, reveals another interesting property. The sum of the lengths of the partitions of a program must, of course, be equal to the length of the program. But every partition must contain at least one operand that is absent from all other partitions. Consequently, the sum of the vocabularies, if divided by the number of partitions, must be less than the vocabulary of the program. Not only must this average partition vocabulary be less than the total program's vocabulary, it must also be the minimum possible average vocabulary achievable by any method of division. In other words, if any partition found with the Zislis method could be further subdivided in such a way that the subdivisions contained fewer operands, then the method itself would have performed the subdivision. Consequently, it can be seen that semantic partitioning must minimize the sum of the modular volumes of a program. This may provide yet another approach to the interesting problem of modularity and its effect on computer programming, as well as another link between the elements of software science and the human thought processes. On the more frivolous side, it has also been suggested that a software analyzer, coupled with a semantic partitioner, would be capable of providing a programmer working at an on-line cathode ray tube with a number of items of pertinent (or impertinent) information, including requests for missing comments between partitions, as well as running means of the language level he was using, and perhaps the number of errors he might expect in his work. In fact, in light of the material discussed in the next section on technical English, it is not even inconceivable that it could also calculate the potential volume of any comment he did include, compare it with the potential volume of the partition to which it applied, and suggest changes where the discrepancy exceeded preset criteria. Fortunately, just because a thing is possible in principle does not establish that it is possible in practice, and 1984 is still a few years away.

19. Technical English
It was the pioneering work of Kulm (1974) that established the fact that the methods and equations of software science could be applied to English
prose as readily as to computer programs. As a specialist in mathematical education, Kulm became interested in the area when he calculated the potential volume of an equation in a grade school mathematics text, and found that it was nearly the same as his calculation of the potential volume of the accompanying prose version of the same relation. There had been earlier attempts to handle English with the techniques of software science, but these had been abortive, primarily because it was not clear before Kulm's work how the concept of operators and operands applied to prose. He was able to show, however, that a paper by Miller et al. (1958) contained the essential part of the solution. In that paper, words were divided into two classes, called function words and content words. The function words are, in general, all of those words that are classified grammatically as articles, pronouns, prepositions, conjunctions, or auxiliary verbs. All of the others are counted as content words, with the exception of numbers, about which Miller expressed some uncertainty. This uncertainty can be resolved by counting numbers that have names, such as one, two, twelve, and so forth, as function words, and numbers without individual names, such as seventy-three, 1978, and so forth, as content words. For most prose, the treatment of numbers is immaterial, but for highly technical material, and especially for outlines, the rule adopted may have a measurable effect. Miller et al. pointed out that both syntactically and statistically content words are different from function words. They even noted that the meaning of a content word may change with time, but the meaning of a function word seldom does. In support of their analysis, they presented a reasonably exhaustive list of the 363 function words in English. Kulm reasoned that the content words must be equivalent to operands, and that the function words are operators. To the list of 363 operators, one must also add, of course, the punctuation symbols, font changes, capitalization, and paragraphing. With these unambiguous and objective definitions, the major problem of operator-operand identification is resolved, and English prose can be treated in the same way as material written in computer language. There is one important difference, however, and it is related to impurity classes. In a computer program, the use of two operand names to reference the same value is a class III impurity, and is to be avoided. In English, the use of a synonym is not necessarily redundant, but instead may serve to sharpen the particular meaning of the word it replaces. And whether it does or not, it is a preferred practice, rather than an impurity to be avoided. Consequently, we should expect that the observed vocabulary will appear larger than the "true" vocabulary, where the latter refers to the number of different items actually being discussed. In a subjective way
this effect can be accounted for in any particular passage by noting which words are being used as synonyms for others in the passage. But because it will usually depend on context, such an approach is both prohibitively laborious and experimentally unacceptable. In like fashion, both the singular and the plural form of a single word refer to the same word, rather than to two unique concepts. Variations in tense show a similar effect. Using a mechanized counting method in which every unique string of letters contributes to the vocabulary will therefore produce a result that is not precisely the same in all respects for prose and programs. In terms of volume or effort, the gross vocabulary for prose should be strictly comparable to η for programs, and no correction should be required. In considering either the vocabulary-length relation or the relation between η1 and η2, on the other hand, the net vocabulary of a natural language text is applicable. In practice it has been found that the gross vocabulary, or the count of unique strings of letters (or punctuation symbols), and the number of conceptually unique words, or the net vocabulary, are closely related. The net vocabulary η′ can be obtained from the gross vocabulary η by multiplying by 0.40. Curiously, this is the same value that would result if each of the three sources, synonyms, plurals, and tenses, had a 50% probability of occurrence [for that case 1/(1 + ½ + ½ + ½) = 0.40]. The fact that both the vocabulary-length equation and the relation between the operator component and the operand component of vocabulary agree as well for English prose as for computer programs is of sufficient importance that it will be illustrated here, from a study detailed by Halstead (1977a). In that study the first 12 of a series of abstracts published by the American Geophysical Union were analyzed individually, and values of η1, η2, N1, and N2 were tabulated. Values of the net vocabulary components were obtained from the observed components by multiplying by 0.40, and used in the vocabulary-length equation to obtain N̂. This was then compared with the observed length N = N1 + N2, with the results shown in Table VI. The calculated coefficient of correlation between the observed lengths and the lengths calculated solely from the vocabulary components is 0.997 for this sample, and the slope of the least-squares fit is 0.94, a rather incredible result. Using the crude approximation λ = 1, it is possible to calculate both components of the vocabulary from the length alone. (See the section on the relation between η1 and η2.) For the same 12 technical abstracts, these calculations agreed with the measured net vocabulary components as shown in Table VII.
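Before turning to the tabulated comparisons, a minimal sketch, in Python, of the counting scheme just described may be useful. The short FUNCTION_WORDS set and the sample sentence are illustrative assumptions standing in for Miller's full list of 363 function words and for a real abstract; for a passage this short the vocabulary-length relation is not expected to hold closely, so the sketch only illustrates the bookkeeping.

    import math
    import re

    FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "or", "is", "are",
                      "was", "were", "by", "with", "for", "that", "this", "it", "its"}

    def is_operator(token):
        """Function words and punctuation count as operators; content words as operands."""
        return token in FUNCTION_WORDS or not token[0].isalpha()

    def prose_metrics(text):
        tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
        operators = [t for t in tokens if is_operator(t)]
        operands = [t for t in tokens if not is_operator(t)]
        observed_length = len(operators) + len(operands)           # N = N1 + N2
        eta1 = 0.40 * len(set(operators))                          # net operator vocabulary
        eta2 = 0.40 * len(set(operands))                           # net operand vocabulary
        expected_length = eta1 * math.log2(eta1) + eta2 * math.log2(eta2)
        return observed_length, expected_length

    sample = ("The potential volume of a passage is the volume of its shortest "
              "possible expression, and it is obtained by counting the parameters "
              "that the passage passes to the reader.")
    print(prose_metrics(sample))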
TABLE VI

Abstract       N̂        N
1              86       84
2             211      230
3             163      157
4             167      179
5             259      261
6             222      222
7             275      306
8             319      294
9             461      464
10            915      874
11            231      253
12            321      315
Sum          3630     3639
Mean          302      303
Here again the correlation coefficient is extremely high, r = 0.995, and the line of regression has an intercept of -0.5 and a slope of 0.996. The virtually perfect agreement between theory and observation when the sample is analyzed in this way has a considerably deeper significance than that shown by the vocabulary-length agreement, because these calculations involve the potential volume.

TABLE VII

                   Net vocabulary
           Calculated from length        Observed
Abstract   Operators   Operands    Operators   Operands
1             11          13          10          14
2             16          34          16          30
3             14          23          14          24
4             14          27          12          26
5             18          37          16          38
6             16          32          14          34
7             19          43          16          40
8             18          42          18          45
9             24          62          22          61
10            35         107          39         105
11            17          36          14          34
12            19          44          20          43
Sum          221         500         211         494
Mean          18          42          18          41
In other words, obtaining the fraction of the vocabulary attributable to the operator component required the calculation of V* as an essential step in the process. Consequently, this experiment provides important support to the idea that an English passage actually has a potential volume. This now appears to be a reasonable assumption, despite the fact that the related concept of a procedure call, function, or subroutine does not exist for natural language. The implications of this hypothesis will be explored further in later sections.

20. Learning and Mastery
In the section on clarity, it was shown that the number of elementary mental discriminations (e.m.d.) required to understand a piece of material is given by E. Now a program that has been understood once will require fewer discriminations to understand if it is encountered a second time, despite the fact that its measurable value of E is of course unchanged. We may represent this second encounter by l × E, where l is a learning factor. Because E = V/L, this process is equivalent to the effort to understand initially a program with a lower volume and higher level. Upon a third encounter, the effort should be further reduced, to l²E. But the process cannot continue indefinitely, because that would imply, for any value of l < 1, that the point would eventually be reached at which understanding would require less than one quantum of mental effort. Now as the effective value of V decreases and the effective value of L increases, L must have an upper limit of one. Because the product LV is constant and equal to V*, the lower limit of V, and hence also of E, must be V*. After some number of iterations n, no further decrease in effort can be expected. All of the details of the program have been mentally reduced to its basic concepts; hence it has been completely mastered. The number of iterations required to attain mastery can be obtained from the relation

l^(n-1) E = V*    (42)

and the time required for mastery is given by

TM = T Σ_{i=0}^{n} l^i ≈ T / (1 - l)    (43)

Over most of the range of interest, the difference between the sum and its limit is insignificant, and there is some evidence that l = 2/3; hence, rather closely,

TM = 3T    (44)
The broad generality, and in some measure the depth, of software science is demonstrated by the extent to which it can be seen to explain phenomena in areas considerably removed from the field in which it originated. It is difficult to imagine a question farther from the original field than "What is the rate at which meaningless syllables can be learned?" Thus, it is interesting to see how easily and precisely this question can be answered with the material of previous sections. In a classical psychological experiment on the measurement of rote learning, Bjorkman (1958) used 150 ten-year-old subjects. Their task was to learn that a specific meaningless syllable was the "correct" response to the preceding syllable in a list, as each syllable in turn was presented for a fixed interval of time. This exposure time was 2.1 sec, with 0.1 sec between exposures, or τ = 2.2 sec. There were seven syllables in the list, and since there was no response for the last syllable, six correct responses were possible each time the list was repeated. During the first trial (t = 1), of course, there could be no correct responses, since the subjects had not previously been exposed to them. The list was repeated 14 times, and the average fraction of correct responses to each syllable for each trial was reported in Table 9 of Bjorkman (1958). Summing the six fractions for each trial gives the average number of correct responses (Rt) for each trial (t) shown in Table VIII. Given the task of learning that a specific syllable is the "correct" response to another syllable, we wish to predict the number of correct responses to be expected after any measured interval of time, as a function of the number of syllables to be learned. Basically, this will require that we know the following four things: (1) how many elementary mental discriminations (E) will be required to understand the material; (2) the rate (S) at which elementary mental discriminations are made; (3) the relation between the time required to understand material (T) and the time required to master it (TM); and (4) how much time has been available for mastery before any given trial. With this information, the fraction of correct responses to total responses is simply the available time divided by the required time. From Eq. (43) combined with Eq. (18d), we have

TM = (V*)³ / [(1 - l) λ² S]    (45)

where l = 2/3, S = 18 e.m.d./sec, and λ = 2.16. In order to apply Eq. (45) to this problem, we must evaluate it serially for each response. For the first, there is one input and one output, or η2* = 2. For the next position, η2* must increase by one, and so on.
TABLE VIII

Correct responses
Trial t    Observed Rt    Theory Rt
1             0.00           0.00
2             0.17           0.32
3             0.44           0.64
4             0.87           0.96
5             1.29           1.28
6             1.59           1.60
7             2.01           1.92
8             2.31           2.23
9             2.47           2.55
10            2.68           2.79
11            3.09           2.99
12            3.11           3.19
13            3.37           3.39
14            3.39           3.56
Sum          26.79          27.42
r = 0.997, r² = 0.994
Denoting the position by p, where p goes from one to the number of responses P in the list, this can be stated mathematically as

TM,p = [(2 + η2,p*) log2(2 + η2,p*)]³ / [(1 - l) λ² S],    η2,p* = p + 1    (46)

Now if we divide the time available to master each syllable by the time required to master it, we should obtain the fraction of correct responses (noting that the fraction cannot be allowed to exceed one). For the first syllable, this time should be τ = (2.1 + 0.1) sec times the number of times it has been exposed, or (t - 1). For the second syllable, the available time should be 2τ times the number of preceding trials, and so on. The available time for any position p at any trial t can therefore be expressed mathematically as

τp,t = p(t - 1)τ    (47)

Defining Rt as the correct number of responses during the t-th trial, we have

Rt = Σ_{p=1}^{P} min[1, τp,t / TM,p]    (48)
where the minimum must be taken because no more than one correct response can be made for a single syllable, no matter how much time has been available for its mastery. Substituting Eqs. (46) and (47) in (48), and setting P = 6, gives the values shown in Table VIII. For this case at least, the basic software relations provided the "right answer" to a rather "far out" and difficult question.
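A minimal sketch, in Python, of Eqs. (45)-(48) applied to Bjorkman's experiment; with l = 2/3, S = 18 e.m.d./sec, λ = 2.16, τ = 2.2 sec, and P = 6 it reproduces the theory column of Table VIII.

    import math

    L_FACTOR, S, LAM, TAU, P = 2.0 / 3.0, 18.0, 2.16, 2.2, 6

    def mastery_time(p):
        """Eq. (46): eta2* = p + 1, so V* = (p + 3) log2(p + 3)."""
        v_star = (p + 3) * math.log2(p + 3)
        return v_star ** 3 / ((1 - L_FACTOR) * LAM ** 2 * S)

    def correct_responses(t):
        """Eq. (48): sum over positions of min(1, available time / mastery time)."""
        return sum(min(1.0, p * (t - 1) * TAU / mastery_time(p))   # Eq. (47)
                   for p in range(1, P + 1))

    for t in range(1, 15):
        print(t, round(correct_responses(t), 2))   # compare the theory column of Table VIII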
21. Text File Compression

Using the vocabulary-length relation, Shen and Halstead (1978) have calculated theoretically the compression ratios to be expected for files of PL/I and for files of English prose, and compared them with the values obtained experimentally by Rubin (1976). Rubin defined a compression ratio (CR) in terms of the lengths of the input and output strings and the required output representation,

CR = {len(input string) - [len(output string) + len(output rep)]} / len(input string)    (49)
He reported observations of the compression ratio achieved by some 22 variations of five basic algorithms on a file of 35,040 characters of PL/I source input as 80-character records, and on a file of 29,305 characters of English prose. The highest value of CR for each of Rubin's five basic algorithms is given below. Shen and Halstead noted that, from software theory, the length of the output string should be the volume V in bits, and that V can be obtained from the length N with the vocabulary-length relation, Eq. (15). The length of the input string, however, must be the product of N times the number of bits per item or word in uncompressed form, denoted by Bu. Similarly, the length of the output representation must be the vocabulary η times Bu. Consequently, the maximum compression ratio can be expressed as

CRmax = [N·Bu - (V + η·Bu)] / (N·Bu)    (50)

which simplifies to

CRmax = 1 - (log2 η)/Bu - η/N    (51)

For the PL/I file, the number of statements is given by the number of records, or P = 35,040/80 = 438. Using the approximation N = 7.5P, the value of N is 3300, implying η = 430. The value of Bu is obtained by dividing the number of bits per record, 8 × 80, by the same 7.5 items per statement. Equation (51) then yields 0.77 for PL/I.
For the English text there should be five characters per word; hence Bu = 5 × 8 = 40, and the file should contain N = 29,305/5 = 5861 words, implying η = 694 and yielding CR = 0.65. The value of η obtained from N with Eq. (15) neglects the effect of redundancy in natural language discussed in Section 19. In order to correct for this, the vocabulary must be multiplied by 2.5, which lowers the value calculated by Eq. (51) only slightly, to 0.63. Comparing the text file compression results for each of Rubin's five basic algorithms with the theoretical maximum gives, in percentages:

            I       II      III     IV      V       CRmax
PL/I       73.1    72.5    73.5    74.5    73.6     77
Prose      53.7    57.3    55.2    56.6    58.1     63
Shen and Halstead were able to conclude that, since both of Rubin’s two best algorithms were achieving more than 90% of the compression possible, any attempt to find more powerful compression techniques would be wasted effort. The results of the preceding analysis, while not of overwhelming importance in themselves, serve to emphasize the limited but highly useful role of science in engineering. Just as the equations of thermodynamics provide no blueprint for the design or construction of an efficient steam engine, the equations of software science provide no specifications for a compression algorithm. But instead, just as thermodynamics permits the engineer to calculate the maximum efficiency that could be reached by the best possible engine working between two specified temperatures, software science permits the software engineer to calculate the maximum efficiency that could be attained by the best possible compression algorithm working between two specified languages.
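A minimal sketch, in Python, of Eq. (51) applied to Rubin's two files; the vocabulary is recovered from the length by inverting the expected-length form N̂ = η log2(η/2) listed in the Conclusions, and the lengths and bit counts are those used in the text.

    import math

    def vocabulary_from_length(n):
        """Invert the expected-length equation N = eta * log2(eta / 2) by bisection."""
        lo, hi = 2.0, float(n)
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if mid * math.log2(mid / 2.0) < n:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def cr_max(n, b_u):
        """Eq. (51): maximum compression ratio for a file of length N items."""
        eta = vocabulary_from_length(n)
        return 1.0 - math.log2(eta) / b_u - eta / n

    print(round(cr_max(3300, 640 / 7.5), 2))   # PL/I file:  about 0.77
    print(round(cr_max(5861, 40.0), 2))        # prose file: about 0.65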
22. Top-Down Design in Prose

In their recent paper, Comer and Halstead (1978) employed the software relations to explain the results of two timed experiments in outlining and writing tutorial papers. They reasoned that quantitative experiments in top-down design are urgently needed, but that the subsequent reproduction of an experiment in an independent laboratory would be prohibitively expensive if the implementation of a large programmed system was chosen as the test vehicle. They suggested instead that many of the same design processes could be studied in the same way if the implementing and writing of the ubiquitous technical paper were substituted for the programming project. That way, other investigators could rather easily and inexpensively determine how closely the experiments could be reproduced.
In order to conform to the software relation for mental effort, which is intended to apply only to programs being implemented by a programmer who is working from a problem statement he understands, the class of technical papers was further limited to tutorial papers. When the test vehicle is restricted to tutorial papers that require no additional research or library work on the part of the author, then he too can be said to "understand the problem statement" as soon as he has a title. Then the principles of top-down design can be applied, a system of outlining can be adopted, and the entire process can be quantitatively studied. Both authors performed one complete experiment using the same controlled conditions. The first involved a research tutorial (Halstead, 1977), while the second used a paper explaining a programming system (Comer, 1978). The conditions for the experiments stipulated that a title be formulated, that an initial five-point outline be derived from the title, and that this outline be further expanded by subdividing each point five ways. The result, a hierarchy of five-tuples, reflected the top-down methodology used to construct it. After smoothing the five-by-five outline, the author formulated the draft version of his paper using it. During each stage of the design, and for each hand-written page of the "implementation" phase, the author recorded the time spent to the nearest minute or closer. After final completion of the paper, each timed unit was analyzed for values of the basic software parameters. Their raw data, consisting of both components of the gross (not net) vocabularies and lengths, and the observed times, were as follows.

[Raw data for experiment 1: for each part, namely the title, the five-point outline, the five-by-five-point outline, the final outline, each hand-written page of the draft paper, and the draft abstract, the recorded time T (min) together with the gross vocabulary components η1′ and η2′ and the length components N1 and N2 of that unit.]
The comparable raw data for the second experiment were as follows.

[Raw data for experiment 2: for the title page, the five-point outline, the five-by-five-point outline, and each of the 18 pages of the draft paper, the recorded time together with the gross vocabulary components η1′ and η2′ and the length components N1 and N2 of that unit.]
Testing the vocabulary-length relation on the raw data (with the net vocabulary taken as 0.4 times the gross), the relative errors (N̂ - N)/N were 7.3% and 3.0% for the prose pages in experiments 1 and 2, respectively, indicating that both sets of test material might be expected to conform reasonably well to other software relations. Comer then hypothesized that the prose resulting from a strict top-down design should exhibit some degree of modularity, and went on to suggest that this modularity should, in one sense at least, conform to an optimum. In defining this optimum he followed the same reasoning used by Ottenstein, as described previously in Section 12, to the effect that psychological chunking in the human brain corresponds to a specific potential volume of 24. Setting the potential volume of the ideal module as VM* = 24, and λ = 1, the volume of the ideal module can be calculated as

VM = (VM*)²/λ = 576    (52)

Since for this module the value of η2* is known to be six, thus providing the relation between η1 and η2, it is possible to obtain the length precisely from the volume, or NM = 117.6. Similarly, the implementation time for the ideal module should be

TM = (VM*)³/(λ²S) = 768 sec    (53)

It should be noted that neither NM = 117.6 nor TM = 768 sec depends upon the results of either experiment, but only upon the hypothesis that any top-down design should produce optimum modules. From the specific design followed, however, it can be postulated that for the total paper the 5 × 5 point outline should result in η2* = 25, or V* = 128.4. Using the same method employed in finding NM from VM* gives the length corresponding to η2* = 25 as N = 2034. Because the total length must be the sum of its parts, the number of optimum modules M to be expected should be simply the ceiling of N/NM, or

M = ⌈N/NM⌉ = ⌈2034/117.6⌉ = ⌈17.3⌉ = 18    (54)

As can be confirmed from the raw data, for both papers just 18 clocked intervals were devoted to the writing phase. Of greater interest, perhaps, is the fact that the draft version of the first paper required 299 min, and the second 231 min, whereas the product M·TM as obtained from the hypothesis is 13,824 sec, or 230 min. As further support for the modularity hypothesis, it was noted that the average length of the 18 modules of the first paper was 124.8 ± 29.6, and of the second 142.2 ± 43.3, both within their probable errors of the hypothetical NM = 117.6. It should also be pointed out that neither of the draft versions analyzed represented "typist-ready" copy. To reach that stage required more effort, the equivalent of the programmer's debug runs, and data on that phase were not included in the experimental report. There are far more questions unanswered than answered by these two experiments, and the unanswered ones are both more interesting and more important. The implication is clear, however, that the quantitative study of any methodology suggested for the design process is not only possible, but potentially useful.
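A minimal sketch, in Python, of Eqs. (52)-(54), taking λ = 1, S = 18 e.m.d./sec, and the lengths NM = 117.6 and N = 2034 obtained in the text as given.

    import math

    LAM, S = 1.0, 18.0

    def ideal_module(v_m_star=24.0):
        v_m = v_m_star ** 2 / LAM                  # Eq. (52): V_M = (V_M*)^2 / lambda
        t_m = v_m_star ** 3 / (LAM ** 2 * S)       # Eq. (53): implementation time, sec
        return v_m, t_m

    v_m, t_m = ideal_module()
    n_m, n_paper = 117.6, 2034.0                   # lengths derived in the text
    modules = math.ceil(n_paper / n_m)             # Eq. (54)
    print(v_m, t_m, modules, round(modules * t_m / 60))   # 576.0 768.0 18 230 (min)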
23. Conclusions

An attempt has been made in the preceding sections to present a summary or overview of the present status of software science. The serious student is urged to consult the more detailed reference by Halstead (1977a), or better yet, the more recent papers listed in the reference section. But even more knowledge is to be gained by actual, personal experiments. It is known that intellectual skepticism is healthy, and that the student of chemistry or physics must have laboratory experience before he can possibly be said to understand his field. Unlike mathematics, in which a student can readily accept a theorem once he has seen a proof, the natural or experimental sciences are not based on theorems. Instead, they are based on hypotheses, about which one can only say that they have not yet been shown to be false. But because there is no such thing as "ultimate truth" in the natural sciences, any known relation is subject to modification and improvement as knowledge is gained. This should not be allowed to minimize the power of the experimental scientific method, however, for if all things could be understood by the application of pure reason, without recourse to experimental verification, the ancient Greeks would have reached the moon. Despite the fact that there are no theorems, and perhaps never can be any, in the field of software science, one basic attribute shared by the equations in this field has become quite noticeable. This is the total and complete lack of arbitrary constants or unknown coefficients among the basic equations. To this might also be added their utter simplicity. For example, and this may serve as a summary as well as a demonstration of that simplicity and freedom from arbitrary constants, the following 15 equations can be expressed with only two properties on the right, starting with the two elements, unique operators and operands.

Operator length
N1 = Σ_{i=1}^{η1} f1,i

Operand length
N2 = Σ_{i=1}^{η2} f2,i

Vocabulary
η = η1 + η2

Length
N = N1 + N2

Expected length
N̂ = η log2(η/2)

Volume
V = N log2 η

Potential volume
V* = η* log2 η*

Program level
L = V*/V

Language level
λ = LV*

Effort (e.m.d.)
E = V/L

Time
T = E/S
where S = 18 e.m.d./sec, as measured in psychology.

Expected errors
B = V/E0
where E0 = 3000, as obtained from η2* = 6, taken from psychology's chunking hypothesis.

Frequency f1,i of the ith most frequent operator
i = X0 log2(X0/Xi)
where
Xi = log2 f1,i

Boundary volume
V** = η* log2(η*/2) log2 η*

Rate of change of operators
dη/dη1 = V**/V*

To these may be added the slightly more complex program level relation, still without introducing unknown constants.

Program level
L̂ = (η1*/η1)(η2/N2)
where η1* = 2.

While there are many other, more complex forms of the relations shown above, and other more complex relations discussed in the preceding sections, these demonstrate the simplicity of the structure presently underlying the theory of software, and with it the apparent elegance of that structure. It cannot yet be predicted what impact this discipline may be expected to have on the field of software engineering, or on computer programming in general, in the future, but perhaps its greatest impact may result from one conclusion that seems inescapable. This conclusion is that natural laws govern language and the mental activity of using it far more strictly than most of us previously recognized.

REFERENCES

Akiyama, F. (1971). An example of software system debugging. Proc. Int. Fed. Inf. Process. Soc. Congr., pp. 353-359.
Baker, A. L., and Zweben, S. H. (1978). The use of software science in evaluating modularity concepts. Tech. Rep. (July), Department of Computer and Information Sciences, Ohio State University, Columbus; also appears in IEEE Trans. Software Eng. SE-5 (2), March 1979, 110-120.
Bayer, R. (1973a). A theoretical study of Halstead's software phenomena. Proc. Assoc. Comput. Mach. Natl. Conf., pp. 126-135.
Bayer, R. (1973b). On program volume and program modularization. Tech. Rep. 105, Department of Computer Science, Purdue University, Lafayette, Indiana.
Bell, D. E., and Sullivan, J. E. (1974). Further investigations into the complexity of software. Tech. Rep. 2874-11, Mitre, Bedford, Massachusetts.
Bjorkman, M. (1958). "Measurement of Learning. A Study of Verbal Rote Learning." Almquist & Wiksell, Stockholm.
Bohrer, R. (1975). Halstead's criterion and statistical algorithms. Proc. Comput. Sci. Stat. Interface Symp., 8th, pp. 262-266.
Bulut, N. (1973). Invariant properties of algorithms. Ph.D. Thesis, Purdue University.
Bulut, N., and Halstead, M. H. (1974a). Impurities found in algorithm implementations. Assoc. Comput. Mach. SIGPLAN Not. 9 (3), 9-12.
Bulut, N., Halstead, M. H., and Bayer, R. (1974b). Experimental validation of a structural property of FORTRAN algorithms. Proc. Assoc. Comput. Mach. Natl. Conf., pp. 207-211.
Comer, D. (1978). MAP: a PASCAL preprocessor for large program development. Tech. Rep. 262, Department of Computer Science, Purdue University, Lafayette, Indiana.
Comer, D., and Halstead, M. H. (1978). A simple experiment in top-down design. Tech. Rep. 292, Department of Computer Science, Purdue University, Lafayette, Indiana; also appears in IEEE Trans. Software Eng. SE-5 (2), March 1979, 105-109.
Cornell, L., and Ottenstein, K. J. (1976). Further investigations into a software science relationship. Conf. Proc. CMG-VII, Atlanta, pp. 195-198.
Cornell, L., Schneider, V. B., and Halstead, M. H. (1977). Predicting the number of bugs expected in a program module. Tech. Rep. 205, Department of Computer Science, Purdue University, Lafayette, Indiana.
Elci, A. (1975a). Factors affecting the program size of control functions of operating systems. Ph.D. Thesis, Purdue University.
Elci, A. (1975b). The dependence of operating system size upon allocatable resources. Tech. Rep. 175, Department of Computer Science, Purdue University, Lafayette, Indiana.
Elshoff, J. L. (1976). Measuring commercial PL/I programs using Halstead's criteria. Assoc. Comput. Mach. SIGPLAN Not. 11 (5), 38-46.
Elshoff, J. L. (1977). Studies of software physics using PL/I computer programs. Res. Rep. 2444, General Motors Research Lab.
Elshoff, J. L. (1978a). An investigation into the effect of the counting method used on software science measurements. Assoc. Comput. Mach. SIGPLAN Not. 13 (2), 30-45.
Elshoff, J. L. (1978b). A study of the structural composition of PL/I programs. Assoc. Comput. Mach. SIGPLAN Not. 13 (6), 29-37.
Elshoff, J. L., Halstead, M. H., and Gordon, R. D. (1976). On software physics and GM PL/I programs. Res. Rep. 2175, General Motors Research Lab.
Fitzsimmons, A., and Love, T. (1977). A review and critique of Halstead's theory of software science. Proc. IEEE Int. Conf. Cybern. Soc., 7th.
Fitzsimmons, A., and Love, T. (1978). A review and evaluation of software science. Assoc. Comput. Mach. Comput. Surv. 10 (1), 3-18.
Funami, Y., and Halstead, M. H. (1976). A software physics analysis of Akiyama's debugging data. In "Proceedings of MRI XXIV International Symposium on Computer Software Engineering," pp. 133-138. Polytechnic Press, New York.
Gordon, R. D. (1977). A measure of mental effort related to program clarity. Ph.D. Thesis, Purdue University.
Gordon, R. D. (1978a). Measuring improvements in program clarity. Tech. Rep. 268, Department of Computer Science, Purdue University, Lafayette, Indiana; also appears in IEEE Trans. Software Eng. SE-5 (2), March 1979, 79-90.
Gordon, R. D. (1978b). A qualitative justification for a measure of program clarity. Tech. Rep. 269, Department of Computer Science, Purdue University, Lafayette, Indiana; also appears in IEEE Trans. Software Eng. SE-5 (2), March 1979, 121-128.
Gordon, R. D., and Halstead, M. H. (1976). An experiment comparing FORTRAN programming times with the software hypothesis. Proc. Am. Fed. Inf. Process. Soc. Natl. Comput. Conf., pp. 935-938.
Halstead, M. H. (1972a). Natural laws controlling algorithm structure? Assoc. Comput. Mach. SIGPLAN Not. 7 (2), 19-26.
Halstead, M. H. (1972b). A theoretical relationship between mental work and machine language programming. Tech. Rep. 67, Department of Computer Science, Purdue University, Lafayette, Indiana. (See Halstead and Bayer, 1973.)
Halstead, M. H. (1973a). An experimental determination of the 'purity' of a trivial algorithm. Assoc. Comput. Mach. SIGME Perf. Eval. Rev. 2 (1), 10-15.
Halstead, M. H. (1973b). Language level, a missing concept in information theory. Assoc. Comput. Mach. SIGME Perf. Eval. Rev. 2 (1), 7-9.
Halstead, M. H. (1974). A software physics comparison of a sample program in DSL ALPHA and COBOL. Res. Rep. RJ-1410, IBM Research Lab., San Jose, California.
Halstead, M. H. (1975a). Software physics: basic principles. Res. Rep. RJ-1582, IBM Research Lab., San Jose, California.
Halstead, M. H. (1975b). Toward a theoretical basis for estimating programming effort. Assoc. Comput. Mach. Natl. Conf. Proc., pp. 222-224.
Halstead, M. H. (1976a). Using the methodology of natural science to understand software. Am. Fed. Inf. Process. Soc. Natl. Comput. Conf.
Halstead, M. H. (1976b). The essential design criterion of computer languages: software science. Am. Fed. Inf. Process. Soc. Natl. Comput. Conf.
Halstead, M. H. (1977a). "Elements of Software Science." Elsevier North-Holland, New York.
Halstead, M. H. (1977b). Potential contribution of software science to software reliability. Tech. Rep. 229, Department of Computer Science, Purdue University, Lafayette, Indiana.
Halstead, M. H. (1977c). A quantitative connection between computer programs and technical prose. IEEE COMPCON 77 Dig., pp. 332-335.
Halstead, M. H. (1977d). Potential impacts of software science on software life cycle management. In "Software Phenomenology," pp. 385-400. U.S. Army Institute for Research in Management Information and Computer Science.
Halstead, M. H. (1977e). On lines of code and programming productivity. IBM Syst. J. 6 (4), 421-423.
Halstead, M. H. (1977f). A software science analysis of the writing of a technical paper. Tech. Rep. 242, Department of Computer Science, Purdue University, Lafayette, Indiana.
Halstead, M. H. (1978a). Software science. In "Proceedings of the Second Annual Workshop on Software Life Cycle Management," pp. 174-179. U.S. Army Institute for Research on Management Information and Computer Science, Atlanta.
Halstead, M. H. (1978b). Management prediction: can software science help? Proc. COMPSAC 78, IEEE Catalogue Number 78-CH1338-3C, 126-128.
Halstead, M. H. (1979). Software science. In "Encyclopedia of Computer Science and Technology" (J. Belzer, ed.), Vol. 13, 242-262. Dekker, New York.
Halstead, M. H., and Bayer, R. (1973). Algorithm dynamics. Assoc. Comput. Mach. Natl. Conf. Proc., pp. 126-135.
Halstead, M. H., Uber, G. T., and Gielow, K. R. (1967). An algorithmic search procedure for program generation. Proc. Am. Fed. Inf. Process. Soc. Natl. Comput. Conf. 30, 657-662.
Harvill, J. B. (1978a). A method of programming language comparison using Halstead's software science. Tech. Rep. (July), Department of Computer Science, North Texas State University, Denton.
Harvill, J. B. (1978b). Functional parallelism in an operand state saving computer. Proc. Annu. Workshop Comput. Archit. Non-Num. Process., 5th, pp. 10/1-10/8.
Harvill, J. B., and Nylin, W. C., Jr. (1975). Multiple tense programming, a new concept for program complexity reduction. Tech. Rep. 75021, Department of Computer Science, Southern Methodist University, Dallas, Texas.
Hunter, L., and Ingojo, J. C. (1977). Conservation of software science parameters across modularization. Assoc. Comput. Mach. Natl. Conf. Proc., pp. 189-194.
Ingojo, J. C. (1975). Modularization in the pilot compiler and its effect on the length. Tech. Rep. 169, Department of Computer Science, Purdue University, Lafayette, Indiana.
Kennedy, D., and Bruning, R. (1974). Children's descriptions of complex objects. Tech. Rep., Division of Distributive Education, University of Nebraska, Lincoln.
Klobert, R. K. (1977). Calculation of error proneness of computer programs. Am. Inst. Aeron. Astron. Comput. Aerospace Symp. Proc., pp. 422-426.
Kulm, G. (1974). An alternative measure of reading complexity. Presented at Am. Psychol. Assoc. Annu. Meet., New Orleans.
Kulm, G. (1975). Language level applied to the information content of technical prose.
ADVANCES IN SOFTWARE SCIENCE
171
Chigier and E. A. Stern, eds.), pp. 401-408. Brain Research Publ., Fayetteville, New York. Laemmel, A., and Shooman, M. (1977). Statistical (natural) language theory and computer program complexity. Tech. Rep. EE/EP-76-020, Department of Electrical Engineering, Polytechnic Institute of New York. (See also IEEE COMPCON 77 D i g . ) Lipow, M. and Thayer, T. A. (1977). Prediction of software failures. Proc. Annu. Reliah. Muintainuh. Symp. Love, L. T. (1977). Software psychology: Shrinking life cycle costs. I n “Software Phenomenology,” pp. 606-625. U.S. Army Institute for Research in Management Information and Computer Science. Love, L. T., and Bowman, A. B. (1976). An independent test of the theory of software physics. Assoc. Comput. Much. SIGPLAN Not. 1 1 ( I 1). McCabe, T. J. (1977). A complexity measure. IEEE Trans. Soffwure Eng. SE-2 (4). 308320. Magidin, M., and Viso, E. (1976). On the experiments in algorithm dynamics. lnf. l n l v s t . 1 (14). (Univ. Autonoma Metropolitana-Iztapalapa.) Miller. G. A., Newman, E. B.. and Friedman, E. A. (1958). Length frequency statistics of written English. l n f . Contr. 1, 370-389. Miller, J., Brosgol, B., Davis, M., Morel], T., Nestor, J., and Struble. D. (1977). SPARSource program analyzer and reporter. Assoc. Compuf. Much. SICSOFT Soffwtrrr E n g . N o t . 2 (4), 26. Oldehoeft, R. R. (1977). A contrast between language level measures. IEEE Trans. Software Eng. SE-3(6). 476-478. Oldehoeft, R. R., and Bass, L. J. (1977). Dynamic software science with applications. Tech. Rep. 77-132, Department of Computer Science and Experimental Statistics, University of Rhode Island, Kingston. Ostapko, D. L. (1974a). On deriving a relation between circuits and input/output by analyzing an equivalent program. Assoc. Comput. Mach. SIGPLAN Not. 9 (6). 18-24. Ostapko, D. L. (1974b). Analysis of algorithms presented in software and hardware. Assoc. Compuf. Much. Annu. Conf. Proc., p. 749. Ottenstein, K. J. (1976a). A program to count operators and operands for ANSI-FORTRAN modules. Tech. Rep. 196, Department of Computer Science, Purdue University, Lafayette, Indiana. Ottenstein, K. J. (1976b). An algorithmic approach to the detection and prevention of plagiarism. Assoc. Comput. Much. SIGCUE Bull. 8 (4). Ottenstein, L. M. (1978a). Further validation of an error hypothesis. Assoc. Compuf.Mach. SIGSOFT Software Eng. Not. 3 ( I ) , 27-28. Ottenstein, L. M. (1978b). Predicting parameters of the software validation effort. Ph.D. Thesis, Purdue University. Ottenstein, L. M. (1978~).Quantitative estimates of debugging requirements. Tech. Rep. (August), Department of Computer Science, Purdue University, Lafayette, Indiana. Rice, J . R. (1978). Ellpack 77 user’s guide. Tech. Rep. 266, Department of Computer Science, h r d u e University, Lafayette, Indiana. Robinson, S. K., and Torsun, I. S. (1977). The automatic measurement of the relative merits of student programs. Assoc. Comput. Much. SIGPLAN Not. 12 (4). Rubin, Z. Z. (1976). Experiments in text file compression. Commun. Assoc. Comput. Much. 19 (1 I), 617-623. Ruston, H. (1978). “Programming with PL/I.” McGraw-Hill, New York. Schneider, V. (1978). Prediction of software effort and project duration-four new formulas. Assoc. Cornput. Much. SIGPLAN N o t . 13 (6), 49-59. Schneider, V., and Halstead, M. H. (1978). Further validation of the software science programming effort hypothesis. Proc. Nutl. Bur. Stand. Annu. Tech. S y m p . , 17th, pp. 434.
172
M. H. HALSTEAD
Shannon, C. E. (1935). A mathematical theory of communication. Bell Sysr. Tech. J . 27, 379-423. Shen, V. Y. (1978). The relationship between student grades and software science parameters. Tech. Rep. (May), Department of Computer Science, Purdue University, Lafayette, Indiana. Shen, V. Y.,and Halstead, M. H. (1978). On the limits of text file compression. Tech. Rep. (July), Department of Computer Science, Purdue University, Lafayette, Indiana. Sheppard, S . B., Borst, M. A,, Curtis, B., and Love, L. T. (1978). Predicting programmers ability to modify software. Tech. Rep. 78-388100-3, General Electric. Shooman, M. L. (1977). Software engineering reliability, design, management. Course Syllabus, CS 909, Polytechnic Institute of New York. Shooman, M. L., and Bolsky, M. I. (1975). Types, distributions and test and correction times for Programming Errors. Assoc. Compur. Mach. SIGPLAN No!.. 10, (6), 347-357. Shooman, M. L.. and Laemmel, A. (1977). Statistical theory of computer programsinformation content and complexity. IEEE COMPCON 77 Dig., pp. 341-347. Stroud, J. M. (1966). The fine structure of psychological time. Ann. N.Y. A c u d . Sci., pp. 623-63 1 . Symes, L. R., and OldehoeA, R. R. (1977). Context of problem solving systems. IEEE Trans. Software Eng. SE-3 (4), 306-309. Tarig, M. A. (1977). A systematic approach for comparing FORTRAN and P U I programs. M.Sc. Thesis, Middle East Technical University, Ankara, Turkey. Thayer. T. A., Lipow, M., and Nelson, E. C. (1976). Software reliability study, final report, Tech. Rep. 238, Rome Air Development Command. Walker, M. A. (1977). A software science parameter counting algorithm. Tech. Rep. 77-2. Department of Mathematics and Computer Science, Indiana University-Purdue University at Indianapolis. Walston, C., and Felix, C. P. (1977). On lines of code and programming productivity. IBM Sysr. J . 16 (41, 422-423. Woodfield, S . N. (1978). An experiment on unit increase in algorithm complexity. Tech. Rep. (June). Department of Computer Science, Purdue University, Lafayette, Indiana; also appears in IEEE Truns. Sc$ware Eng., SE-5(2), March 1979. 76-79. Zipf, G . K. (1949). "Human Behavior and the Principle of Least Effort." Addison-Wesley, Reading, Massachusetts. Zislis, P. M. (1973). An experiment in algorithm implementation. Tech. Rep. 96, Department of Computer Science, Purdue University, Lafayette, Indiana. Zislis. P. M. (1974). Semantic partitioning, an aid to program testing. Ph.D. Thesis, Purdue University. Zislis, P. M. (1975). Semantic decomposition of computer programs: an aid to program testing. Acra Inf. 4, 245-269. Zweben, S . H. (1973). Software physics: resolution of an ambiguity in the counting procedure. Tech. Rep. 93, Department of Computer Science, Purdue University, Lafayette, Indiana. Zweben, S . H. (1974). The internal structure of algorithms. Ph.D. Thesis, Purdue University. Zweben, S . H. (1977). Study of the physical structure of algorithms. IEEE Trans. Sqfrware Eng. SE3 (3). 250-258. Zweben, S. H., and Halstead, M. H. (1979). The frequency distribution of operators in PL/I programs. IEEE Truns. S o f f w a r e E n g . , SE-5(2). 91-95. Zweben, S. H., Halstead, M. H., and Elshoff, J. L. (1978). The frequency of occurrence of operators in P U I programs. Tech. Rep. (June), Department of Computer Science, Purdue University, Lafayette, Indiana; see also Zweben and Halstead (1979).
Current Trends in Computer-Assisted Instruction

PATRICK SUPPES
Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California
1. Introduction
2. CAI in Elementary and Secondary Education
   2.1 PLATO Elementary Mathematics and Reading
   2.2 CAI Courses of Computer Curriculum Corporation
3. CAI in Postsecondary Education
   3.1 Community-College Courses
   3.2 Undergraduate Physics at Irvine
   3.3 Undergraduate Logic and Set Theory
   3.4 Other CAI Courses
4. Current Research
   4.1 Natural-Language Processing
   4.2 Uses of Audio
   4.3 Informal Mathematical Proofs
   4.4 Modeling the Student
5. The Future
References
1. Introduction
The objective of this chapter is to survey current activities in computer-assisted instruction (CAI), with the emphasis on the period 1973-1978. References to work occurring earlier are limited; moreover, there is no attempt to give even a summary history of the earlier period or to explain why there has been a developing use of computers for instruction up to 1973. A reasonably detailed survey of programs actually in operation as of about 1971 is contained in Lekan (1971). This publication is an index to computer-assisted instruction, and the 1971 third edition lists 1264 specific programs. The 1978 edition (Wang, 1978) contains information about 2997 programs available from 341 different institutions. Other recent surveys are Levien (1972) and, more pertinent for the period covered by this chapter, Lecarme and Lewis (1975) and Hunter et al.
(1975); still other more specialized ones are mentioned in later sections. The large work on computers in education edited by Lecarme and Lewis is especially valuable for providing as of 1975 a broad survey of activities throughout the world. The book is based upon papers given in Marseille at a conference organized by a widely based international committee.

In a survey of rapidly developing technology, the literature is not as well defined as in the case of more theoretical matters, nor is it in easily accessible journals. Thus, in a certain sense there is no comparison in the accessibility of the literature, say, on formal languages and the literature on computer-assisted instruction. Many of the items that I have referenced have appeared only as reports, with limited circulation; in some cases it has been difficult to establish the date the report was issued.

I have divided the chapter into five sections. This introduction is followed by a section on CAI in elementary and secondary education. Section 3 surveys CAI in postsecondary education. Section 4 is concerned with current research, with special emphasis on the kind of research that requires increasingly sophisticated programs. Finally, Section 5 is a brief attempt to forecast the main trends in computer-assisted instruction, although, of course, the forecasts must be treated with skepticism in view of the notorious difficulty of making successful predictions about trends in either computer theory or applications.

Before turning to the substantive developments outlined above, there is one general issue that is worth elaboration. It is the question of whether or not computers and related forms of high technology constitute a new restraint on individuality and human freedom. This issue can be an especially sensitive one in education for a variety of reasons that do not need to be explored here. There are several points I would like to make about the possible restraints that widespread use of computer technology might impose on education.

The first is that the history of education is a history of the introduction of new technologies, which at each stage have been the subject of criticism. Already in Plato's dialogue Phaedrus, the use of written records rather than oral methods of instruction was criticized by Socrates and the Sophists. The introduction of books marked a departure from the personalized methods of recitation that were widespread and important for hundreds of years until, really, this century. Mass schooling is perhaps the most important technological change in education in the last hundred years. It is too easy to forget that as late as 1870 only 2% of the high-school-age population in the United States completed high school. A large proportion of the society was illiterate; in other parts of the world the situation was even less developed. Moreover, the absence of mass schooling in many parts of the world as late as 1950 is a well-documented fact. The efforts to provide mass schooling and the uniformity of that schooling in its basic structure throughout the world are among the most striking
social facts of the twentieth century. It is easy to claim that with this uniform socialization, of the primary school especially, a universal form of indoctrination has been put in place. There is something to this criticism, for the similarity of curriculum and methods of instruction throughout the world is surprising, and no doubt in the process unique features of different cultures have been reduced in importance, if not obliterated.

My second point is that the increasing use of computer technology can provide a new level of uniformity and standardization. Many features of such standardization are of course to be regarded as positive insofar as the level of instruction is raised. There are also opportunities for individualization of instruction that will be discussed more thoroughly in later sections, but my real point is that the new technology does not constitute in any serious sense a new or formidable threat to human individuality and freedom. Over a hundred years ago in his famous essay On Liberty, John Stuart Mill described how the source of difficulty is to be found elsewhere, in the lack of concern for freedom by most persons and in the tendencies of the great variety of political institutions to seriously restrain freedom, if not repress it. Here are Mill's words on the matter.

The greatest difficulty to be encountered does not lie in the appreciation of means toward an acknowledged end, but in the indifference of persons in general to the end itself. If it were felt that the free development of individuality is one of the leading essentials of well-being; that it is not only a co-ordinate element with all that is designated by the terms civilization, instruction, education, culture, but is itself a necessary part and condition of all those things; there would be no danger that liberty should be undervalued, and the adjustment of the boundaries between it and social control would present no extraordinary difficulty.
We do not yet realize the full potential of each individual in our society, but it is my own firm conviction that one of the best uses we can make of high technology in the coming decades is to reduce the personal tyranny of one individual over another, especially wherever that tyranny depends upon ignorance. The past record of such tyranny in almost all societies is too easily ignored by many who seem overly anxious about the future.
2. CAI in Elementary and Secondary Education
In this section, some examples of CAI at the stage of research and development for elementary and secondary schools, and also some examples of commercial products that are fairly widely distributed, are considered. As in the case of the sections that follow, there is no attempt to survey in a detailed way the wide range of activities taking place at many different institutions. It is common knowledge that there is a variety of computer activity in secondary schools throughout the United States and
in other parts of the world. A good deal of this activity is not strictly to be classed as computer-assisted instruction, however, but rather as use of the computer in teaching programming, in problem solving, or in elementary courses in data processing oriented toward jobs in industry. Section 2.1 examines the PLATO projects in elementary reading and elementary mathematics. Section 2.2 surveys commercial CAI courses now offered by Computer Curriculum Corporation.

2.1 PLATO Elementary Mathematics and Reading
Recent general descriptions of the educational uses of the PLATO computer system are to be found in Bitzer (1976) and Smith and Sherwood (1976). The best detailed description of the work in elementary-school mathematics and elementary-school reading is contained in the PLATO project final report, which covers the recent period of substantial National Science Foundation support from 1972 to 1976 [Computer-Based Education Research Laboratory (CERL), 1977].

2.1.1 Elementary Mathematics
The goal of the elementary-school mathematics program was to demonstrate the feasibility and value of PLATO in developing a mathematics curriculum for Grades 4-6. From 1973 to 1976, more than a hundred hours of instructional material were developed, which was delivered to about 500 students for approximately 30,000 student-contact hours. This work took place under the direction of Robert B. Davis (1974), who has been prominent in mathematics education since the early 1960s. The elementary-mathematics demonstration included enough coursework to allow students to work on PLATO for about 30 min each day throughout the school year. Further details of the curriculum and of the implementation are to be found in Dugdale and Kibbey (1977). The courseware was developed in three strands, as follows (CERL, 1977):

(1) Whole number arithmetic, including: meanings of operations; computation techniques and practice; algorithms; place value; renaming and symbols; and word problems.
(2) Fractions, mixed numbers, and decimals, including: meanings of fractions and mixed numbers; equivalent fractions; addition, subtraction, and multiplication of fractions and mixed numbers; the meaning of decimal numerals; and heuristic approaches to problem solving.
(3) Graphs, variables, functions, and equations, including: signed numbers (integers and rationals, positive, negative, and zero); variables and open sentences; exponents; graphs; and the representation of functions by graphs, tables, and formulas (pp. 66-67).
The courseware was designed for a wide range of student abilities, and generally worked fairly well without any major flaws, but the schedule did not allow for extensive revision. Roughly speaking, each half-hour session was divided into three parts: review, new material, and a final portion of highly enjoyable curriculum material often organized in the form of a game. The curriculum material made extensive use of the graphic capacities of PLATO terminals, and, compared to earlier work in the field, this was probably the most original and most attractive aspect of the material developed. The report mentioned gives a large number of illustrations of the ways in which the graphic features were used; there was a continual emphasis on the strategy of making displays appear and change at the same time as the corresponding abstract or symbolic notations were presented on the screen.

An especially attractive example is the "paintings library" designed by Sharon Dugdale and David Kibbey. In this lesson the student chooses or is given a fraction between zero and one. His task is then to "color in" that fraction of a rectangle, but the coloring in is done with the touch panel in a manner rather like finger painting. After the work is completed, the student may add his painting to a "library" that other students can look at. It was found that adding the "public" library increased the quality of students' work. Different types of designs were used for the purposes of the coloring - checkerboard and much more elaborate patterns displaying principles of symmetry in interesting ways.

The main test of the courseware was in the school year 1975-1976. During this period, PLATO was in daily operation in 13 classrooms in six different schools, with four terminals in each classroom. That year, approximately 75 students participated in Grade 4, 140 students in Grade 5, and 110 students in Grade 6. Concerning the evaluation of the work, one of the more significant features was the positive change in attitude toward mathematics on the part of the students in the PLATO classes. Concerning achievement in both 1974-1975 and 1975-1976, PLATO fourth- and fifth-grade classes clearly outperformed non-PLATO classes on Educational Testing Service's special achievement test on fractions. Test-performance differences between the PLATO and non-PLATO students on the graphs strand and on the whole-numbers strand were not significant. The preliminary character of these results based on outcomes just for 1974-1975 must be emphasized.
2.1.2 Elementary Reading
The main features of the PLATO Elementary Reading Curriculum Project during the period from 1971 to 1976 were the following: development of a tree of behavioral objectives, which was intended to describe a sequence of skills involved in learning to read; development of about 80 hr of instructional materials in support of these objectives; development of a computer-based curriculum management system; articulation of principles of audiovisual sequencing and student interaction patterns; development of computer-based teacher control and feedback routines; and implementation of the above program in 25 classrooms with 52 terminals, with delivery of about 17,000 hr of instruction to 1225 kindergarten, first-grade, remedial, and educable mentally retarded students.

According to the final report cited (CERL, 1977), the principal successes of the program were held to be the following:

(1) The enthusiastic acceptance by students and teachers of well-designed CAI as a normal part of daily instruction.
(2) The design of successful lesson paradigms. The data indicated that most students interacted successfully with lessons and that their performance improved with successive iterations of the same lessons.
(3) Clarification of perceptions about what degree of curriculum and teaching management is optimally handled by the computer as opposed to the classroom teacher.
In the same report, the major obstacles to successful development were found to be the following:

(1) Unreliability of the audio component of the hardware, which gave continual trouble, both in operation and in preparation of audio materials.
(2) Unexpected rigidities in the computer-based curriculum management system.
(3) Scope of the original conception. In hindsight, the staff felt that rather than producing an entire curriculum on-line, it would have been better to have focused on those things that PLATO does best, especially because the problems with audio made the implementation of a full curriculum difficult.
The lessons covered, in one form or another, material that is more or less standard in the teaching of initial reading: visual skills; letter names, alphabetization, and introduction to letter sounds; auditory discrimination; phonics; basic vocabulary words; concept words; and stories. What is rather surprising in the final report is that there is a discussion of a model of the process of learning-to-read but no discussion or references to the extraordinarily large literature on these matters. It has been
estimated that the number of books and articles on reading written in the United States since 1920 is well in excess of 30,000. Not all of this literature is of the same scientific and intellectual significance, but there is certainly a body of substantial work that needs to be referenced in any new conception of a model of the learning-to-read process. The final report on PLATO discusses with frankness and objectivity the problems encountered in the external evaluation of the project by Educational Testing Service. It is not appropriate to review the problems here but just to remark that inevitably there are difficulties in such evaluations. My own judgment would be that the large-scale ETS effort was premature in relation to the PLATO developments and should have been conducted only after materials had been thoroughly developed and given a preliminary test, followed by a first round of revisions.
2.2 CAI Courses of Computer Curriculum Corporation
At the public-school level, the largest number of students participating in CAI are those taking courses offered by Computer Curriculum Corporation (CCC), with which I am associated. At the time of writing this article in late 1978, more than 150,000 students are using the CCC courses on an essentially daily basis. This usage is spread over the country, with systems in 24 states; most of the students are disadvantaged or handicapped. The main effort at CCC has been in the development of drill-and-practice courses that supplement regular instruction in the basic skills, especially in reading and mathematics. The 15 courses offered in 1978 by CCC are listed in Table I. Because of their early development, the three most widely used curriculums are the Mathematics Strands, Grades 1-6; Reading, Grades 3-6; and Language Arts, Grades 3-6. The strands instructional strategy plays a key role in each of these courses and its explanation is essential to a description of the CCC curriculums.

2.2.1 Strands Strategy
A strand represents one content area within a curriculum. For example, a division, a decimal, and an equation strand are included in the mathematics strands curriculum. Each strand is a string of related items whose difficulty progresses from easy to difficult. A computer program keeps records of the student's position and performance separately for every strand.
TABLE I
CAI COURSES OFFERED BY COMPUTER CURRICULUM CORPORATION

1. Mathematics Strands, Grades 1-6
2. Reading, Grades 3-6
3. Reading for Comprehension, Grades 3-6
4. Language Arts Strands, Grades 3-6
5. Language Arts Topics, Grades 3-6
6. Mathematics, Grades 7-8
7. Critical Reading Skills
8. Adult Arithmetic Skills
9. Adult Reading Skills
10. Adult Language Skills I
11. Adult Language Skills II
12. GED Preparation Course
13. Fundamentals of English
14. Introduction to Algebra
15. Problem Solving, Grades 3-6
By comparing a student's record of performance on the material in one strand with a preset performance criterion, the program determines whether the student needs more practice at the same level of difficulty within the strand, should move back to an easier level for remedial work, or has mastered the current concept and can move ahead to more difficult work. Then the program automatically adjusts the student's position within the strand. The process of evaluation and adjustment applies to all strands and is continuous throughout each student's interaction with a curriculum.

Evenly spaced gradations in the difficulty of the material allow positions within a strand to be matched to school grade placements by tenths of a year. Grade placement in a specific subject area can then be determined by examining a student's position in the strand representing that area. Since performance in each strand is recorded and evaluated separately, the student may have a different grade placement in every strand of a curriculum. Teachers' reports, available as part of each curriculum, record progress by showing the student's grade placement in each strand at the time of the report.

In a curriculum based on the strands instructional strategy, a normal lesson consists of a mixture of exercises from different strands. Each time an item from a particular curriculum is to be presented, a computer program randomly selects the strand from which it will draw the exercise. Random selection of strands ensures that the student will receive a mixture of different types of items instead of a series of similar items.
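The per-strand bookkeeping just described is easy to picture in code. The following is only an illustrative sketch of the general strands idea, not CCC's implementation; the 0.1 grade-placement step, the evaluation window of five items, and the 80%/50% criteria are assumptions introduced here, and the strand names and weights are placeholders. It shows the two mechanisms the text emphasizes: a separate position and performance record for every strand, adjusted against a preset criterion, and random selection of the strand that supplies the next exercise.

import random

PASS_CRITERION = 0.80   # assumed: move ahead if recent accuracy is at least this
FAIL_CRITERION = 0.50   # assumed: move back for remedial work if below this
STEP = 0.1              # positions are grade placements in tenths of a year

class Strand:
    def __init__(self, name, position):
        self.name = name
        self.position = position       # e.g., 2.0 means grade-level 2.0
        self.right = 0
        self.wrong = 0

    def record(self, correct):
        if correct:
            self.right += 1
        else:
            self.wrong += 1

    def adjust(self):
        """Compare recent performance in this strand with the preset
        criterion and move the student's position accordingly."""
        total = self.right + self.wrong
        if total < 5:                  # not enough evidence yet
            return
        accuracy = self.right / total
        if accuracy >= PASS_CRITERION:
            self.position += STEP      # mastered: more difficult work
        elif accuracy < FAIL_CRITERION:
            self.position -= STEP      # struggling: easier, remedial work
        self.right = self.wrong = 0    # start a new evaluation window

def next_strand(strands, weights):
    """Randomly select the strand for the next exercise, so the student
    sees a mixture of item types rather than a series of similar items."""
    return random.choices(strands, weights=weights, k=1)[0]

Grade placement in a subject can then be read directly from the position in the corresponding strand, which is how the teachers' reports mentioned above are produced.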
Each curriculum also provides for rapid gross adjustment of position in all the strands as the student first begins work in the course. Students who perform very well at their entering grade levels are moved up in half-year steps until they reach more challenging levels. Students who perform poorly are moved down in half-year steps. This adjustment of overall grade level ensures that students are appropriately placed in the curriculum and is in effect only during a student's first ten sessions.

2.2.2 Mathematics Strands, Grades 1-6
Mathematics Strands, Grades 1-6 contains 14 strands, or content areas. Table II lists the strands in the mathematics curriculum. The curriculum begins at the first-grade level and extends through grade-level 7.9. The seventh-grade material does not constitute a complete curriculum for that grade year but is intended as enrichment for students who complete the sixth-grade material. (Mathematics, Grades 7-8 is a separate course for these grades.) Each strand is organized into equivalence classes, or sets of exercises of similar number properties and structure. During each CAI session in mathematics, students receive exercises from all the strands that contain equivalence classes appropriate to their grade levels. For example, a student at mean grade-level 2.0 will be given exercises from seven strands: NC, HA, HS, VA, VS, EQ, and MS.

TABLE II
THE STRANDS IN MATHEMATICS STRANDS, GRADES 1-6

Strand  Name                         Abbreviation
1       Number concepts              NC
2       Horizontal addition          HA
3       Horizontal subtraction       HS
4       Vertical addition            VA
5       Vertical subtraction         VS
6       Equations                    EQ
7       Measurement                  MS
8       Horizontal measurement       HM
9       Laws of arithmetic           LW
10      Vertical multiplication      VM
11      Division                     DV
12      Fractions                    FR
13      Decimals                     DC
14      Negative numbers             NG
Students are not given an equal number of exercises from all strands. The program adjusts the proportion of exercises from each strand to match the proportion of exercises covering that concept in an average textbook.

The curriculum material in Mathematics Strands, Grades 1-6 is not prestored but takes the form of algorithms that use random-number techniques to generate exercises. When a particular equivalence class is selected, a program generates the numerical value used in the exercise, produces the required format information for the presentation of the exercise, and calculates the correct response for comparison with student input. As a result, the arrangement of the lesson and the actual exercises presented differ between students at the same level and between lessons for a student who remains at a constant grade placement for several lessons.

Students are ordinarily at terminals about 10 min a day, during which time they usually work in excess of 30 exercises. Thus, a student following such a regime for the entire school year of 180 days works more than 5000 exercises.
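The generated, rather than prestored, character of the material can be illustrated by a small sketch. The equivalence classes below are hypothetical placeholders (the real CCC classes were defined by number properties and structure, and the operand ranges here are invented for illustration), but the three outputs correspond to what the text describes: the operands, a display format, and the correct response against which the student's input is compared.

import random

# Hypothetical equivalence classes: an operation plus an operand range
# keyed loosely to a band of grade placements.
EQUIVALENCE_CLASSES = {
    "HA": {"op": "+", "lo": 0, "hi": 9},    # horizontal addition
    "HS": {"op": "-", "lo": 0, "hi": 9},    # horizontal subtraction
    "VM": {"op": "*", "lo": 2, "hi": 12},   # vertical multiplication
}

def generate_exercise(class_id):
    """Produce operands, a display format, and the correct response."""
    spec = EQUIVALENCE_CLASSES[class_id]
    a = random.randint(spec["lo"], spec["hi"])
    b = random.randint(spec["lo"], spec["hi"])
    if spec["op"] == "-" and b > a:         # keep school subtraction non-negative
        a, b = b, a
    answer = {"+": a + b, "-": a - b, "*": a * b}[spec["op"]]
    prompt = f"{a} {spec['op']} {b} ="
    return prompt, answer

def check(student_input, answer):
    """Compare the student's typed response with the computed answer."""
    return student_input.strip() == str(answer)

# Two students at the same position receive different exercises:
print(generate_exercise("HA"))
print(generate_exercise("HA"))

Because every exercise is manufactured on demand in this way, no two lessons need be identical even for a student whose grade placement does not change.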
2.2.3 Reading, Grades 3-6

The Reading, Grades 3-6 curriculum consists of reading-practice items designed to improve the student's skills in five areas: word analysis, vocabulary extension, comprehension of sentence structure, interpretation of written material, and development of study skills. It contains material for four years of work at grade-levels 3-6 as well as supplementary remedial material that extends downward to grade-level 2.5.

The program is divided into two parts: basic sentences and strands. Basic sentences begins at grade-level 2.5 and ends at grade-level 3.5. The items in this section are short and easy. They represent the simplest type of reading-practice exercise that can be presented in a contemporary computer-assisted instructional system. The strands section starts at grade-level 3.5 and continues through grade-level 6.9. When working in the strands section, the student receives items from all five strands during every session (see Table III).

2.2.4 Language Arts, Grades 3-6
Language Arts, Grades 3-6 covers grades 3 through 6 with enough material for a year’s work at each grade level. It also offers a supplement of lessons for students with special language problems. These include hearing-impaired students and students for whom English is a second language.
TABLE III
THE STRANDS IN READING, GRADES 3-6

Strand  Content
A       Word attack - analyzing words as units
B       Vocabulary - building a reading vocabulary
C       Literal comprehension - understanding the literal meaning of sentences and short paragraphs
D       Interpretive comprehension - reading sentences for interpretation
E       Work-study skills - learning to use resources effectively
The language arts curriculum stresses usage instead of grammar and presents very few grammatical terms. It is divided into two courses, language arts strands and language arts topics. Both courses cover the same general subject areas, but their structures are different. Language arts strands uses a strands structure to provide highly individualized mixed drills (Table IV). In language arts topics the entire class receives lessons on a topic assigned by the teacher.

More detailed descriptions of the content and structure of all three curriculums are found in CCC's teacher's handbooks for mathematics (Suppes et al., 1977), reading (Fletcher et al., 1972), and language arts (Adkins and Hamilton, 1975).

TABLE IV
THE STRANDS IN LANGUAGE ARTS STRANDS, GRADES 3-6

Strand  Content
A       Principal parts of verbs
B       Verb usage
C       Subject-verb agreement
D       Pronoun usage
E       Contractions, possessives, and negatives
F       Modifiers
G       Sentence structure
H       Mechanics

2.2.5 Evaluation

The three curriculums just described have had extensive evaluation by many different evaluation groups, including individual school systems.
More than 40 such studies are reported in Macken and Suppes (1976) and Poulsen and Macken (1978). A detailed mathematical study of individual student trajectories is found in Suppes et al. (1978). The data and analysis from these many studies are far too detailed even to try to summarize here. A qualitative sense of the kind of results obtained can be conveyed by quoting the final paragraphs of the article by Macken and Suppes (1976).

We would like to make four main remarks in summarizing the results reported in this paper.

1. At least four kinds of studies are included in this survey. First, there are studies that measure grade placement gains with standard achievement tests to analyze the results of the use of CAI. Second, there are studies that report gains made in CCC's curriculum as measured by the grade placements that are built into each of the three curriculums. Third, there are linear regression studies of the relation between grade placements in the CAI curriculums and standardized test scores. Finally, there are anecdotal reports of student and teacher attitudes in a variety of settings. Certainly the variety of studies does not exhaust the possibilities, but it does give a broad assessment of computer-assisted instruction as provided by CCC's three basic curriculums in mathematics, reading, and language arts.

2. The variety of studies covers a wide range of student populations. Results for disadvantaged urban students, many of whom are members of minority populations, are reported from Houston and Fort Worth, Texas. Reports for disadvantaged students in suburban areas are represented by studies from Freeport, New York, and San Dimas, California. Reports from small urban environments that include many minority students are represented by the Gulfport and Meridian, Mississippi studies. Results for a rural population of native Americans are reported from Isleta, New Mexico. Finally, the studies from the schools for the deaf in Florida, Illinois, Oklahoma, and Texas provide a variety of results for handicapped students, many of them with multiple handicaps.

3. Several of the studies correlate time spent at computer terminals with grade placement gains in the CAI curriculums. These studies reproduce the positive linear relationship that has been found in previous work of the same sort, for example, that reported in Suppes, Fletcher, Zanotti, Lorton, and Searle (1973). We would not expect to be able to find linear gains with indefinite increases in the amount of time spent at computer terminals, but it is clear, from the studies reported here and from other studies, that for a fairly wide range of time measurements an approximate linear relation holds very well. We can conclude that students who need to increase their gains should be assigned additional CAI sessions.

4. The many studies reported in this survey show quite positive results for the use of computer-assisted instruction in basic skills, and these results seem to hold for a variety of measures of gain and for a wide variety of student populations. Broadly speaking, these results are consistent with others reported in the literature referred to in the introduction [Vinsonhaler and Bass, 1972, and Jamison, Suppes, and Wells, 1974]. It should be noted that they also agree with a large number of studies of organized drill and practice in basic skills.
The research literature since the 1920s has indicated the importance of carefully organized drill and practice regimes for the development and maintenance of basic skills in mathematics, reading, and language arts. (For a review of this literature, see chapter 5 of Suppes, Jerman, and Brian [1968].) Perhaps the central role of computer-assisted instruction in basic skills is to provide an efficient and, from the teacher's standpoint, painless method of delivering a continual stream of individualized exercises to students and automatically evaluating their answers (pp. 34-35).
I have not attempted to describe the other CCC courses, which are just beginning to be used on an extensive basis. The new course "Reading for Comprehension," for example, represents an improved version of the earlier reading course discussed above and is now probably more widely used than Reading, Grades 3-6. The secondary-school courses, especially aimed at the upper grades, namely, "Introduction to Algebra" and "Fundamentals of English," are being used not only by high schools but also by some community colleges. The Adult Skills and GED (General Educational Development) preparation courses are being used in several prisons and various community centers.
3. CAI in Postsecondary Education
In this section some salient examples of CAI at universities, community colleges, or other postsecondary institutions are examined to provide a sense of the conceptual variety of the work that is being undertaken. There has been no attempt to survey the wide range of activities taking place at many different institutions. Fortunately, an excellent survey was published as a report in June 1977 by C. A. Hawkins, and this analysis of computer-based learning in the United States, Canada, the United Kingdom, and the Netherlands is fairly up-to-date as of early 1976. Much detailed information of the same sort is contained in the CONDUIT State of the Art Reports (1977). A recent brief survey of CAI in Canada is to be found in Hunka (1978). From a survey standpoint, there have not been that many decisive changes to warrant an additional attempt for the present chapter.

A good recent survey of educational technology in Japan is to be found in Sakamoto (1977). The use of CAI in Japan is as yet surprisingly limited. Sakamoto cites the case of industrial education at the Central Training School of the Nippon Telegraph and Telephone Corporation and the Japan Society for the Promotion of Machine Industry. These schools have 30 terminals each. The Fujitsu System Laboratory has 20. IBM has an online system of about 30 terminals. Kanda Foreign Language School has 48 terminals. There is also some work at schools and universities; for example, a 13-terminal CAI system is being used at Tokiwa Middle School in Tokyo for instruction in mathematics (Kimura, 1975). In addition, the Koyamadai High School in Tokyo has a 48-terminal computer system being used for second-year physics (Ashiba, 1976). There are also activities at Tsukuba University, Kanazawa University of Technology, Aichi University of Education, Hokkaido University of Education, and Osaka University.
Given the population and wealth of Japan, the activities in CAI are quite underdeveloped. I have made a point of mentioning the activities known to me, because many readers will perhaps be interested in Japan and yet not be familiar with the current situation. The status of CAI is really no different from the status of interactive computing in Japan, which is still restricted in character. Very substantial changes will almost certainly take place in the next decade, and I would expect Japan in 20 yr to reach a level of activity nearly comparable to that of the United States.

It is important to emphasize the great variety of CAI activities in the many different institutions throughout the world, especially in the United States. It is also important to emphasize that much of the activity is of a local sort that goes unreported in the published literature. For example, the use of computers to facilitate instruction in elementary statistics or in first courses in computer science is to be found in a number of institutions that have not reported these activities, and they are known only to persons at the institution in question and to various visitors and others who by chance have heard about the activities. I have not covered the topic of computerized adaptive testing, which lies somewhat outside CAI. A good recent reference on such testing is Weiss (1978).

3.1 Community-College Courses
I describe in this section the PLATO course in biology developed in close collaboration with community-college instructors in Illinois and the TICCIT mathematics course for community-college students. Both of these activities were substantial parts of large activities in CAI funded by the National Science Foundation in the past five years. It should be noted that the PLATO activities also included courses in accounting, chemistry, mathematics, and English at the community-college level, and details of this work may be found in the final report on the demonstration of the PLATO IV computer-based education system (CERL, 1977). My account of the biology course is drawn from this report. I selected the biology lessons rather than the chemistry lessons for discussion here because many of the chemistry lessons had previously been prepared at the University of Illinois.

3.1.1 PLATO Biology
The final report on the demonstration project gives data running from the fall of 1974 to the spring of 1976. The demonstration was conducted in four community colleges and a vocational school.
TABLE V
USAGE BY INDIVIDUAL STUDENTS OF PLATO COMMUNITY-COLLEGE BIOLOGY LESSONS

           Fall 1974         Spring 1975       Fall 1975         Spring 1976       Totals by college
College    Students  Hours   Students  Hours   Students  Hours   Students  Hours   Students  Hours
I             357     1826      359     2117      414     2239      436     2601     1566     8783
II            102      462      262     1282      204      636      314     1230      882     3610
III            81      196      375     2922      479     2679      424     2637     1359     8434
Total         540     2484      996     6321     1097     5554     1174     6468     3807    20827
The data on usage at three colleges are contained in Table V. The table shows the substantial usage of the lessons, in terms of both number of students and number of hours - almost 4000 students and more than 20,000 hr of instruction. In addition, a total of 29 instructors were involved in the field test, 25 of these for at least three semesters. The 84 lessons that were developed prior to and during the project represented approximately 55 hr of instruction. By and large, the lessons were designed to supplement regular instruction. The way in which the lessons were used was left to the individual instructors who were responsible for their particular course sections.

In designing the lessons, four types of instructional strategies were used, often in combination in a single lesson. One was practice mode. This material assumed the student had received instruction off-line prior to the session at a computer terminal. The tutorial mode gave instruction directly, followed by direct questions on the content of the computer-based lesson. The simulation-model mode simulated biological processes using especially PLATO's graphic capabilities. The inquiry mode gave instruction followed by questions and feedback, which were intended to guide the student toward a conclusion. In using these various modes, extensive use was made of PLATO's graphic capacities; on the other hand, no audio facilities were available.

In Table VI, a list of the lessons, the number of students, and the number of minutes used by these students are shown for two courses for which the lessons were available, Biology 101 and 111, in the spring of 1976. Lessons for which a blank is shown are lessons for which quantitative data on usage were not available. Lessons or sections of lessons followed by a superscript are ones on topics that would not usually be covered in the curriculum. It is interesting to note that a significant number of students still accessed these lessons. For the data shown, the system was used over 1500 hr, and 566 students accessed the lessons.
TABLE VI
PLATO LESSON USAGE - BIOLOGY 101 AND 111, SPRING 1976
Columns: Lesson; No. of students; No. of minutes

Lessons covered: Tools used in biology; Experimental technique; Life in a microcosm; Simple chemistry I; Simple chemistry II (a); The ultrastructural concept; Cells - structure and function; Diffusion and osmosis; Introduction to water relations; Water relations laboratory (a); Surface area/volume in living systems; Cell growth; Mitotic cell division (b); Mitosis (b); Meiosis (Arsenty) (b); Meiosis (Porch) (b); Embryology (a, b); Plant life cycles (b); Hormonal control of the menstrual cycle (b); DNA and protein synthesis; DNA, RNA, and protein synthesis; Enzyme experiments; Photosynthesis (a); Experiments in photosynthesis (a); Essentials of photosynthesis; ATP, anaerobic, and aerobic respiration; Electron transport chain; Measuring the level of life; Respiration and enzymes; Experiments in respiration; Blood typing (b); Drosophila genetics (b); Plant growth; Plant responses and apical dominance (a); Flowering and photoperiod (a); Fruiting and leaf senescence (a); Enzyme-hormone interactions (a); Organization of the higher plant; ADH and water balance in humans; Neuron structure and function; Human digestive system; The heart (a); Cardiac cycle; Heart rate regulatory mechanisms; Mechanics of breathing (a); Elementary psychophysiology of audition; Physiological basis of learning.

(a) Quantitative data not available.
(b) Topics not usually treated in this curriculum.
The range of topics shows that a great variety of concepts were programmed. No doubt the excellent graphic facilities available on the PLATO system helped make the lessons attractive.

3.1.2 TICCIT English and Mathematics
The TICCIT project, just like the PLATO project, received major support from the National Science Foundation for the period running from about 1971 to 1976. The TICCIT project had the responsibility to develop two community-college courses, one in English and one in mathematics. The curriculums of the two courses are fairly standard and will not be reviewed here. The more distinctive feature of the TICCIT courses has been the effort to use an explicit instructional strategy focused on learner-controlled courseware (Bunderson, 1975; Bunderson and Faust, 1976). The Educational Testing Service (ETS) evaluation of the TICCIT courses, as summarized quite objectively in Bunderson (1977), presents the following conclusions (see also Alderman, 1978).

(1) When used as an adjunct to the classroom, TICCIT (like PLATO) did not produce reliable, significant differences in comparison with classes that did not use TICCIT (or PLATO).
(2) When used as an integral scheduled part of either mathematics or English classes, TICCIT students did significantly better than non-TICCIT students.
(3) Characteristics of the teacher are significant in determining the performance and the attitude of students in both TICCIT and non-TICCIT classes, a conclusion that matches much other research of a similar sort.
(4) There was a difference of about 20% in completion rate in favor of non-CAI classes for the TICCIT classes.
(5) The success rate of students who took the TICCIT mathematics more than once seemed to indicate that the courseware did not provide sufficient remedial depth to teach some of these students.
These results are not terribly surprising. It seems to me important that we do not have some immediate evaluation of CAI on the basis of a single year’s test as in TICCIT or PLATO. It is rather as if we had had a similar test of automobiles in 1905 and concluded that, given the condition of roads in the United States, the only thing to do was to stay with horses and forget about the potential of the internal combustion engine. A wide variety of research shows that the method of teaching at the college or university level very seldom makes any difference in achievement if the students and the settings in which the studies are conducted
are diverse or large in number (Jamison et al., 1974). I would expect this robust conclusion based on many different kinds of courses and evaluation of them to hold up mainly for CAI as well. Some further remarks on these matters from the standpoint of productivity are contained in Section 3.3.

3.2 Undergraduate Physics at Irvine
Perhaps the best known current example of the use of computers for instruction in college-level physics is the work done by Alfred Bork and his associates, especially Stephen Franklin and Joseph Marasco, at the University of California, Irvine. Bork has described this activity in a number of publications (Bork, 1975, 1977a,b, 1978; Bork and Marasco, 1977). In describing the objectives of the kind of work he has done, I draw especially upon Bork (1978), in which he describes the way in which Physics 3A was taught at Irvine in the fall of 1976 to approximately 300 students. The students had a choice of using a standard textbook or making extensive use of various computer aids. In addition, the course was self-paced: students were urged to make a deliberate choice of a pacing strategy. The course was designed as a mastery-based course along the lines of what is called the Keller plan or PSI (Personalized System of Instruction), in which the course is organized into a number of modules. Each module is presumed to be developed around a carefully stated set of objectives, and at the end of each module, students are given a test; until a satisfactory level of performance is achieved, they are not permitted to move to the next module. Bork describes six different ways in which the computer was used in the course. All students had computer accounts, and during the 10 weeks of the term the average student used about 2.5 hr per week. Thus the total time involved with the approximately 300 students was about 7500 hr in the term. Before turning to the various roles of the computer described by Bork, I would like to emphasize that, having had a personal opportunity to see some of his material, the use of graphic displays is especially impressive and is certainly a portent of the way computer graphics will be used in the future for the teaching of physics. The first role of the computer was simply as a communication device between student and instructor. The instructor, Bork, could send a message to each student in the class and the students could individually send messages to him. He says that typically he would answer his computer mail once a day, usually in the evening from a terminal at his home. The second use of the computer was individual programming by the student as an aid to learning physics. The language APL was available to the students, and one of the eight units was spent in learning APL by the
students who chose the computer track. One reason for the choice of APL was the fact that the computer system at Irvine had available efficient graphic capability within APL.

The third role of the computer was as a tutorial device helping students to learn the basic physics to which they were being exposed. Bork properly emphasizes that tutorial programs are to be contrasted with large lecture courses in which the student must essentially play a passive role. The tutorial programs required ongoing dynamic interaction with the student, and the development of material was tailored to the needs and capacities of the students in a way that is never possible in a large lecture setting.

A fourth role of the computer was as an aid to building physical intuition. In this case, extensive use was made of the graphic capabilities available on the Tektronix terminals used in the course.

A fifth use of the computer was in giving the tests associated with each of the modules. Because of the way PSI courses are organized, alternate forms - often randomly generated - of each test were required in case the student had to take the test several times before demonstrating mastery of the particular module. During the 10 weeks of the course in the fall of 1976, over 10,000 on-line tests were administered. Students perceived this test-giving role as the most significant computer aspect of the course.

The sixth use of the computer was in providing a course management system. As would be expected, all of the results of the on-line tests were recorded; programs were also developed to provide students access to their records and to provide information to the instructor.

In Fig. 1 a typical graphical illustration to help physical intuition is shown from the section on mechanics in the physics course described. In his many publications concerning the developments at Irvine, Bork emphasizes that his project, like others described below, is still only in the beginning stages of what we can expect in the future. One of the most promising things about the Irvine project is the persistence with which Bork is continuing to develop new materials and new approaches for computer-assisted instruction in physics.
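The randomly generated alternate test forms mentioned in connection with the fifth role above amount to parameterizing each item and drawing fresh values on every attempt. The sketch below is only a schematic reading of that idea, not Bork's software: the single mechanics item, the passing fraction, and the tolerance are assumptions introduced here for illustration.

import random

def projectile_item():
    """One parameterized mechanics item: new numbers on every form."""
    v0 = random.randint(10, 30)          # initial speed in m/s
    g = 9.8
    t_flight = round(2 * v0 / g, 2)      # time of flight for a vertical launch
    question = (f"A ball is thrown straight up at {v0} m/s. "
                f"How long is it in the air, in seconds?")
    return question, t_flight

def make_test_form(n_items=5):
    """Assemble an alternate test form by drawing each item afresh."""
    return [projectile_item() for _ in range(n_items)]

def mastered(responses, answers, pass_fraction=0.8, tolerance=0.1):
    """Mastery gate: the student may not move to the next module until
    a satisfactory fraction of items is answered within tolerance."""
    correct = sum(abs(r - a) <= tolerance for r, a in zip(responses, answers))
    return correct / len(answers) >= pass_fraction

form = make_test_form()
for question, _ in form:
    print(question)

Because each retake draws a new form, a student who fails the mastery gate sees different numbers the next time rather than a memorized answer key.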
3.3 Undergraduate Logic and Set Theory

I survey activities at several institutions but mainly concentrate on the work at Stanford, with which I have been associated for many years.

3.3.1 Logic at Ohio State
Computer-assisted instruction is being used in varying degrees in introductory courses in logic in a number of different institutions.
FIG. 1. Graphical example from Irvine physics curriculum.
A good example of using it for drill and practice is to be found at Ohio State University. Almost 3000 students use the program each year. After presentation of course material by lectures, 32 Hazeltine 2000 CRT terminals connected to an IBM 370-158 computer and using Coursewriter III offer extensive drill-and-practice exercises, including course examinations, but not ordinarily the course final.

There are several salient aspects of the program. One is that the drill-and-practice exercises are generated rather than being stored. The second is that the course is a rather informal one and quite elementary, but the faculty and staff have made effective use of CAI to handle very large amounts of drill-and-practice work. The course concentrates on propositional inference, truth tables, Venn diagrams, syllogistic arguments, and, in the latter part, rather extensive material on inductive methods, especially Mill's methods. A good recent report of the course is to be found in Laymon and Lloyd (1977).

Student questionnaires have been distributed in order to get an attitudinal evaluation of the course. In the winter of 1976, for example, 71% of a total of 198 sampled students strongly agreed that if they were given a choice between (a) 1 hr of recitation per week and 13 hr of computer time per week and (b) 2 hr of recitation per week and no computer time, they
would choose (a). Eighty-seven percent of the students indicated they would like to see more of the course material available at computer terminals. Only 10% of the students indicated that they found the computer material too difficult. About 73% of the students indicated that they found the terminal room a better place to meet and interact with other students than the recitation room. (For the data just cited I am indebted to Ronald Laymon.)
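Since the Ohio State exercises are generated rather than stored, a drill item on truth tables can be manufactured in the same spirit as the arithmetic exercises described earlier for the CCC mathematics strands. The following is a hypothetical sketch, not the Coursewriter III implementation: the formula shape, the two connectives, and the prompt wording are all assumptions for illustration.

import random
from itertools import product

CONNECTIVES = {
    "and": lambda p, q: p and q,
    "or":  lambda p, q: p or q,
    "->":  lambda p, q: (not p) or q,
}

def random_formula():
    """Build a small random formula over P and Q, e.g. '(P -> Q) or P'."""
    c1 = random.choice(list(CONNECTIVES))
    c2 = random.choice(list(CONNECTIVES))
    text = f"(P {c1} Q) {c2} P"
    def value(p, q):
        return CONNECTIVES[c2](CONNECTIVES[c1](p, q), p)
    return text, value

def truth_table_exercise():
    """Ask the student for the truth value of the formula in one row."""
    text, value = random_formula()
    p, q = random.choice(list(product([True, False], repeat=2)))
    answer = value(p, q)
    prompt = f"Let P be {p} and Q be {q}. What is the value of {text}?"
    return prompt, answer

prompt, answer = truth_table_exercise()
print(prompt)    # the drill item shown at the terminal
print(answer)    # the stored correct response used for checking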
3.3.2 Logic at Stanford

Since 1972, the introductory logic course at Stanford has been taught during the regular academic year entirely as a CAI course. Various aspects of the course have been described in a number of publications (Goldberg and Suppes, 1972, 1976; Larsen et al., 1978; Suppes, 1975; Suppes et al., 1977). Basic data on the course are given in Table VII. There are 29 lessons that form the core of the course. The number of exercises in each lesson, the mean time to complete the lesson, and the cumulative time are shown, as well as a brief description of the content of each lesson. The cumulative times are shown in parentheses after the times for the individual lessons. The data are for the autumn quarter of 1976-1977, but only minor curriculum changes have been made in the last year.

It should be emphasized that many of the exercises involve derivations of some complexity, and a strong feature of the program is its ability to accept any derivation falling within the general framework of the rules of inference available at that point in the course. For example, prior to lesson 409, students are required to use particular rules of sentential inference, and only in lesson 409 are they introduced to a general tautological rule of inference. Lesson 410, it may be noted, is devoted to integer arithmetic, which would often not be included in a course in logic. The reason for it in the present context is that this is the theory within which interpretations are given in the course to show that arguments are invalid, premises consistent, or axioms independent. In a noncomputer-based course, such interpretations to show invalidity, etc., are ordinarily given informally and without explicit proof of their correctness. In the present framework, the students are asked to prove that their interpretations are correct, and to do this we have fixed upon the domain of integer arithmetic as providing a simple model.

It should be noted that students taking a Pass level require on the average about 67 hr of connect time, which, at present, may be about the highest of any standard computer-based course in the country.
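The program's willingness to accept any derivation that stays within the currently available rules can be pictured as a small interpreter for the control language discussed later in this section: the student names a rule and the lines it is applied to, and the program checks that the cited lines really license the new line. The sketch below is a toy version with two sentential rules and formulas written as nested tuples; it illustrates the idea and is not the Stanford system, whose rules, commands, and internal representation are not reproduced here.

# Formulas as nested tuples: ('->', 'P', 'Q') is a conditional, 'P' is atomic.
def modus_ponens(major, minor):
    """From A -> B and A, infer B."""
    if isinstance(major, tuple) and major[0] == '->' and major[1] == minor:
        return major[2]
    return None

def adjoin(left, right):
    """From A and B, infer A & B."""
    return ('&', left, right)

RULES = {'MP': modus_ponens, 'ADJ': adjoin}

def apply_step(derivation, rule, line_numbers, claimed):
    """Check one control-language command, e.g. 'MP 1,2'. The new line is
    accepted only if the cited lines license the claimed formula."""
    premises = [derivation[n - 1] for n in line_numbers]
    result = RULES[rule](*premises)
    if result == claimed:
        derivation.append(claimed)
        return True
    return False

# A two-step derivation of Q & P from the premises P -> Q and P:
lines = [('->', 'P', 'Q'), 'P']                            # lines 1 and 2
print(apply_step(lines, 'MP', [1, 2], 'Q'))                # True: line 3 is Q
print(apply_step(lines, 'ADJ', [3, 2], ('&', 'Q', 'P')))   # True: line 4 is Q & P

Any sequence of such commands is accepted so long as each step checks, which is the sense in which the derivation need not follow a single stored solution.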
TABLE VII
MEAN TIME AND CUMULATIVE MEAN TIME FOR A REPRESENTATIVE QUARTER

Lesson   No. of exercises   Student's time in hours (cumulative)   Content
401      19                 0.59  (0.59)    Introduction to logic
402      18                 1.12  (1.71)    Semantics for sentential logic (truth tables)
403      14                 0.76  (2.47)    Syntax of sentential logic, parentheses
404      14                 1.17  (3.64)    Derivations, rules of inference, validity
405      19                 4.07  (7.71)    Working premises, dependencies, and conditional proof
406      16                 1.83  (9.54)    Further rules of inference
407      12                 2.37  (11.91)   New and derived rules of inference
408      21                 14.38 (26.29)   Further rules and indirect proof procedure
409      24                 2.37  (28.66)   Validity, counterexample, tautology
410      13                 0.71  (29.37)   Integer arithmetic
411       7                 0.61  (29.98)   Two rules about equality
412       7                 0.59  (30.57)   More rules about equality
413       7                 0.44  (31.01)   The replace equals rules
414       7                 0.97  (31.98)   Practice using equality in integer arithmetic
415      11                 1.99  (33.97)   The commutative axiom for integer arithmetic
416       4                 0.99  (34.96)   The associative axiom
417       7                 2.00  (36.96)   Two axioms and a definition for commutative groups
418       8                 1.50  (38.46)   Theorems 1-3 for commutative groups
419       8                 1.54  (40.00)   Theorems 4-7 for commutative groups
420      12                 1.51  (41.52)   Noncommutative groups
421       8                 0.44  (41.96)   Finding-axioms exercises
422      14                 1.20  (43.16)   Symbolizing sentential arguments
423      28                 2.78  (45.94)   Symbolizing English sentences in predicate logic
424      28                 2.87  (48.81)   Inferences involving quantifiers
425      22                 2.67  (51.48)   Quantifiers; restrictions and derived rules
426      21                 1.41  (52.89)   Using interpretations to show arguments invalid
427      17                 4.11  (57.00)   Quantifiers and interpretation
428      23                 6.18  (63.18)   Consistency of premises and independence of axioms
429      40                 3.96  (67.15)   The logic of identity (and sorted theories)
Moreover, for students who go on to take a letter grade of A or B, additional work is required, depending upon the particular sequence of applications they take. For example, those choosing the lesson sequence on social decision theory will require an average of somewhat more than 20 additional hours. Those who take the lesson sequence on Boolean algebra and qualitative foundations of probability will require somewhat less connect time, but they do more proofs that benefit from reflection about strategic lines of attack, which need not necessarily occur while signed on at a terminal.
Also, the number of hours of connect time just discussed does not include the finding-axioms exercises but only the introduction to them in lesson 421. These exercises, which have been reported in Goldberg and Suppes (1972), present the student with a number of statements about a particular theory, for example, statements about elementary properties of betweenness on the line. The student is asked to select not more than a certain number of the statements, for example, five or six, as axioms, and prove the remainder as theorems. This kind of exercise has been advocated by a number of mathematical educators. The method is often called the Moore or Texas method in honor of the well-known American topologist R. L. Moore, who introduced it many years ago as his own primary approach to teaching.

Even apart from the finding-axioms exercises, for which we do not have good time measurements, the variation in individual student time spent at terminals is substantial. For most terms the standard deviation for the Pass level of the course will be somewhere between 15 and 20 hr, and the range will be somewhere from 30 hr as a minimum to 140 hr as a maximum.

In both the logic course described in this section and the set-theory course described in the next, an effort is made to minimize the amount of input that must be the student's responsibility. Essentially the student is given a control language that informs the computer program which inferences to make next. The system of natural deduction that has been implemented in the logic course is close to that given in my textbook (Suppes, 1957). In addition, the students are given a number of administrative commands; for example, they type NEWS to get the news of the day, including any program changes, etc., or (if they have been absent) OLDNEWS to get old news items that have been deleted from the news file. By typing GRIPE, they may send a message complaining about some feature of the course or course operation. Ordinarily the gripes are answered by one of the teaching or research assistants by a response addressed to that individual student, who receives the answer the next time he signs on. The student also can type HINT (in fact, he need just hit the control key and H) to obtain hints about various exercises and derivations. Not all exercises have hints stored with them, but many do at the present time. This is one feature of the course we continue to expand. There are other features of a similar nature about which I shall not give details.

It has been the experience of most people that commands of the sort I have just been discussing are desirable for smooth working of a course, in particular, if the desire is to reduce the amount of administrative supervision that must be provided by teaching assistants. Our long-term objective is to make the course as self-contained as possible, and we continue to introduce new
features aimed at realizing this goal. One new feature in this respect is Browse Mode, which the student enters by typing a control key and B. This mode allows the student to review exercises he has already worked at or to look ahead at the curriculum. Detailed instructions are given to indicate exactly what it is that he wants to see, either from the past or in terms of what he will be encountering in the future.

The course makes extensive use of audio, and some of the results are discussed in more detail in Section 4.2. I do mention here that one of the optional features of the course is the ability of the student to control the speech rate.

The logic course is offered each of the regular three terms during the academic year at Stanford and for several years has been the only offering in elementary logic. The annual enrollment in the three terms runs somewhere between 240 and 300 students, which is somewhat higher than the enrollment before the course was made computer-based, although there has also been a small increase in the number of Stanford undergraduates during this same period. It should also be mentioned that the enrollment is restricted. Twelve terminals are devoted to the course, and this number will handle about 100 students per term. The terminals are not scheduled but are generally available for students six days a week, 24 hr a day. We have found that students sort themselves out in terms of hours of availability fairly well and, although students occasionally must wait to get access to terminals, there has been no general request that a signup procedure be followed.

A sample of 20 students taking the course in the fall of 1978 indicated that the most preferred features of the course were self-pacing and freedom to work at any time of day or night. Although a clear majority said that the course took more time than other Stanford courses they had taken, about 70% of the students gave the course a value of 6 or 7 on an overall satisfaction scale ranging from 1 (not satisfied) to 7 (very satisfied), and no student gave it a value below 4.

The computer system running the logic course is a dual-processor PDP-KI10 using TENEX as an operating system. The terminals are Datamedia-2500s, and earphones are available at each terminal. Thus, a student station consists of a Datamedia terminal and earphones.

3.3.3 Set Theory at Stanford
The same computer system, just described, and three additional Datamedia-2500 terminals, also equipped with earphones, are used for teaching axiomatic set theory at Stanford. The curriculum of the course in set theory is classical; it follows closely
the content of my earlier book (Suppes, 1960). The course is based on the Zermelo-Fraenkel axioms for set theory. The first chapter deals with the historical context of the axioms; the next chapter deals with relations and functions. The course then concentrates on finite and infinite sets, the theory of cardinal numbers, the theory of ordinal numbers, and the axiom of choice. Students who take the course for a Pass stop proving theorems at the end of the chapter on the theory of cardinal numbers. Those who go on for a letter grade of A or B must prove theorems in the theory of ordinal numbers and standard results involving the axiom of choice.

Although the conceptual content of the course is classical, the problems we have faced in making it a complete CAI course are not. The logic course just described is in many ways deceptive as a model of how to approach mathematically oriented courses, for its proofs can be formal, and the theory of what is required, although intricate, is relatively straightforward compared with the problem of having reasonable rules of proof to match the standard informal style of proofs found in courses at the level of difficulty of the one in set theory. The problems of developing powerful informal mathematical procedures for matching the quality of informal proofs found in textbooks are examined in some detail in Section 4.3.

There are about 500 theorems that make up the core of the curriculum. The students are asked to prove a subset of these theorems. The number of students is ordinarily between 8 and 12 per term, and therefore individual student lists are easily constructed. Students ordinarily prove between 40 and 50 theorems, depending upon the grade level they are seeking in the course. The latest data for the students completing the course in the fall term, 1978, are as follows: the average number of hours of connect time to complete the course was 52.0, with the minimum being 29.7 hr and the maximum 75.2.

Apart from the challenging technical problems of offering a course like axiomatic set theory entirely as a computer-based course (in this respect it is set up exactly like the logic course described above), there is another strong reason for the significance of the set theory course for further developments in CAI. There has been a tendency in CAI to concentrate on elementary courses that are taken by very large numbers of students, whether at the school or college level. This strong concentration on the most elementary courses I think is a mistake. I was myself pushed into offering the course in axiomatic set theory, after not having taught it as a lecture course for a number of years, because of a staff reduction. It seems to me that undergraduate courses of a rather specialized and technical nature will, in many institutions, be offered only rarely, if at all, in the next two decades because of the anticipated declines in enrollment and the
budget pressures against increasing staff. One way to offer a variety of specialized courses is to offer them as CAI courses. It is also easy to make a comparative analysis to show the lower cost of such low-enrollment courses when offered by CAI. In my own case, by offering logic and set theory as two separate courses every term, I now have a teaching load that is double the normal one at Stanford. I plan in the future to increase still further the number of such courses. We are currently working on a course in the foundations of probability, and, under the general supervision of my colleague Georg Kreisel, a course in proof theory has already been run experimentally and is now being revised. Both of these courses are at about the level of the course in axiomatic set theory, and both are expected to have small enrollments.

3.4 Other CAI Courses
There are two promising areas in which a good deal of work has been done but which currently do not have as extensive a range of activities as would be anticipated. These areas are courses in computer programming that are entirely computer-based, and elementary courses in foreign language.

3.4.1 Computer Programming
Various international efforts at computer-aided teaching of programming have been documented in the literature. For example, Santos and Millan (1975) describe such efforts in Brazil; Ballaben and Ercoli (1975) describe the work of an Italian team; and Su and Emam (1975) describe a CAI approach to teaching software systems on a minicomputer. Extensive efforts in CAI to teach BASIC have been undertaken by my colleagues at Stanford (Barr et al., 1974, 1975). A joint effort at Stanford was also made to teach the initial portion of the course in LISP by CAI methods (Suppes et al., 1977).

On the other hand, at the time of writing this chapter, I know of no courses in computer programming that are taught entirely by CAI and that have anything like the total number of individual student hours at terminals comparable to the logic course described above. It may be that I am simply unaware of some salient experiments in this matter, but it does seem that the use of CAI for total instruction in computer programming is not nearly as developed as would have been anticipated 10 yr ago. The enrollment in the set-theory course, for example, is ordinarily, as I have indicated, between 8 and 12 students per term, with an annual enrollment of about 30 students, which is more than the annual enrollment in the years before the course became computer-based.
3.4.2 Foreign Languages
In the period prior to that covered by this chapter, there were extensive experiments in the elementary teaching of foreign languages by CAI methods. Work at Stanford in the 1960s especially centered around the teaching of Slavic languages and was conducted primarily by Joseph Van Campen and his colleague Richard Schupbach. In those efforts at Stanford and elsewhere some 10 years ago, there was considerable federal support for language work of all kinds, including CAI approaches. Ten years later, the amount of such support is quite limited and the CAI developments are surprisingly restricted.

At the time of writing this chapter, for example, the only major activity at Stanford is the development of a course in Armenian, which is being supported by private sources. Instruction in Armenian has not been regularly given at Stanford, and a CAI approach provides an opportunity to offer it regularly without requiring the presence on the faculty of a native speaker of Armenian. It is also hoped that the work being undertaken now can be generalized to the teaching of Armenian for students in elementary and secondary schools, because of the strong desire of Armenian communities in a number of places in the United States to maintain the linguistic and cultural traditions associated with speaking and understanding the language.

A recent effort at Stanford was made by E-Shi Wu (1978) to teach elementary Mandarin, mainly orally and mainly by telephone, using a touch-tone response pad as the only means of response. Wu's work shows that a great deal of foreign language instruction can be brought into the home within a CAI framework. As hardware continues to become cheaper, it is likely that such efforts will move from the research stage to ones that are operational in character. But it is true, all the same, that, considering the activities of 10 yr ago, the range of CAI work in the teaching of foreign languages is more limited than would then have been anticipated. My own judgment is that it is not some conceptual resistance to CAI as a method of teaching foreign languages but rather the severe restrictions on research and instructional budgets characteristic of the late 1970s that have been the limiting factor on current developments.
4. Current Research
In this section I analyze some of the main areas of current research most significant for CAI. The first concerns natural-language processing; the second, the use of audio; the third, informal mathematical procedures; and the fourth, efforts at modeling the student.
An important topic that I have not covered here or in other parts of the chapter is an assessment of current work on authoring languages for CAI. It is my own view that approaches are still changing too much to warrant a review here. There are also complex problems of assessing the language features needed for different kinds of courses. The requirements, for example, for teaching foreign language are rather different from those for teaching undergraduate mathematics.

I have also not attempted to survey current hardware activities, for CAI has as yet depended very much on general hardware developments in the computer industry. The only immediate case at hand that may turn out to be an exception to this rule is the development of audio, which is discussed in Section 4.2. Still more uncertain is the role that will be played by videodisks in CAI. I mention something briefly about this in Section 5, but there is no current activity sufficiently developed to warrant analysis at this time.

4.1 Natural-Language Processing
Without doubt, the problems of either accepting natural-language input
or producing acceptable informal natural-language output constitute some of the most severe constraints on current operational efforts and research in CAI. It is fair to say that there have been no dramatic breakthroughs in the problems of processing English, as either input or output, during the period covered by this chapter. Moreover, these problems are not simply a focus of research in CAI but have wider implications for many diverse uses of computers. No doubt the current intensive efforts at developing and marketing sophisticated word processors for office use will have in the next decade an impact on the level of natural-language processing that can be implemented efficiently and at reasonable cost in hardware that is just becoming available. All the same, during the period covered by this chapter, the difficulties of adequately inputting or outputting natural language by a program run by a computer, no matter how powerful, have become apparent to all who are seriously engaged in thinking about or trying to do something about the problem.

From a theoretical standpoint, linguists have come to realize that syntax alone cannot be a satisfactory conceptual basis for language processing, and model-theoretic semanticists, represented by logicians and philosophers, have come to recognize how far any simple model-theoretic view of the semantics of natural language is from the intricate and subtle details of actual informal usage. In addition, the romantic hope of some computer scientists that the theoretical problems can be bypassed by complicated programs that do not have a well-articulated theoretical basis in syntax and semantics, as well as pragmatics, has also been dashed. Winograd's doctoral dissertation was
published in 1972, and the bright hopes that it seemed to raise for rapid progress in understanding natural language have certainly not been realized. Perhaps the most instructive thing that can be said is that we are much more aware of the difficulties now than we were at the beginning of the 1970s.

As in the case of Winograd, some of the particular examples have been impressive. A good instance is Woods' (1970, 1974) development of transition network grammars with applications to particular subject matters, for example, that of rock samples from the moon. Woods' work, like Winograd's, was not directly oriented toward CAI. The same can be said for the work of Schank and associates (Schank et al., 1972; Schank, 1973, 1975). The convergence of syntax and semantics at the theoretical level is well exemplified by the book on Montague grammar edited by Barbara Partee (1976), but this has not led to extensive computer implementations. The only example known to me of implementing Montague grammar has been that by Joyce Friedman, which is still in a rather primitive state. My own work on variable-free semantics has been aimed at matching the syntax of natural language as closely as possible by a relation-algebraic semantics (Suppes, 1976; Suppes and Macken, 1978; Suppes, 1979), but an implementation of such a variable-free semantics, although it is aimed at efficiency of computation, has not yet been made. In our work at Stanford on producing an increasingly informal language for mathematical proofs, some progress has been made, but I shall reserve discussion of that until I deal specifically with informal mathematical procedures in Section 4.3.

Goldstein and Papert (1977) write in an optimistic way about the application of artificial intelligence to comprehension of language, although they do not make specific new technical proposals and they are more interested in representing a general viewpoint for further research than in furthering the research itself. There is no doubt that there is something quite positive to be said about the viewpoint expressed by them, but it is also clear that there is a very large distance between the hopes they express and their serious realization in completed work.

Because the central problems of natural-language processing are in no sense special to educational applications of computers, it does not seem appropriate to try to give a more detailed sense of what the outstanding conceptual problems are at the present time. It should be clear, however, how important it is for CAI to have available a much more sophisticated level of natural-language processing than is now the case.

4.2 Uses of Audio
The importance of spoken speech in instruction has been recognized from time immemorial. The earliest articulate and sophisticated advocacy
of the importance of spoken dialogue as the highest form of instruction is in Plato's dialogue Phaedrus, where Socrates criticizes the impersonal and limited character of written records as a means of instruction.

The experiments on the use of audio for CAI at the Institute for Mathematical Studies in the Social Sciences at Stanford are among the most extensive in the world and, because of my own close association with them, can most easily be reported here. However, I emphasize that the use of audio in CAI is the focus of continued work at other centers as well; it was, for example, a part of the PLATO and TICCIT projects, although the use of audio in the PLATO project turned out to be rather limited.

Ten years ago the Institute had four different audio systems running simultaneously, but for a number of years now the concentration has been on digitized audio and, since 1974, on the construction and use of the MISS machine, a microprogrammed intoned speech synthesizer utilizing linear predictive coding techniques. Detailed technical descriptions of the MISS machine are to be found in Sanders et al. (1976) and Levine and Sanders (1978). As part of the Institute's computer facility, at the heart of which is a dual-processor DEC KI-10 running the TENEX operating system, the MISS hardware provides 48 audio channels, of which 16 can be speaking simultaneously. The MISS system is capable of reproducing speech in two ways, either by resynthesizing recorded phrases or by resynthesizing and concatenating individually recorded words. The system also allows for user modification of the speech rate and volume as well as the instructionally important feature of synchronizing speech events with display activity at terminals. Prosodic adjustment of concatenated words is directed by linguistic text-analysis routines and is done by modifying the fundamental frequency contours, duration, and amplitude of the individually recorded words.

The importance of the ability to synthesize new messages from individually stored words rather than recording entire messages cannot be overemphasized. This makes curriculum creation as well as revision very much more efficient. In addition, it permits the introduction of audio messages that are synthesized contingently to comment on particular features of student work; a typical application in the Institute is the delivery of error messages in the courses on logic and set theory.

Linear predictive coding (LPC) is a well-documented technique for digital representation of speech (see Atal and Hanauer, 1971; Makhoul, 1975; Markel and Gray, 1976); in simple terms, LPC models the spectrum and voicing features of speech. The spectral representation can be compressed and economically stored for later use. The research at the Institute has been guided by the requirement that the speech produced must be of sufficiently high quality to allow a student to listen to it without a feeling of
strain for considerable periods of time, and it also must be natural enough not to seriously divert the student's attention from the conceptual material being presented. A representation that requires about 9000 bits per second has been found to approximate these conditions. The details are given in the publications mentioned, but an important point to mention here is that the quality of the recording of individual words or of entire phrases is quite satisfactory at this rate. What still requires research is the development of appropriate algorithms for the imposition of synthetic prosodic features when messages are synthesized from the LPC coding of individual words. The present synthetic prosody is acceptable but certainly can be considerably improved (for current efforts, see Sanders et al., 1978).

The remainder of this subsection is devoted to describing two relatively elaborate experiments that have been run to evaluate the MISS system from the standpoint of potential CAI users. The first experiment concerns the intelligibility of computer-generated speech for elementary-school children, and the second the attractiveness of the speech in comparison with visually displayed information for university-level students.
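Before turning to those experiments, the analysis step of LPC may be worth fixing with a small sketch. The fragment below is my own present-day illustration in Python, not the MISS microcode; the frame length of 160 samples and the predictor order of 12 are assumptions chosen only for concreteness. It estimates the prediction coefficients for one speech frame by the autocorrelation method and the Levinson-Durbin recursion; storing a dozen or so such coefficients, together with pitch, voicing, and gain, for every frame of 10-20 msec is the kind of economy that leads to rates in the neighborhood of the 9000 bits per second just mentioned.

import numpy as np

def lpc_coefficients(frame, order=12):
    # Autocorrelation of one (windowed, nonsilent) speech frame for lags 0..order.
    n = len(frame)
    r = np.array([float(np.dot(frame[:n - k], frame[k:])) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the Levinson-Durbin recursion.
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a = new_a
        err *= 1.0 - k * k
    return a, err   # predictor polynomial A(z) and residual energy

# Typical use on 8-kHz speech: 20-msec frames of 160 samples, order 10-14.
rng = np.random.default_rng(0)
frame = rng.standard_normal(160)   # stand-in for a real windowed speech frame
coefficients, residual_energy = lpc_coefficients(frame, order=12)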
4.2.1 Letter Recognition by Elementary-School Children

The Institute has a long history of research-and-development interest in CAI in initial reading for elementary-school students; most of that work was directed by Atkinson (see Atkinson, 1968; Atkinson et al., 1973; Atkinson and Hansen, 1966; Fletcher and Atkinson, 1972). Although the research on reading itself has not been continued during the past five years, we have felt it important to test the viability of the MISS system and other related methods of computer-generated speech for use with young students. What I describe here is part of a larger experiment dealing with recognition of both letters and words by Laddaga et al. (1978). In teaching initial reading, it is important that computer-generated speech be able to produce letter sounds that can be recognized easily by the students. It is such a letter-recognition experiment that is described here.

The subjects of the experiment consisted of 48 first graders from three classes in the area near Stanford. Twelve students from each class made up three treatment groups; four additional students from each class made up the control group of 12 students. Each treatment group received seven sessions of taped, computer-generated speech, and an eighth session with taped human speech. The control group received eight sessions of taped human speech. The sessions consisted of 9 to 12 children listening to a letter sound in the carrier phrase "circle the letter." After hearing the letter, the children circled the grapheme for the letter sound they heard from among three choices on an answer sheet. The two confusion choices
came from two sets of letters that were used on alternate sessions. The test letters and their confusion sets are shown in Table VIII. Each session covered the alphabet in one of eight random orders. Approximately 6 sec after each item was presented, the subjects heard a beep and the correct answer was displayed on a flash card. An entire session took no more than 6 min.

Three different systems for computerized speech synthesis were used for this experiment. One system was a sophisticated phonemic synthesizer developed by Jonathan Allen and Dennis Klatt at M.I.T. (see Allen, 1977; Klatt, 1976). The M.I.T. system converts text by rule into control parameters for the synthesizer and thus into speech. The tapes were prepared at M.I.T. specifically for test in this experiment. The second system was the VOTRAX VS-6 system produced by Votrax, a division of the Federal Screw Works, Inc. (see VOTRAX, n.d.). The VOTRAX system is also a phonemic synthesizer similar in certain respects to the M.I.T. synthesizer, but the phonetic control parameters are generated by hand rather than by rule and consequently allow less control over the allophones than the M.I.T. system. The third system was the Institute's MISS system described above. The control system was a high-quality, high-fidelity recording of a human speaker.

It needs to be recognized that this is a very severe test on any speech generator, for individual letters in isolation are not easy to recognize. The results were actually quite positive. The mean correct scores for each session fell between 83 and 98%. The specific data are shown in Table IX.
TABLE VIII
TEST LETTERS WITH CONFUSION SETS

Test letter   Confusions (Set 1)   Confusions (Set 2)
A             I, K                 I, J
B             E, P                 E, D
C             S, T                 S, Z
D             T, C                 T, B
E             A, P                 A, B
F             S, X                 S, R
G             Z, B                 Z, D
H             A, K                 A, J
I             Y, E                 Y, A
J             G, H                 G, K
K             Q, J                 Q, H
L             F, R                 F, X
M             N, L                 N, R
N             M, X                 M, L
O             U, H                 U, J
P             E, B                 E, T
Q             U, K                 U, P
R             N, F                 N, L
S             F, C                 F, X
T             D, G                 D, P
U             Q, O                 Q, W
V             Z, E                 Z, F
W             U, D                 U, O
X             S, R                 S, K
Y             I, W                 I, H
Z             V, I                 V, C
TABLE IX
MEAN (PERCENT CORRECT), STANDARD DEVIATION, AND NUMBER OF STUDENTS IN LETTER EXPERIMENT
(Columns give, for the VOTRAX, M.I.T., LPC (MISS), and control conditions, the mean percent correct, the standard deviation, and the number of students in each of sessions 1-7 and in the final control session with recorded human speech; session means ranged from roughly 83 to 98%.)
The variances were relatively large, so there was not a sharp separation, but there was across sessions a regular superiority in terms of intelligibility of the systems, and a sign test was used to examine this feature of the data. As might be expected, the control system, that is, the direct high-fidelity recording of a human speaker, scored highest; both the M.I.T. system and the MISS system scored higher than Votrax in seven sessions; the MISS system scored higher than M.I.T. in four sessions, lower in two sessions, and the same in one session, as can be seen from Table IX. What is encouraging is the relative acceptability of the systems tested for instructional purposes. Further details of the experiment, including a study of the learning that took place during the experiment, are to be found in the article cited.
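The sign test referred to here simply counts the sessions in which one system scored above the other (dropping ties) and asks how probable so one-sided a split would be if the two systems were in fact equivalent. The chapter reports no p-values; the short Python sketch below, with a function name of my own choosing, only makes the arithmetic explicit for the session counts just quoted.

from math import comb

def sign_test_p(wins, losses):
    # Two-sided sign test: probability of a split at least this lopsided
    # under the null hypothesis that each session is a fair coin flip
    # (sessions in which the two systems tied are dropped).
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# MISS vs. Votrax: higher in all seven sessions.
print(sign_test_p(7, 0))   # 0.0156
# MISS vs. M.I.T.: higher in four sessions, lower in two, tied in one.
print(sign_test_p(4, 2))   # 0.6875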
4.2.2 Experiment on Choice of Audio by University Students
Since 1972, Introduction to Logic has been taught at Stanford strictly as a CAI course, and, as indicated in the earlier discussion, the number of students is presently between 240 and 300 per year. For several years now, a variety of experiments have been conducted on the choice of audio by the students. I report here two experiments, in the winter and spring terms of the academic year 1976-1977, in which the students were given a choice between audio and visual display of the same information.

It is important to be clear on the choice offered the students. All students received considerable visual information at CRT terminals. In addition, those choosing audio received explanation of concepts and a variety of informal pedagogical comments through earphones. Those choosing visual display received this additional information on the CRT screen as written text. The students were required to try each method of receiving information at the beginning of the course, but then they were given freedom to choose at each sign-on to receive information in audio or visual form.

Extensive details of these experiments will be published elsewhere, but the summary data are as follows. In the winter of 1976-1977, after the initial exposure period, when freedom of choice was in operation, there were 1287 log-ins by students in the course. In 49% of those log-ins the students selected audio, but there was a significant decline from the beginning to the end of the course in the use of audio, that decline running from 59 to 39%. In the spring of 1976-1977, there were 2742 log-ins of a similar nature, and audio was selected 48% of the time. Again, there was a decline during the course, from 58 to 44% in this case. We are not certain of the reason for the decline, but we think that, in the latter part of the course, audio was used less frequently because students were more familiar with the content and found they could work
more quickly with the visually displayed information. It is also to be emphasized that Stanford undergraduates who are using this course are highly selected students of considerable academic ability. It is not at all clear that the percentage data reported here would be at all similar if students of less general academic aptitude were presented the same choices.

I will not attempt to present further details of the data here, but an important feature is the existence of strong individual differences in choice of sensory modality. In other words, the students who selected audio or visual display methods tended to persist in that same choice throughout the course, with some decline, as indicated, in the choice of audio. What this indicates is rather strong individual differences in preferences for audio or visual displays. It has been one of the traditional features of CAI to stress accommodation to such individual differences, but as far as I know, such extended experiments on choice of sensory modality have not previously been conducted. Several experiments have already been performed, and more are planned, on choice between synthetic prosody and fully recorded messages, as well as on the choice between synthetic prosody and visual displays.

But quite apart from the work at Stanford with which I am particularly familiar, it is clear that there will be extensive experiments at a number of research centers on the use of audio in CAI, and it may be confidently predicted that in the coming decade extensive use of audio will be one of the most salient features of the next generation of CAI systems.

4.3 Informal Mathematical Proofs
Work on constructing programs and using them as a basis for courses in logic and mathematics has a long history at the Institute for Mathematical Studies in the Social Sciences at Stanford. It has been my responsibility to have overall direction of this work since 1963 when the first efforts began, but as in any enterprise of this kind much of the real substantive work has been done by my younger collaborators. CAI teaching of elementary logic in the schools goes all the way back to first demonstrations in 1963. CAI teaching of logic at the university level began in serious fashion at Stanford in 1972. Currently the only way in which Stanford students may take the introductory logic course or the intermediate-level course in axiomatic set theory is at computer terminals as CAI courses.

In the present section, I concentrate on the problems of having working informal mathematical procedures at the level of axiomatic set theory in an undergraduate version. A standard presentation of this material is to be found in my textbook (Suppes, 1960). The complexity and difficulty of the proofs in
this book are comparable to those in many textbooks on different mathematical topics written for upper undergraduate courses. A description of the developments as of 1975 is to be found in Smith et al. (1975) and Suppes (1975). The main features of the system in 1975 were the use of a tautology rule, which automatically checked the validity of any purely sentential formula, the use of the similar Boole rule for Boolean expressions, a system of natural deduction, and the use of a resolution theorem prover to do routine inferences from one intuitively obvious step to another in a proof.

In providing some idea of the number of steps required for proving theorems, the following computation is conservative. In my 1960 textbook on axiomatic set theory, there are approximately 500 theorems, which is about the same number as in the computer-based course. Almost all of the 500 theorems have now been proved on the system, and we have made a concerted effort to deal with some of the most difficult. The average length of a theorem's proof is certainly less than 10 lines or 10 steps, and therefore the entire body of material could be proved in something less than 5000 steps. The most difficult theorems, for example, theorems justifying transfinite recursion, or a theorem that is used crucially in proving the equivalence of the axiom of choice (Bernays' enumeration theorem, namely, that for any set A there is an ordinal that is equipollent to A), required some 200 steps.

A good example of how the proof system worked as of 1975 is given in the article by Smith et al. (1975). They give a proof of the classical Schröder-Bernstein theorem that if one set is weakly less pollent than another and conversely, then the two sets are equipollent. This theorem, deceptively simple in formulation, is relatively difficult to prove and is, in fact, the first really nontrivial theorem ordinarily proved in a first course on axiomatic set theory. In the Institute's informal proof system as of 1975, the proof of this theorem given in the above publication took 26 lines.

Although the appearance is somewhat complicated, it is important to give a sense of how the interaction takes place in constructing a proof. I give as an example of moderate difficulty from the set theory course the proof of Cantor's theorem that any set A is strictly less pollent than its power set. The interactive version is followed by the full summarization version, called "review." There are several things to be said about this interactive proof. First, the input by the student is minimized. The actual input by the student is shown following the asterisks below, but of course it must be recognized that the input is taking place at a CRT terminal. What I have reproduced here is a printed analog. The command VERIFY is used to call the resolution theorem prover mentioned. Note that axioms, theorems, and definitions can be called either by number or name. Both kinds of examples occur.
The abbreviation WP, which occurs in line 2, is a command permitting the introduction of a working premise or assumption. Note that midway through the proof the student calls for a review in order to see the work done, which is summarized in the first four lines. This review function is important in order for the student to be able to get at any point a clear view of what has been proved thus far.

Derive: A < pow(A)

*VERIFY (1) *A ≤ pow(A)
Will you wish to specify? (No) *
Using *tHEOREM (Number or Name) *4.1.10
Using *tHEOREM (Number or Name) *4.2.1
Using *defINITION (Number or Name) *map
Using *defINITION (Number or Name) *injectioN
Using *gO

*wp (2) *pow(A) ≤ A

*2LET (variable) *f be such that (3) *inj(f) and dom(f) = pow(A) and rng(f) is a subset of A
Will you wish to specify? (No) *
Using *tHEOREM (Number or Name) *4.2.1
Using *defINITION (Number or Name) *map
Using *defINITION (Number or Name) *injectioN
Using *gO

*LET (variable) *D be such that (4) *(A x)(x in D iff x in A and x is not in inv(f)(x))
Will you wish to specify? No *
Using *aXIOM (Number or Name) *sepARATION
Schema: (E B)(A x)(x in B <-> x in A & FM(x))
Replace for FM *x is not in inv(f)(x)
Which variable indicates the parameter places? *x
Using *gO

*revIEW
Derive: A < pow(A)
VERIFY Using: Th. 4.1.10, Th. 4.2.1, Df. map, Df. injection
(1) A ≤ pow(A)
WP
(2) Pow(A) ≤ A
2 LET Using: Th. 4.2.1, Df. map, Df. injection
(3) Inj(f) and dom(f) = pow(A) and rng(f) sub A
LET Using: Ax. separation  Instance: x not in inv(f)(x) for FM
(4) (A x)(x in D <-> x in A & x not in inv(f)(x))

*3,4vERIFY (5) *D in dom(f)
Will you wish to specify? (No) *
Using *tHEOREM (Number or Name) *powERSET
Using *defINITION (Number or Name) *subSET
Using *gO

*3,5vERIFY (6) *f(D) in A
Will you wish to specify? (No) *
Using *defINITION (Number or Name) *subSET
Using *tHEOREM (Number or Name) *range
Using *tHEOREM (Number or Name) *3.10.9
Using *gO

*3,5tHEOREM (Number or Name) *3.10.58
3,5 Th. 3.10.58 (7) Inv(f)(f(D)) = D

*4,6,7conTRADICTION
Will you wish to specify? (No) *
Using *gO
4,6,7,2 CONTRADICTION (8) Not pow(A) ≤ A

*1,8defINITION (Number or Name) *less POWER
1,8 Df. less power (9) A < pow(A)

Finally, here is the review version of the complete proof.

revIEW
A < pow(A)
VERIFY Using: Th. 4.1.10, Th. 4.2.1, Df. map, Df. injection
(1) A ≤ pow(A)
WP
(2) Pow(A) ≤ A
2 LET Using: Th. 4.2.1, Df. map, Df. injection
(3) Inj(f) and dom(f) = pow(A) and rng(f) sub A
LET Using: Ax. separation  Instance: x not in inv(f)(x) for FM
(4) (A x)(x in D <-> x in A & x not in inv(f)(x))
3,4 VERIFY Using: Th. powerset, Df. subset
(5) D in dom(f)
3,5 VERIFY Using: Df. subset, Th. range, Th. 3.10.9
(6) f(D) in A
3,5 Th. 3.10.58
(7) Inv(f)(f(D)) = D
4,6,7,2 CONTRADICTION
(8) Not pow(A) ≤ A
1,8 Df. less power
(9) A < pow(A)
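The tautology rule mentioned at the beginning of this section does nothing more than decide whether a purely sentential formula is true under every assignment of truth values to its sentential letters. As a minimal illustration of what such a rule computes (a present-day Python sketch of my own, not the Institute's implementation, with invented names), one can simply enumerate the truth table:

from itertools import product

def is_tautology(formula, letters):
    # Brute-force truth-table check: `formula` maps an assignment of truth
    # values to the sentential letters onto True or False.
    return all(
        formula(dict(zip(letters, values)))
        for values in product([True, False], repeat=len(letters))
    )

# (P -> Q) <-> (not Q -> not P) is true under all four assignments.
print(is_tautology(
    lambda v: ((not v["P"]) or v["Q"]) == (v["Q"] or (not v["P"])),
    ["P", "Q"],
))   # True

The actual rule in the Institute's system is, of course, implemented quite differently and far more efficiently; the sketch only fixes what the rule must decide.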
Developments since 1975 are summarized in Blaine and McDonald (1978). The main improvements on the system are the use of more natural and more powerful facilities replacing simply the use of a resolution theorem prover described earlier, more student aids such as an extended HELP system, and the use of more informal English in the summarization of proofs. These new facilities are illustrated by the output of the informal summary or review of a proof for the Hausdorff maximal principle. It is a classical exercise required of students in the course to prove that the Hausdorff maximal principle is equivalent to the axiom of choice. What is given here is the proof of the maximal principle using Zorn's lemma, which has already been derived earlier from the axiom of choice.

Hausdorff maximal principle: If A is a family of sets, then every chain contained in A is contained in some maximal chain in A.

Proof. Assume
(1) A is a family of sets.
Assume
(2) C is a chain and C ⊆ A.
Abbreviate: {B : B is a chain and C ⊆ B and B ⊆ A} by: C!chns.
By Zorn's lemma,
(3) C!chns has a maximal element.
Let B be such that
(4) B is a maximal element of C!chns.
Hence,
(5) B is a chain and C ⊆ B and B ⊆ A.
It follows that,
(6) B is a maximal chain in A.
Therefore,
(7) C is contained in some maximal chain in A.
This summarized proof would not be much shorter written in ordinary textbook fashion. It does not show the use of the more powerful inference procedures, which are deleted in the proof summarization, but the original
interactive version generated by the student did make use of these stronger rules. The current system, called EXCHECK, is a definite improvement on the one described in Smith et al. (1975), but there is still a great deal to be done before we shall be satisfied with all of its features. The informal English output can certainly be improved upon in terms of naturalness and fluency. What is probably more important, additional substantial gains are needed to make the handling of proofs efficient, flexible, and easy for the students. All of the procedures implemented in EXCHECK are meant to be used by persons who have no prior programming experience or even contact with computers. Moreover, the procedures need to be such that they can be explained rather easily to students beginning a course and of such a character that their use does not interfere with the students' concentrating on the concepts that are central to the actual content of the course.

It is easy to think of specific features that would improve the present procedures, especially those that embody particular features of set theory as opposed to general logic. It seems unlikely that any deep new general discoveries about proof procedures will be found that will apply across quite different domains of mathematics. As in the case of other parts of artificial intelligence, it seems much more reasonable to conjecture at the present time that the procedures used will need to deal in detail with the content of specific areas of mathematics. Thus, for example, some rather different procedures will need to be implemented for a good course in geometry or in number theory, even though the general procedures will also need continued modification and improvement.

In order to give the discussion definiteness, I have concentrated on the few courses we have been developing at Stanford. It is obvious, on the other hand, that conceptual development of informal mathematical procedures at a level that makes them easy to use by undergraduate students of mathematics and science has much wider implications for CAI. No doubt, as I just indicated, specific subject matters will require specific study and specific procedures, but the general framework or approach should be applicable to a wide variety of courses that are mathematically based. This applies not only to courses in pure mathematics but also to many courses in particular sciences and disciplines that are closely related to mathematics, such as mathematical statistics, computer science, and operations research.

4.4 Modeling the Student
From the beginning of educational theory about instruction there has been a concern to understand what is going on in the student’s mind as he learns new concepts and skills. This attitude of many years’ standing is
well exemplified in the following quotation from John Dewey’s famous work, Democracy and Education (1916, quotation from 1966 edition). We now come to a type of theory which denies the existence of faculties and emphasizes the unique role of subject matter in the development of mental and moral disposition. According to it, education is neither a process of unfolding from within nor is it a training of faculties resident in mind itself. It is rather the formation of mind by setting up certain associations or connections of content by means of a subject matter presented from without (p. 69).
With the powerful opportunities for individualization present in CAI, there has been an increased concern to model the student in order to have a deep basis for individualization of instruction. Before considering current work, it is important to emphasize that concern with individualization is by no means restricted to computer-assisted instruction. Over the past decade, there has been an intensive effort by leading educational psychologists to identify strong effects of aptitude-treatment interaction. What is meant by this is the attempt to show that, by appropriate adaptation of curriculum to the aptitude of a particular student, measurable gains in learning can be obtained.

One of the striking features of the recent CAI work reviewed below is the absence of references to this extensive literature on aptitude-treatment interaction. The hope that strong effects can be obtained from such interaction can be viewed as a recurring romantic theme in education: not necessarily a romantic theme that is incorrect, but one that is romantic all the same because of its implicit hopefulness for obtaining strong learning effects by highly individualized considerations. Unfortunately, the conclusions based upon extensive data analysis, summarized especially in Cronbach and Snow (1977), show how difficult it is in any area to produce such effects. It is fair to conclude at the present time that we do not know how to do it, and from a theoretical standpoint it is not clear how we should proceed.

Keeping these negative empirical results in mind, I turn now to one of the more significant recent research efforts in CAI, namely, the development of what is called intelligent CAI (ICAI), which has as its primary motif the psychological modeling of the student. This work, which is represented in a number of publications, especially ones that are still in technical report form, has been contributed to especially by John Seely Brown, Richard R. Burton, Allan Collins, Ira Goldstein, Guy Groen, Seymour Papert, and a still larger number of collaborators of those whom I have just named. It will not be possible to review all of the publications relevant to this topic, but there is a sufficient consistency of theme emerging that it will be possible in a relatively short space to give a sense, I think, of the main objectives, accomplishments, and weaknesses of the work done thus far.
It is fair to say that the main objective is to design instructional systems that are able to use their accumulated information to act like a good tutor in being able to construct an approximate model of the student. Of course, this concept of constructing a model of the student means a model of the student as a student, not as a person in other respects. Thus, for example, there is little concern for modeling the relation of the student to his peers, his psychological relation to his parents, etc. The models intended are at the present time essentially rather narrowly construed cognitive models of student learning and performance. This restriction is, in my judgment, a praiseworthy feature. It is quite difficult enough to meet this objective in anything like a reasonably satisfactory fashion.

As I have formulated the objective of this work, it should be clear that John Dewey would have felt quite at home with this way of looking at instructional matters. The ICAI movement, however, has a taste for detail and specific developments that goes far beyond what Dewey himself was concerned with or was able to produce on his own part or by encouragement of his cohorts in educational theory and philosophy.

4.4.1 Features of ICAI Research
There is a certain number of features or principles of this literature on modeling the student that occur repeatedly and that I have tried to extract and formulate. My formulation, however, is too superficial to do full justice to the subtlety of the surrounding discussion to be found in the various reports by the authors mentioned above. My list consists of seven principles or features.

(1) At a general level the research proposed (and it is still mainly at the proposal level) represents an application of information-processing models in psychology, especially the recent use of production systems first advocated by Allen Newell.

(2) The fundamental psychological assumption is that the student has an internal model of any skill he is using to perform a task. This internal model is responsible primarily for the errors generated, and few of the actual errors that do occur can be regarded as random in character. This principle corresponds to much of classical psychological theorizing about behavior, but the strong emphasis on the deterministic character of the behavior is unusual after many years of probabilistic models of behavior and of learning in general psychology. The authors are undoubtedly romantic and too optimistic about the correctness of their deterministic views, especially about the possibility of proving their correctness, but the detailed applications have generated a great deal of interest and it would be a mistake to devalue the efforts because of disagreement about this point.
(3) The analysis of errors made by the student leads to insight into the bugs in the student's model of the procedures he is supposed to be applying. The explicit emphasis on bugs and their detection has been one of the most important contributions of artificial intelligence to the general theory of cognitive processes. Seymour Papert has emphasized the fundamental character of this idea for years. It has been taken up with great success and in considerable detail by the various authors mentioned above, but especially by Brown et al. (1976, 1977). A particularly interesting application, worked out in great detail, to errors made by students in elementary arithmetic is to be found in Brown and Burton (1978).

(4) The representation of the diagnostic model of the student's behavior can best be done by use of a procedural network. The term diagnostic model is used to mean "a representation that depicts the student's internalization of a skill as a variant of a correct version of the skill" (Brown et al., 1977, p. 5). A procedural network is defined as a collection of procedures "in which the calling relationships between procedures are made explicit by appropriate links in the network. Each procedure node has two main parts: a conceptual part representing the intent of the procedure, and an operational part consisting of methods for carrying out that intent" (p. 6). It is, of course, clear from this characterization that the notion of a procedural network is not a well-defined mathematical concept but a general concept drawn from ideas that are current in computer programming. The examples of procedural networks to provide diagnostic models of students' algorithms for doing addition and subtraction problems are, when examined in some detail, very close to ideas to be found in the empirical literature on arithmetic that goes back to the 1920s. There is much that is reminiscent of the early work of Edward Thorndike, Guy T. Buswell, C. H. Judd, B. R. Buckingham, and others, and of somewhat later studies that date from the 1940s and 1950s, such as W. A. Brownell (1953), Brownell and Chazal (1958), and Brownell and Moser (1949). These studies are concerned with the effects of practicing constituent parts of a complex arithmetical skill and especially with the comparison of meaningful versus rote learning of subtraction. Unfortunately, this large earlier literature, which from an empirical standpoint is considerably more thorough and sophisticated than the current work on diagnostic models, is not seriously examined or used in this latter work. All the same, there is much that is positive to be said about the approach of Brown and his associates, and if the models can be developed with greater theoretical sophistication and with greater thoroughness of empirical analysis of their strengths and weaknesses, much can be expected in the future.

(5) It is important to make explicit a goal structure for the computer tutor and also a structure of strategies to be used by the tutor. The concept of
goals and subgoals has been one of the most fruitful outcomes of a variety of work, ranging from problem solving to computer programming. Traditional behavioral psychology of 20 years ago did not explicitly introduce the concept of a goal, although of course the concepts of ends and of objectives are classical in the theory of practical reasoning since the time of Aristotle. (The classical source of these matters is the extensive discussion in Aristotle's Nicomachean Ethics.) An explicit theory of tutors built around the concept of goal structure has been set forth by Stevens and Collins (1977). Much that is said here is sensible and would be hard to disagree with. The difficulty of the research is that at present it is at a sufficiently general level that it is difficult to evaluate how successful it will be either as a basic theoretical concept or as a powerful approach to implementation of CAI.

(6) A theory of causal and teleological analysis is needed for adequate development of models of the student's procedures. There is a long history of causal analysis and, more particularly, of teleological analysis that goes back certainly to Aristotle and that has strong roots in modern philosophy. Immanuel Kant's Critique of Judgment presents an elaborate theory of teleology, for example. For many years, however, teleological notions have been in disrepute in psychology and, to a large extent, also in biology. For a certain period, even causal notions were regarded as otiose by philosophers like Bertrand Russell.* Fortunately, these mistaken ideas about causality and teleology are now recognized as such and there is a healthy revival of interest in them and in further development of their use. An example of application in the present context is to be found in Stevens et al. (1978), but it is also fair to say that this current literature on ICAI has not carried the constructive literature on causality or teleology to new theoretical ground as yet. There is reason to hope that it will in the future.

* Here is one of Russell's more extravagant claims in his famous article on these matters (1913): "The law of causality, I believe, like much that passes muster among philosophers, is a relic of a bygone age, surviving, like the monarchy, only because it is erroneously supposed to do no harm. . . . The principle 'same cause, same effect,' which philosophers imagine to be vital to science, is therefore utterly otiose."

(7) There is an essential need for programs that have specialists' knowledge of a given domain; it is not feasible to write universal general programs that will operate successfully across a number of different domains. The programs referred to in this principle are the programs used by the computer tutor. This echoes the theme mentioned in the discussion of informal mathematical proofs in Section 4.3. It is unlikely that simple general principles of tutoring will be found that are powerful enough to operate without a great deal of backup from highly particular programs dealing with
specialized domains of knowledge. As mentioned, this is a point that is emphasized in some detail by Goldstein and Papert (1977).

In stating these seven features, or principles, I have only tried to catch some of the most general considerations that have dominated the ICAI literature. There are a number of other interesting concepts, for example, Goldstein's concept of an overlay model, which is the intellectual basis of his concept of a computer coach. The overlay model is regarded as a perturbation on the expert's model that produces an accurate model of the student. (See, for example, Carr and Goldstein, 1977.) The ICAI programs that embody the seven principles or features listed above are as yet still relatively trivial, with one exception, namely, SOPHIE, and it remains to be seen to what extent the high ambitions for the development of individualized tutorial programs will be realized as more complicated subject matters are tackled. From an experimental and conceptual standpoint, however, the examples that have been worked out are of considerable interest and certainly represent examples whose complexity exceeds that of most familiar paradigms in experimental psychology.

4.4.2 Four Examples of ICAI
One attractive example is Carr and Goldstein's (1977; see also Goldstein, 1977) implementation of their concept of a computer coach for the game of Wumpus. They describe the game as follows:

The Wumpus game was invented by Gregory Yob [1975] and exercises basic knowledge of logic, probability, decision analysis and geometry. Players ranging from children to adults find it enjoyable. The game is a modern-day version of Theseus and the Minotaur. The player is initially placed somewhere in a randomly connected warren of caves and told the neighbors of his current location. His goal is to locate the horrid Wumpus and slay it with an arrow. Each move to a neighboring cave yields information regarding that cave's neighbors. The difficulty in choosing a move arises from the existence of dangers in the warren: bats, pits and the Wumpus itself. If the player moves into the Wumpus' [sic] lair, he is eaten. If he walks into a pit, he falls to his death. Bats pick the player up and randomly drop him elsewhere in the warren. But the player can minimize risk and locate the Wumpus by making the proper logical and probabilistic inferences from warnings he is given. These warnings are provided whenever the player is in the vicinity of a danger. The Wumpus can be smelled within one or two caves. The squeak of bats can be heard one cave away and the breeze of a pit felt one cave away. The game is won by shooting an arrow into the Wumpus's lair. If the player exhausts his set of five arrows without hitting the creature, the game is lost (p. 5).
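The kind of reasoning the game rewards can be made concrete with a short sketch. The following Python fragment is illustrative only; it is not code from the Wumpus game or from Carr and Goldstein's coach, the warren and the helper name are invented for the example, and the logic corresponds only roughly to the evidence and elimination rules listed below.

```python
def danger_candidates(warning_cave, neighbors, known_safe):
    """Return the caves that could hold the danger signalled by a warning
    heard in warning_cave.  A toy version of the inference the game asks
    for: every neighbor not already known to be safe remains a suspect,
    and if only one suspect is left the danger has been pinned down."""
    return [cave for cave in neighbors[warning_cave] if cave not in known_safe]

# A small made-up warren: cave 1 borders caves 2, 3, and 4, and so on.
neighbors = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}

# A breeze (pit warning) is felt in cave 1; caves 2 and 3 have already been
# visited safely, so elimination leaves cave 4 as the only possible pit.
print(danger_candidates(1, neighbors, known_safe={2, 3}))   # -> [4]
```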
The overlay modeling concept of Goldstein was already mentioned above. The simplified rule set of five reasoning skills for analysis of the overlay model of a given student is exemplified in the following five rules.
L1: (positive evidence rule) A warning in a cave implies that a danger exists in a neighbor.

L2: (negative evidence rule) The absence of a warning implies that no danger exists in any neighbors.

L3: (elimination rule) If a cave has a warning and all but one of its neighbors are known to be safe, then the danger is in the remaining neighbor.

P1: (equal likelihood rule) In the absence of other knowledge, all of the neighbors of a cave with a warning are equally likely to contain a danger.

P2: (double evidence rule) Multiple warnings increase the likelihood that a given cave contains a danger.

Overlay models are then characterized in terms of which of these five rules has or has not been mastered. The details of the model are undoubtedly ephemeral at the present time and will not be recapitulated here. The rules just cited do affirm the proposition that the expert programs at the basis of the construction of a computer tutor must be specific to a given domain of knowledge, in this case, knowledge of Wumpus.

A second attractive example is the construction of a computer tutor to help students playing the PLATO game "How the West Was Won," a game constructed to provide drill and practice on elementary arithmetical skills in an enticing game format. This game is played with two opponents, the computer usually being one of them, on a game board consisting of 70 positions with, in standard fashion, various obstacles occurring along the route from the first position to the last position. The object of the game is to get to the last position, represented by a town on the map, which is position 70. On each turn the player gets three spinners to generate random numbers. He can combine the values of the spinners, using any two of the four rational arithmetic operations. The value of the arithmetic expression he generates is the number of spaces he gets to move. He must also, by the way, compute the answer. If he generates a negative number, he moves backwards. Along the way there are shortcuts and towns. If a player lands on a shortcut, he advances to the other end of the strip he is on. If he lands on a town, he goes on to the next town. When a player lands on the same place as his opponent, unless he is in a town, his opponent goes back two towns. To win, a player must land exactly on the last town. Both players get the same number of turns, so ties are possible. It is apparent that an optimal strategy for this game is a somewhat complex matter and therefore there is plenty of opportunity for a tutor to improve the actual strategies adopted by students. A relatively elaborate diagnostic model of the sort described above in a general way has been developed for this and is discussed in several publications. The first and most substantial one is Brown et al. (1975b).
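The arithmetic the tutor must reason about on each turn can be sketched briefly. The following Python fragment is only an illustration; the exact expression forms the PLATO game accepts (orderings of the spinner values, groupings, and whether an operation may be repeated) are assumptions made for the example, not details taken from the original description.

```python
from itertools import permutations, product

OPS = {
    "+": lambda x, y: x + y,
    "-": lambda x, y: x - y,
    "*": lambda x, y: x * y,
    "/": lambda x, y: x / y if y != 0 else None,
}

def west_moves(spinners):
    """Enumerate the values a player could move in a game of the
    'How the West Was Won' type: three spinner values combined with
    any two of the four arithmetic operations.  Both groupings
    (a op1 b) op2 c and a op1 (b op2 c) are tried over all orderings."""
    moves = {}
    for a, b, c in permutations(spinners):
        for (s1, f1), (s2, f2) in product(OPS.items(), repeat=2):
            left = f1(a, b)
            if left is not None and f2(left, c) is not None:
                moves.setdefault(f2(left, c), f"({a} {s1} {b}) {s2} {c}")
            inner = f2(b, c)
            if inner is not None and f1(a, inner) is not None:
                moves.setdefault(f1(a, inner), f"{a} {s1} ({b} {s2} {c})")
    return moves

# With spinners 2, 3, 5 the farthest forward move is 2 * 3 * 5 = 30, and an
# expression such as 2 - (3 * 5) = -13 would send the player backwards.
for value, expression in sorted(west_moves([2, 3, 5]).items()):
    print(f"{expression} = {value}")
```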
A third attractive and at the same time considerably more substantial example, from a pedagogical standpoint, is SOPHIE, which is an operational ICAI system designed to provide tutoring in the domain of electronic troubleshooting (Brown et al., 1975a). As described by Brown et al. (1976), the kernel system called the SOPHIE lab "consists of a large collection of artificial intelligence programs which use a circuit simulator to answer hypothetical questions, evaluate student hypotheses, provide immediate answers to a wider variety of measurement questions, and allow the student to modify a circuit and discover the ramifications of his modifications. To enable students to carry on a relatively unrestrained English dialogue with the system, the SOPHIE lab has a flexible and robust natural language front-end" (p. 4).

The authors describe several experiments and, in fact, provide one of the few examples in this literature of an attempt at relatively detailed evaluation, although it is scarcely extended or very deep by more general standards of evaluation. One point that the authors stress that is of some interest is that they do not see a conflict between sophisticated ICAI systems and more traditional frame-oriented CAI, for they see the latter offering standard exposition of instructional material and the ICAI system providing sophisticated individual tutoring in what corresponds in the case of SOPHIE to actual troubleshooting exercises.

The learning environment added on top of the SOPHIE lab consists of two main components. One is called the Expert Debugger, which can not only locate faults in a given simulated instrument, but more importantly can articulate exactly the inferences that lead to the location. It can explain its particular top-level troubleshooting strategy, the reason for making a particular measurement, and what follows from the results of the measurement. The second instructional subsystem added is a troubleshooting game that permits one team to insert an arbitrary fault and requires the other team to locate this fault by making appropriate diagnostic measurements. An interesting requirement for the team that inserts the fault is that it must be able to predict all of its consequences, such as other parts blowing out, and also be able to predict the outcomes of any measurement the diagnosing team requests.

The preliminary data reported in Brown et al. (1976) show that there is considerable enthusiasm on the part of the students for the kind of environment created by SOPHIE. The number of students with whom the system has as yet been tried is still small, and it is not really operational on a large scale, but certainly SOPHIE must be regarded as one of the most promising developments to come out of the ICAI movement.

A fourth and final example to be reviewed here is the development of
diagnostic models for procedural bugs in basic mathematical skills by Brown and Burton (1978), referred to earlier. This work especially attempts to implement procedural networks, which were described above in a general way and about which some remarks specific to arithmetical skills were made. Two applications of this work show considerable promise.

One is the development of an instructional game called BUGGY for training student teachers and others in recognizing how to analyze the nature of student errors. The program simulates student behavior by creating an undebugged procedure, and it is the teacher's problem to diagnose the nature of the underlying misconception. He makes this diagnosis by providing strategic test exercises for the "student" to solve. The computer program also acts as arbiter in the evaluation of the validity of the hypothesis of the teacher. When the teacher thinks he has discovered a bug, he is then asked to describe it, and to make sure that his description has the proper analytical character, he is asked to answer a five-exercise test in the same way that he thinks the "student" would. An experiment with a group of undergraduate education majors using BUGGY as the vehicle for teaching the ability to detect regular patterns of errors indicated significant improvement as a result of this experience. More extensive experimentation would be required to estimate the full significance of the use of BUGGY in comparison with more traditional methods of discussing the nature of student errors, as reflected in the kind of literature going back to the 1920s referred to earlier.

A second application of the diagnostic modeling system for procedural bugs was to a large database collected in Nicaragua as part of the Radio Mathematics Project (Searle et al., 1976). This system was quite successful in diagnosing in a patterned fashion a large number of the errors made by more than 1300 school students in answering more than 20,000 test items. The program was, in some sense that is difficult to make completely precise, successful in diagnosing a large number of the systematic errors, but what is not clear is what gain was obtained over more traditional methods of analysis of sources of error. For example, the most common bug identified was that when borrowing is required from a column in which the top digit is zero, the student changes the zero to a nine but does not continue borrowing from the next column to the left. This is a classical and well-known source of error of students doing column subtraction problems. The formulation given here does not seem to offer any strong sense of insight beyond the classical discussions of the matter. A more dubious proposal of the authors is that the characterization of errors given by the program BUGGY is a "much fairer evaluation" than the standard method of scoring errors. The concept of fairness is a complicated and subtle one that has had a great deal of discussion in the theory of tests. The cavalier nature of this judgment is something that is too often present, and it is a negative aspect of the romantic features of the ICAI literature.
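The borrowing bug just described is easy to state as a procedure. The Python sketch below is not code from BUGGY; it simply simulates that one bug, under the assumption that the top number is at least as large as the bottom one, so that a teacher's test exercise can be compared with what such a "student" would write.

```python
def buggy_subtract(top, bottom):
    """Column subtraction with one simulated bug: when a borrow is needed
    from a column whose top digit is 0, the 0 is rewritten as 9 but the
    borrow is not carried any further to the left.  Assumes top >= bottom
    and both numbers are nonnegative integers."""
    top_digits = [int(d) for d in str(top)]
    bottom_digits = [int(d) for d in str(bottom).rjust(len(top_digits), "0")]
    answer = []
    for i in range(len(top_digits) - 1, -1, -1):
        t, b = top_digits[i], bottom_digits[i]
        if t < b:                          # a borrow is required
            if top_digits[i - 1] == 0:
                top_digits[i - 1] = 9      # the bug: 0 becomes 9, borrowing stops
            else:
                top_digits[i - 1] -= 1     # the correct borrow from a nonzero digit
            t += 10
        answer.append(str(t - b))
    return int("".join(reversed(answer)))

# 305 - 117: the correct answer is 188, but the simulated student writes 288,
# because the hundreds digit is never reduced after the 0 is changed to a 9.
print(buggy_subtract(305, 117))
```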
4.4.3 Weaknesses of ICAI Work

The four examples I have described, especially the last two, show the potential for ICAI to set a new trend for computer-assisted instruction in the decade ahead. Much has been thought about and something has been accomplished of considerable merit. I have tried to state what I think those merits are. I would like to close by formulating some of the weaknesses present thus far in the ICAI work.

(1) The claims for the potential power of ICAI must mainly be regarded as exaggerated in the absence of supporting empirical data of an evaluative sort. The authors of the various reports referred to seem, in the main, to be unaware of the subtle and complicated character of producing new curricula organized in new ways so as to produce substantial evidence of learning gains. After the efforts that have been devoted to such matters thus far, one expects discussions of these matters in the closing decades of the century to be at once skeptical, detailed, and precise.

(2) In spite of the interest in student learning, there has been little effort to develop a theory of learning in connection with the work described above. No doubt some of the ideas are intuitively appealing, but it is important to recognize that they are as yet far from being articulated in the form of a systematic theory.

(3) There is also missing what might be termed standard scholarship. The absence of evidence of detailed acquaintanceship or analysis of prior work in the theory of learning is one instance of such lack of scholarship, but the same can be said in general of the thinness of the references to the extensive literature in psychology and education bearing on the topics of central concern to ICAI. Much of the talk is closer to that of traditional curriculum theory, for example, than might be imagined, and has some of the same strengths and weaknesses.

(4) The collective effort represented by ICAI is in the tradition of soft analysis characteristic of traditional curriculum theory. The fact that the analysis is soft, not supported by either exactly formulated theory or extensive empirical investigations, does not mean that it is not able to contribute many clever ideas to the current and future trends in CAI. It does mean that a move has got to be made away from the soft analysis to harder theory and more quantitative analysis of data in order to become the kind of applied science it should be.
(5) There is running through much of the work on ICAI a problem of identifiability, which is classical in developed sciences such as physics and economics. The workers in this field have commendably turned their attention to underlying structures, especially underlying mental structures, of students learning a new skill or concept, but they have been overly optimistic in much of what they have written thus far about identifying the nature of the structure. I have in fact not seen one really sophisticated discussion of the problems of identifiability that are implicit in the approaches being taken.

(6) For researchers interested in modeling the mental structure of students, there is a surprising absence of consideration of powerful nonverbal methods in experimental psychology for making inferences about such structures. I have in mind, first, the importance of latencies or response times as sensitive measures of underlying skill. The relation between such latency measures and the relative difficulty of problems in basic arithmetic has been extensively studied in prior work of my own (for example, Suppes et al., 1968; Suppes and Morningstar, 1972), but the use of latencies is one of the oldest and most thoroughly understood measures in experimental psychology. The second is the technically more complicated study of eye movements, especially for the kind of theory being advocated in the development of either SOPHIE or BUGGY. The study of eye movements would almost certainly give much additional insight into the undebugged models that students are using for solving problems.
In closing I want to emphasize that I think that none of these weaknesses is irremediable or fatal. The ICAI movement is, from a research standpoint, perhaps the single most salient collective effort in extending the range of CAI in the period under review. The movement has much promise and much can be expected from it in the future.

5. The Future
It would be foolhardy to make detailed quantitative predictions about CAI usage in the years ahead. The current developments in computers are moving at too fast a pace to permit a forecast to be made of instructional activities that involve computers 10 years from now. However, without attempting a detailed quantitative forecast it is still possible to say some things about the future that are probably correct and that, when not correct, may be interesting because of the kinds of problems they implicitly involve.
(1) It is evident that the continued development of more powerful hardware for fewer dollars will have a decided impact on usage. It is reason-
able to anticipate that by 1990 there will be widespread use of CAI in schools and colleges in this country, and a rapidly accelerating pattern of development in other parts of the world, especially in countries like Canada, France, Germany, Great Britain, and Japan. Usage should have increased at least by an order of magnitude by 1990; such an order of magnitude increase in the next 12 years requires a monthly growth rate of something under 2%, which is feasible, even if somewhat optimistic.

(2) By the year 2000 it is reasonable to predict a substantial use of home CAI. Advanced delivery systems will still be in the process of being put in place, but it may well be that stand-alone terminals will be widely enough distributed and powerful enough by then to support a variety of educational activities in the home. At this point, the technical problems of getting such instructional instrumentation into the home do not seem as complicated and as difficult as organizing the logistical and bureaucratic effort of course production and accreditation procedures. Extensive research on home instruction in the last 50 years shows clearly enough that one of the central problems is providing clear methods of accreditation for the work done. There is, I think, no reason to believe that this situation will change radically because computers are being used for instruction rather than the simpler means of the past. It will still remain of central importance to the student who is working at home to have well-defined methods of accreditation and a well-defined institutional structure within which to conduct his instructional activities, even though they are centered in the home. There has been a recent increasing movement to offer television courses in community colleges and to reduce drastically the number of times the student is required to come to the campus. There are many reasons to believe that a similar kind of model will be effective in institutionalizing and accrediting home-based instruction of the interactive sort that CAI methods can provide.

(3) It is likely that videodisks or similar devices will offer a variety of programming possibilities that are not yet available for CAI. But if videodisk courses are to have anything like the finished-production qualities of educational films or television, the costs will be substantial, and it is not yet clear how those costs can be recovered. To give some idea of the magnitude of the matter, we may take as a very conservative estimate in 1978 dollars that the production of educational films costs a thousand dollars per minute. This means that the cost of 10 courses, each with 50 hr of instruction, would be approximately 30 million dollars. There is as yet no market to encourage investors to consider seriously investing capital funds in these amounts. No doubt, as good, reliable videodisk systems or their technological equivalents become available, courses will be produced, but there will be a continuing problem about the production of high quality materials because of the high capital costs.
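The two figures just quoted are straightforward to check. The short fragment below only reproduces the arithmetic already in the text; the numbers themselves are the author's estimates, not new data.

```python
# Growth: a tenfold increase over 12 years as a compound monthly rate.
months = 12 * 12
monthly_rate = 10 ** (1 / months) - 1
print(f"required monthly growth: {monthly_rate:.2%}")   # about 1.6%, i.e. under 2%

# Cost: 10 courses of 50 hours each at $1000 per minute of finished film.
total_cost = 10 * 50 * 60 * 1000
print(f"estimated production cost: ${total_cost:,}")    # $30,000,000
```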
(4) Each of the areas of research reviewed in Section 4 should have major developments in the next decade. It would indeed be disappointing if by 1990 fairly free natural-language processing in limited areas of knowledge were not possible. By then, the critical question may turn out to be how to do it efficiently rather than the question now of how to do it at all. Also, computers that are mainly silent should begin to be noisily talking “creatures” by 1990 and certainly very much so by 2000. It is true that not all uses of computers have a natural place for spoken speech, but many do, and moreover as such speech becomes easily available, it is reasonable to anticipate that auxiliary functions at least will depend upon spoken messages. In any case, the central use of spoken language in instruction is scarcely a debatable issue, and it is conservative to predict that computer-generated speech will be one of the significant CAI efforts in the decade ahead. The matter of informal mathematical procedures, or rich procedures of a more general sort for mathematics and science instruction, is a narrower and more sharply focused topic than that of either natural-language processing or spoken speech, but the implications for teaching of the availability of such procedures are important. By the year 2000, the kind of role that is played by calculators in elementary arithmetical calculations should be played by computers on a very general basis in all kinds of symbolic calculations or in giving the kinds of mathematical proofs now expected of undergraduates in a wide variety of courses. I also predict that the number of people who make use of such symbolic calculations or mathematical proofs will continue to increase dramatically. One way of making such a prediction dramatic would be to hold that the number of people a hundred years from now who use such procedures will stand in relation to the number now as the number who have taken a course in some kind of symbolic mathematics (algebra or geometry, for example) in the 1970s stand in relation to the number who took such a course in the 1870s. The increase will probably not be this dramatic, but it should be quite impressive all the same, as the penetration of science and technology into all phases of our lives, including our intellectual conception of the world we live in, continues. It goes without saying that the fourth main topic mentioned in Section 4, modeling of students, will have continued attention, and may, during the next decade, have the most significant rate of change. We should expect by 1990 CAI courses of considerable pedagogical and psychological sophistication. The student should expect penetrating and sophisticated things to be said to him about the character of his work and to be disappointed when the CAI courses with which he is interacting do not have such features.
(5) Finally, I come to my last remark about the future, the prediction that as speech-recognition research, which I have not previously mentioned in this chapter, begins to make serious progress of the sort that some of the recent work reported indicates may be possible, we should have by the year 2020, or shortly thereafter, CAI courses that have the features that Socrates thought so desirable so long ago. What is said in Plato's dialogue Phaedrus about teaching should be true in the twenty-first century, but now the intimate dialogue between student and tutor will be conducted with a sophisticated computer tutor. The computer tutor will be able to talk to the student at great length and will at least be able to accept and to recognize limited responses by the student. As Phaedrus says in the dialogue named after him, what we should aspire to is "the living word of knowledge which has a soul, and of which the written word is properly no more than an image."

ACKNOWLEDGMENT

Research connected with this paper has been supported in part by National Science Foundation Grant No. SED77-09698. I am indebted to Lee Blaine for several useful comments, and to Blaine as well as Robert Laddaga, James McDonald, Arvin Levine, and William Sanders for drawing upon their work in the Institute for Mathematical Studies in the Social Sciences at Stanford.

REFERENCES

Adkins, K., and Hamilton, M. (1975). "Teachers Handbook for Language Arts 3-6" (3rd ed., rev.). Computer Curriculum, Palo Alto, California.
Alderman, D. L. (1978). "Evaluation of the TICCIT Computer-Assisted Instructional System in the Community College," Vol. 1. Educational Testing Service, Princeton, New Jersey.
Allen, J. (1977). A modular audio response system for computer output. IEEE Int. Conf. ASSP Rec. 77CH1197-3, 597.
Ashiba, N. (1976). Simple CAI system and an audiotutorial system. J. Conv. Rec. Four Inst. Electr. Eng. Japan 6, 177-180.
Atal, B. S., and Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. JASA 50, 637-644.
Atkinson, R. C. (1968). Computer-based instruction and the learning process. Am. Psychol. 23, 225-239.
Atkinson, R. C., and Hansen, D. (1966). Computer-assisted instruction in initial reading: The Stanford Project. Read. Res. Q. 2, 5-25.
Atkinson, R. C., Fletcher, D., Lindsay, J., Campbell, J. O., and Barr, A. (1973). Computer-assisted instruction in initial reading: Individualized instruction based on optimization procedures. Educ. Technol. 8, 27-37.
Ballaben, G., and Ercoli, P. (1975). Computer-aided teaching of assembler programming. In "Computers in Education" (O. Lecarme and R. Lewis, eds.), pp. 217-221. IFIP, North-Holland, Amsterdam.
Barr, A., Beard, M., and Atkinson, R. C. (1974). "A Rationale and Description of the
BASIC Instructional Program” [TR 228 (Psych. and Educ. Ser.)]. Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Barr, A., Beard, M., and Atkinson, R. C. (1975). Information networks for CAI curriculums. In “Computers in Education’’ (0. Lecarme and R. Lewis, eds.), pp. 477-482. IFIP, North-Holland, Amsterdam. Bitzer, D. (1976). The wide world of computer-based education. In “Advances in Computers” (M. Rubinoff and M. C. Yovits, eds.), Vol. 15, pp. 239-283. Academic Press, New York. Blaine, L., and McDonald, J. (1978). “Interactive Processing in the EXCHECK System of Natural Mathematical Reasoning.” Paper presented at the meeting of the California Educational Computer Consortium, Anaheim, California. Bork, A. (1975). The physics computer development project. EDUCOM 10, 14-19. Bork, A. (1977a). Computers and the future of learning. J. Coll. Sci. Teach. 7 ( 2 ) . 8890. Bork, A. (1977b). SPACETIME-An experimental learner-controlled dialog. In “Proceedings of 1977 Conference on Computers in the Undergraduate Curricula-CCUC8,” pp. 207-212. Michigan State University, East Lansing. Bork, A. (1978). Computers, education, and the future of educational institutions. In “Computing in College and University: 1978 and Beyond” (Gerard P. Weeg Memorial Conference), p. 119. University of Iowa, Iowa City. Bork, A,, and Marasco, J. (1977). Modes of computer usage in science. T . H . E . J . (Technological Horizons in Education) 4 (2). Brown, J. S., and Burton, R. R. (1978). Diagnostic models for procedural bugs in basic mathematical skills. Cognitive Science 2 , 155-192. Brown, J . S . , Burton, R. R., and Bell, A. G. (1975a). SOPHIE: A step toward creating a reactive learning environment. Inr. J. Man-Mach. Srud. 7 , 675-696. Brown, J. S.. Burton, R., Miller, M., deKleer, J., Purcell, S., Hausmann, C., and Bobrow, R. (l975b). “Steps toward a Theoretical Foundation for Complex, Knowledge-based CAI” (BBN Rep. 3135; ICAI Rep. 2). Bolt, Beranek & Newman. Cambridge, Massachusetts. Brown, J. S., Rubinstein, R., and Burton, R. (1976). “Reactive Learning Environment for Computer Assisted Electronics Instruction” (BBN Rep. 3314; ICAI Rep. I ) . Bolt, Beranek & Newman, Cambridge, Massachusetts. Brown, J. S . , Burton, R. R., Hausmann, C., Goldstein, I., Huggins, B., and Miller. M. (1977). “Aspects of a Theory for Automated Student Modelling” (BBN Rep. 3549; ICAI Rep. 4). Bolt, Beranek t Newman, Cambridge, Massachusetts. Brownell, W. A. (1953). Arithmetic readiness as a practical classroom concept. Elern. School J. 52, 15-22. Brownell, W. A., and Chazal, C. B. (1958). Premature drill. I n “Research in the Three R’s” (C. W. Hunicutt and W. J. Iverson, eds.), pp. 364-366 (2nd ed., 1960). Harper, New York. Brownell, W. A., and Moser, H. E. (1949). “Meaningful Versus Rote Learning: A Study in Grade Ill Subtraction” (Duke University Research in Education TR 8). Duke Univ. Press, Durham, North Carolina. Bunderson, C. V. (1975). Team production of learner-controlled courseware. In “Improving Instructional Productivity in Higher Education” (S. A. Harrison and L. M. Stolurow, eds.), pp. 91-1 I I . Educational Technology, Englewood Cliffs, New Jersey. Bunderson, C. V. (1977). “A Rejoinder to the ETS Evaluation of TICCIT” (CTRC TR 22). Brigham Young University, Provo, Utah. Bunderson, C. V., and Faust, G. W. (1976). Programmed and computer-assisted instruction. In “The Psychology of Teaching Methods” (75th Yearbook of the National Society for the Study of Education), Part 1, pp. 
4-90. Univ. of Chicago Press, Chicago, Illinois.
Carr, B.,and Goldstein, 1. P. (1977). “Overlays: A Theory of Modelling for Computer Aided Instruction” (MIT Al Memo 406; LOGO Memo 40). Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge. Computer-based Education Research Laboratory (CERL) (1977). “Demonstration of the PLATO I V Computer-based Education System” [final (March) report]. University of Illinois, Urbana. CONDUIT (1977). “Computers in Undergraduate Teaching: 1977 CONDUIT State of the Art Reports in Selected Disciplines.” University of Iowa, Iowa City. Cronbach, L. J., and Snow, R. E. (1977). “Aptitudes and Instructional Methods.” Irvington, New York. Davis, R. B. (1974). What classroom role should the PLATO computer system play? I n “AFIPS-Conference Proceedings,” Vol. 43, pp. 169-173. AFIPS Press, Montvale, New Jersey. Dewey, J. (1966). “Democracy and Education.” Free Press, New York. Dugdale, S., and Kibbey, D. (1977). “Elementary Mathematics with PLATO” (2nd ed.). Computer-based Education Research Lab. (CERL), University of Illinois, Urbana. Fletcher, J. D., and Atkinson, R. C. (1972). Evaluation of the Stanford CAI program in initial reading. J . Educ. Psychol. 63, 597-602. Fletcher, I. D., Adkins, K., and Hamilton, M. (1972). “Teacher’s Handbook for Reading, Grades 3-6.” Computer Curriculum, Palo Alto, California. Goldberg, A., and Suppes, P. (1972). A computer-assisted instruction program for exercises on finding axioms. Educ. Stud. Math. 4, 429-449. Goldberg, A., and Suppes, P. (1976). Computer-assisted instruction in elementary logic at the university level. Educ. Stud. Math. 6, 447-474. Goldstein, 1. P. (1977). “The Computer a s Coach: An Athletic Paradigm for Intellectual Education” (MIT A1 Memo 389). Artificial Intelligence Lab., Massachusetts Institute of Technology, Cambridge. Goldstein, 1. P., and Papert, S. (1977). Artificial intelligence, language, and the study of knowledge. Cogn. Sci. 1 ( I ) , 8&123. Hawkins, C. A., ed. (1977). ”Computer Based Learning” (0.0.0. Mededeling 23,4 parts). Dept. of Research and Development of Higher Education, Rijksuniversteit, Utrecht, The Netherlands. Hunka, S. (1978). CAI: A primary source of instruction in Canada. T.H.E. J . (Technological Horizons in Education) 5 (9,56-58. Hunter, B . , Kastner, C. S., Rubin, M. L., and Seidel, R. J. (1975). “Learning Alternatives in U.S. Education: Where Student and Computer Meet.” HumRRO, Educational Technology, Englewood Cliffs, New Jersey. Jamison, D., Suppes, P., and Wells, S. (1974). The effectiveness of alternative instructional media: A survey. Rev. Educ. Res. 44, 1-67. Kimura, S. (1975). Development of CAI course generator for the National Institute for Educational Research’s CAI system at Tokiwa Middle School. PGET 75 (83), 43-50. Klatt, D. (1976). Structure of a phonological rule component for a synthesis-by-rule program. IEEE Trans. ASSP 24, 391. Laddaga, R., Leben, W. R., Levine, A., Sanders, W. R., and Suppes, P. (1978). “Computer-assisted Instruction in Initial Reading with Audio.” Unpublished manuscript, Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Larsen, I., Markosian, L. Z., and Suppes, P. (1978). Performance models of undergraduate students on computer-assisted instruction in elementary logic. Instruc. Sci. 7, 15-35. Laymon, R., and Lloyd, T. (1977). Computer-assisted instruction in logic: ENIGMA. Teuch. Philos. 2 ( I ) , 15-28.
PATRICK SUPPES Lecarme, O., and Lewis, R.. eds. (1975). “Computers in Education.” IFIP, North-Holland, Amsterdam. Lekan, H. A., ed. (1971). “Index to Computer Assisted Instruction” (3rd ed.). Harcourt. New York. Levien, R. E. (1972). “The Emerging Technology: Instructional Uses of the Computer in Higher Education.” McGraw-Hill. New York. Levine, A., and Sanders, W. R. (1978). “The MISS Speech Synthesis System” [TR 299 (Psych. and Educ. Ser.)]. Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Macken, E., and Suppes, P. (1976). Evaluation studies of CCC elementary-school curriculums, 1971-1975. CCC Educ. Stud. l, .1-37. Makhoul, J. (1975). Linear prediction: A tutorial review. Proc. I E E E 63 (4), 561-580. Markel, J. D., and Gray, A. H. (1976). “Linear Prediction of Speech.” Springer-Verlag, Berlin and New York. Partee, B., ed. (1976). “Montague Grammar.” Academic Press, New York. Poulsen, G., and Macken, E.(1978). “Evaluation Studies of CCC Elementary Curriculums, 1975-1977.” Computer Curriculum, Palo Alto, California. Russell, B. (1913). On the notion of cause. Proc. Arisror. SOC. 13, 1-26. Sakamoto, T. (1977). The current state of educational technology in Japan. Educ. Techno/. Res. 1, 39-63. Sanders, W. R., Benbassat, 0.V., and Smith, R. L. (1976). Speech synthesis for computer assisted instruction: The MISS system and its applications. S I G G U E Bull. 8 ( I ) , 200-211. Sanders, W., Levine, A,, and Gramlich, C. (1978). The sensitivity of LPC synthesized speech quality to the imposition of artificial pitch, duration, loudness and spectral contours. J . Acoust. Soc. Am. 64, S1 (abstract). Santos, S. M. dos, and Millan, M. R. (1975). A system for teaching programming by means of a Brazilian minicomputer. I n “Computers in Education” (0.Lecarme and R. Lewis, eds.), pp. 211-216. IFIP, North-Holland, Amsterdam. Schank, R. (1973). Identification of conceptualization underlying natural language. I n “Computer Models of Thought and Language” (R. C. Schank and K. M. Colby, eds.). Freeman, San Francisco. Schank. R. C. (1975). Using knowledge t o understand. I n “Proceedings of a Workshop on Theoretical Issues in Natural Language Processing” (R.Schank and B. L. Nash-Webber, eds.). Massachusetts Institute of Technology, Cambridge. Schank. R., Goldman, N., Rieger, C.. and Riesbeck. C. (1972). “Primitive Concepts Underlying Verbs ofThought” (AIM-162). Artificial Intelligence Lab., Stanford University, Stanford, California. Searle, B., Friend, J., and Suppes, P. (1976). “The Radio Mathematics Project: Nicaragua 1974-1975.” Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Smith, R. L., Graves, W. H., Blaine, L. H., and Marinov, V. G. (1975). Computer-assisted axiomatic mathematics: Informal rigor. I n “Computers in Education” (0.Lecarme and R. Lewis, eds.), pp. 803-809. IFIP, North-Holland, Amsterdam. Smith, S. T., and Sherwood, B. A. (1976). Educational uses of the PLAT0 computer system. Science 192, 344-352. Stevens, A. L., and Collins, A. (1977). “The Goal Structure o f a Socratic Tutor” (BBN Rep. 3518). Bolt, Beranek & Newman, Cambridge, Massachusetts. Stevens. A. L., Collins, A . , and Goldin, S. (1978). “Diagnosing Students Misconceptions in Causal Models” (BBN Rep. 3786). Bolt, Beranek & Newman, Cambridge, Masachusetts. Su, S . Y. W.. and Emam, A. E. (1975). Teaching software systems on a minicomputer: A
CAI approach. I n “Computers in Education” (0. Lecarme and R. Lewis, eds.), pp. 223-229. IFIP, North-Holland, Amsterdam. Suppes, P. (1957). “Introduction to Logic.” Van Nostrand, New York. Suppes, P. (1%0). ”Axiomatic Set Theory.” Van Nostrand, New York. (Slightly revised edition published by Dover, New York, 1972.) Suppes, P. (1975). Impact of computers on curriculum in the schools and universities. In “Computers in Education” (0.Lecarme and R. Lewis, eds.), pp. 173-179. IFIP, NorthHolland, Amsterdam. Suppes, P. (1976). Elimination of quantifiers in the semantics of natural language by use of extended relation algebras. Rev. Inr. Philos. 117-118, 243-259. Suppes. P. (1979). Variable-free semantics for negations with prosodic variation. In “Essays in Honour of Jaakko Hintikka” (E. Sarinen, R. Hilpinen, I. Niiniluoto. and M. Provence Hintikka, eds.), pp. 49-59. Reidel, Dordrecht, The Netherlands. Suppes, P., and Macken, E. (1978). Steps toward a variable-free semantics of attributive adjectives, possessives, and intensifying adverbs. In “Children’s Language” (K. Nelson, ed.), Vol. 1, pp. 81-115. Gardner, New York. Suppes, P., and Morningstar, M. (1972). “Computer-assisted Instruction at Stanford, 196668: Data, Models, and Evaluation of the Arithmetic Programs.” Academic Press, New York. Suppes, P., Jerman, J., and Brian, D. (1968). “Computer-assisted Instruction: Stanford’s 1965-66 Arithmetic Program.” Academic Press, New York. Suppes, P., Fletcher, J. D., Zanotti, M., Lorton, P. V., Jr., and Searle, B. W. (1973). “Evaluation of Computer-assisted Instruction in Elementary Mathematics for Hearingimpaired Students” [TR 200 (Psych. and Educ. Ser.)]. Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Suppes. P., Searle, B. W., Kanz, G., and Clinton, J. P. M. (1975). “Teacher’s Handbook for Mathematics Strands, Grades 1-6” (rev. ed.). Computer Curriculum, Palo Alto, California. Suppes, P., Smith, R., and Beard, M. (1977). University-level computer-assisted instruction at Stanford: 1971-1975. Instruct. Sci. 6, 151-185. Suppes, P., Macken, E., and Zanotti, M. (1978). The role of global psychological models in instructional technology. In “Advances in Instructional Psychology” (R. Glaser, ed.), Vol. I, pp. 229-259. Erlbaum, Hillsdale, New Jersey. Vinsonhaler, J., and Bass, R. (1972). A summary of ten major studies of CAI drill and practice. Educ. Technol. 12, 29-32. VOTRAX Audio Response System VS-6.0 Operators Manual (n.d.). Federal Screw Works, Troy, Michigan. Wang, A. C., ed. (1978). “Index to Computer Based Learning.” Instructional Media Lab., University of Wisconsin, Milwaukee. Weiss, D. J., ed. (1978). “Proceedings of the 1977 Computerized Adaptive Testing Conference.” Psychometric Methods Program, Dept. of Psychology, University of Minnesota, Minneapolis. Winograd, T. (1972). “Understanding Natural Language.” Academic Press, New York. Woods, W. (1970). Transition network grammars for natural language analysis. Commun. Assoc. Comput. Much. 13 (10). 591-606. Woods, W. (1974). “Natural Language Communication with Computers” (BBN Rep. 1976). Vol. I . Bolt, Beranek & Newman, Cambridge, Massachusetts. Wu, E-Shi ( 1978). “Construction and Evaluation of a Computer-assisted Instruction Curriculum in Spoken Mandarin” [TR 298 (Psych. and Educ. ser.)]. Institute for Mathematical Studies in the Social Sciences, Stanford University, Stanford, California. Yob, G. (1975). Hunt the Wumpus. Creut. Comput. Sept.-Oct., 51-54.
Software in the Soviet Union: Progress and Problems

S. E. GOODMAN

Woodrow Wilson School of Public and International Affairs, Princeton University
1. Introduction
2. A Survey of Soviet Software
   2.1 Soviet Software before 1972
   2.2 Soviet Software since 1972
3. Systemic Factors
   3.1 Software in the Context of the Soviet Economic System
   3.2 Internal Diffusion
   3.3 Stages in the Software Development Process
   3.4 Manpower Development
4. Software Technology Transfer
   4.1 Mechanisms for Software Technology Transfer
   4.2 External Sources
   4.3 The Control of Software Technology Transfer
5. A Summary
Acknowledgments and Disclaimer
References
1. Introduction
It is only within the last decade that the Soviets have really committed themselves to the production and use of complex general purpose computer systems on a scale large enough to pervade the national economy. This goal has made it necessary for the USSR to devote considerable effort to upgrading and expanding its software capabilities. This paper is an attempt to provide a broad perspective on software development in the USSR. To this end, it will be convenient to classify loosely the factors that affect the production and use of software in the Soviet Union in terms of four categories: (1) those that depend on hardware availability;
(2) those that are related to priorities in the allocation of effort and other resources; (3) those that are dependent on the nature of Soviet institutions and economic practices, i.e., systemic factors; and (4) those that involve technology transfers from foreign sources. Although these categories are neither independent nor mutually exclusive, they provide a useful framework for a survey and analysis. We will try to show that the Soviets have made substantial progress in removing limitations due to hardware availability, some progress as a result of changes in priorities, and as yet relatively little progress in overcoming an assortment of complex systemic problems that affect the development of software. Consequently, the Soviets will continue to borrow from foreign software technology, and they are now better equipped and motivated to do so.

Soviet software progress and problems cannot be understood on a technical basis alone. Relevant economic and political aspects have to be examined to present a more complete picture. The USSR has permeated technology and economics with politics, and our survey and analysis must discuss software in the context of this overall environment. Although the Soviet situation is extreme, it is not unique. Software engineering and management goes beyond the technical details of program code everywhere. In the last dozen years, Western literature has contained many articles that deal with social and economic aspects of software (e.g., Infotech, 1972; Boehm, 1975; Bauer, 1975; Horowitz, 1975; Buxton et al., 1976; Infotech, 1976; Myers, 1976; Wegner, 1977). The national and international level discussions in this paper are logical extensions of this.

We are so used to our own environment that most of us do not think about its advantages or disadvantages relative to other systemic arrangements. Any serious study of the Soviet software industry¹ must involve some implicit and explicit comparisons with its US counterpart. In most aspects, the Soviets come out a poor second. This is because the peculiarities of the software sector tend to highlight Soviet weaknesses and American strengths. One should be careful not to extrapolate these comparisons over a broader economic spectrum.

¹ We shall use the term "software industry" to denote broadly the totality of a nation's software capacity.

It is difficult enough to write about the development and economics of software under the best of circumstances. It is particularly difficult when coupled with the special problems that afflict the study of the USSR. To help come to grips with this combination, an effort has been made to use as many sources as possible. These include a couple of thousand books and articles from the open literature (newspapers, marketing brochures,
trade journals, research reports, the scientific and technical literature, etc., through Fall of 1978). Of course, space limitations restrict the references to a small fraction of these. Unfortunately, the limited availability of Soviet and East European source material in the US necessitates the use of less-than-ideal hand-me-downs of various kinds. It is thus likely that the bibliography contains more "bugs" (e.g., misspelled names) than most. I have also had the benefit of a large number of private communications. Assorted constraints make it necessary to limit most of the discussion to nonmilitary, general purpose computing.

2. A Survey of Soviet Software
During the last ten years the USSR and its CEMA² allies have designed, developed, and put into production a family of upward compatible third-generation computers known as the Unified System (ES) or Ryad.³ This system is an effective functional duplication of the IBM S/360 series, and provides Soviet users with unprecedented quantities of reasonably good general purpose hardware. The development of the Unified System is a watershed in Soviet thinking on software, and it reflects a major commitment by the Party and government to the widespread use of digital computers in the national economy. The appearance of the first Ryad production models in 1972 marks a clear turning point in Soviet software development.
2.1 Soviet Software Before 1972⁴

Although the USSR was the first country in continental Europe to build a working stored program digital computer (the MESM in 1951), and quickly put a model into serial production (the Strela in 1953), the Soviets have been slow to appreciate the value of computers for applications other than small- and medium-scale scientific/engineering computations. Little effort was made to produce large quantities of suitable hardware intended for widespread general purpose use. A business machines industry was essentially nonexistent, as was a body of consumers who had the per-
² The Council for Economic Mutual Assistance is composed primarily of Bulgaria, Czechoslovakia, German Democratic Republic (GDR), Hungary, Poland, and the USSR. Cuba, Mongolia, Romania, and Vietnam also have affiliations.

³ ES is a transliterated abbreviation of Edinaya Sistema, the Russian for Unified System. The Cyrillic abbreviation and an alternate transliteration Yes are also commonly used. Language differences among the participating countries produce other variants; for example, the Polish abbreviation is JS. Ryad (alternate transliteration: Riad) is the Russian word for "row" or "series." The prefix R is sometimes used to designate computer models.

⁴ Broad coverage of Soviet software before Ryad can be found in First AU Conf. Prog. (1968); Second AU Conf. Prog. (1970); Ershov (1969); Drexhage (1976); and Ershov and Shura-Bura (1976).
ceived need and priority to obtain such equipment. Before the early 1960s the military and scientific/engineering communities were the only influential customers with an interest in computing. However both were less enamoured with computers than their American counterparts, and the Soviet industry developed only to the extent where it could respond to this relatively limited demand. By 1971 less than 20 of the approximately 60 known computer models had been serially produced with more than 100 units apiece (Rudins, 1970; Davis and Goodman, 1978). The vast majority of these were small- to medium-scale second-generation machines, some of which were still in production during the Ninth Five-Year Plan (197 1-75) (Myasnikov, 1977). As of 1971, there were less than 2000 medium- and large-scale secondgeneration machines in the USSR,5 in contrast with the much larger number and variety in the West. Furthermore, the West had many more smaller computers. For example, by late 1963 IBM had built 14,000 1400 series machines (OECD, 1969), almost twice the total number of computers in the USSR in 1970. Thus the population of experienced programmers in the USSR remained relatively small, and there was a particularly critical shortage of modern systems programmers who had worked on large, complex, multifaceted software systems. This was compounded by the failure of the Soviet educational system and the computer manufacturers to provide the kind of hands-on, intensive practical training that was common in the US. Two handicaps shared by all Soviet computer models were a lack of adequate primary storage and the state of peripheral technology (Ware, 1960; Godliba and Skovorodin, 1967; Judy, 1967; Rudins, 1970; Ershov and Shura-Bura, 1976). Installations usually had 1-32K words of core memory. The most reliable and commonly used forms of input/output were paper tape and typewriter console. Card readers, printers, and their associated paper products were of poor quality and reliability. Until the mid- 1960s alphanumeric printers and CRT displays were essentially nonexistent; printers were numeric and used narrow paper. Secondary storage was on poor quality tape and drum units. For all practical purposes, disk storage did not exist in the USSR until Ryad. Tapes could not reliably store information for much longer than a month. Additional reliability in input/output and secondary storage often had to be bought Most of these were Ural-14 (1%5), Minsk-32 (1%8), and M-222 (1%9) computers. Performance was in the 30-50K operations/sec range for scientific mixes. AU three machines were relative latecomers to the period under discussion. The largest Soviet computer built in quantity before 1977 was the BESM-6( 1 % ~comparable to the CDC 3600 in CPU performance (Ershov, 1975). Over 100 were in use by 1972. All four machines were in production during most of the Ninth Five-Year Plan.
SOFTWARE IN THE SOVIET UNION
235
through duplication of hardware or redundant storage of information. For example, the 16-track magnetic tapes for the Minsk-22fihad six tracks for data, two for parity checks, and the remaining eight tracks simply duplicated the first eight as an apparently necessary safeguard. Perhaps most importantly, Soviet peripherals did not offer convenient means for software exchange. Punched tape, with its limitations with regard to correcting and maintaining software, was more commonly used than punched cards. Magnetic tapes often could not be interchanged and used on two ostensibly identical tape drives. One consequence of this was that almost all programming was done in machine (binary) or assembly language. By the late 1960s translators for a few languages were available for all of the more popular computer models, but they were not generally used. A good compiler could take u p most of core, and the programmer could not get his program listed on his numeric printer anyway. Thus there was a strong bias that favored the “efficiency” of machine or assembly language programming. Clearly some of this bias arose from real considerations, but some of it reflected the same sort of dubious “professional” factors that perpetuate the use of assembly language in the West. It also helped make a skilled programmer a relatively rare and widely sought after employee in the USSR. Enterprises competing for talent would ingeniously create new job titles with higher benefits. General purpose data processing and industrial applications were retarded the most by computing conditions. A severe handicap, in addition to those already mentioned, was the lack of an upward compatible family of computers with variable word length. Efforts to create such a family, the Ural-10 (Ufal-11, -14, and -16) series and early ASVT models (M-1000, -2000, -3000), did not work out well (Davis and Goodman, 1978). The hardware situation and the use of machine language inhibited the development of software that would permit computers to be used for nonscientific applications by large numbers of people having little technical training. As a result, the hardware that did exist was often underutilized. The fact remains, however, that by 1970 the USSR contained between 7000 and 10,000 computers and they could not be used at all without software.? While this figure may be small when compared to the almost 40,000 installed computers in the United States in 1967 (OECD, 1969), it The Minsk machines were the yeoman general purpose computers in the USSR before Ryad, with a production span covering the period 1%2-1975. In addition to the Minsk-32, there were the earlier -2, -22, -22M, and -23 models (all rated at about 5K operations/sec). Well over 2000 of these machines were built and many of them are in use today. Our estimates of the Soviet computer inventory tend to be higher than most others, e.g., Berenyi (1970). Cave (1977).
S . E. GOODMAN
was still large enough to necessitate a substantial effort and commitment of skilled technical people. Much of the past Soviet systems software effort has been in programming languages. This is reflected in the large proportion of the open publications devoted to this area, and is consistent with the given hardware constraints, the relatively formal academic orientation of Soviet software research personnel, and the historical pattern followed in the West. Something like 50 different higher level languages can be identified from the literature. Many are experimental and have had virtually no impact beyond their development groups. Most of the more widely used pre-Ryad programming languages were based on ALGOL-60. The popularity of this language is understandable in light of the European role in its creation, the fact that most Soviet programmers have had extensive training in mathematics, and its intended use for scientific/engineering applications. Compiler development began in the early 1960s and ALGOL-60 became available for most computer models after 1963 (Ershov and Shura-Bura, 1976). FORTRAN was also available for at least the Minsk machines, the M-220, and the BESM-6 from the mid-to-late 1960s. Soviet use of ALGOL-60 has been characterized by a number of home-grown variants (Drexhage, 1976). ALGAMS and MALGOL are designed explicitly for use on slow, small-memory systems. ALGEC and ALGEM have supplementary features that make them more suitable but still not very convenient for use in economic applications. ALGOS appears to have been an experimental language for the description of computer systems. ALGOGCOBOL (Kitov er a/., 1968) is a clear hybrid for data processing. ALPHA (Ershov, 1966) and ANALITIK (Glushkov et al., 1971b) are nontrivial extensions, the latter for interactive numeric computations. There was essentially no subsequent revision of these languages after the appearance of ALGOL-68. A survey of the Soviet open literature on programming languages before 1970 reveals none that were particularly well suited for economic and industrial planning, business data processing, or large integrated systems like airline reservations or command and control systems. Attributes crucial to such applications, like good inputloutput and report generation capabilities, were just not available. For all practical purposes, the more widely used programming languages in the USSR during this period were only good for scientific and engineering computations. Interest in the more widely used United States programming languages was not insignificant before Ryad. FORTRAN was used at quite a few installations in the USSR and Eastern Europe. No fully satisfactory reason is apparent, but the Soviet software community was strong in its opposition to the use of COBOL before 1966. However, government interest in
general purpose data processing increased significantly during the Eighth Five-Year Plan (1966-1970), and serious attention has since been paid to COBOL (Myasnikov, 1972). This includes an early effort to set up a minimal compatible COBOL set for Soviet use (Babenko et al., 1968). Other languages, including SNOBOL and LISP, attracted scattered adherents. The Norwegian general purpose simulation language, SIMULA 67, also became fairly popular.

Hardware limitations retarded the development and implementation of economically useful operating systems. Until the appearance of the BESM-6 in 1965, the simplicity and limited flexibility of the available CPUs and peripherals did not necessitate the development and use of sophisticated systems software. This was reinforced by the failure of computer manufacturers to develop and distribute such products and by the lack of support services for either software or hardware (Gladkov, 1970; Novikov, 1972). As a result, users had to develop all but the most basic utility programs to enable the installation to function adequately in a single program mode. Most programs could not be shared with other computer centers having the same CPU model because of local modifications that were made in the course of hardware self-maintenance and the lack of uniform peripheral equipment. Gradually, conditions and perceptions improved and a number of packages of utility routines were eventually put together for the more commonly used machines. Later, multiprogramming batch systems were built for the larger computers such as the Minsk-32 and the Ural-11, -14, and -16. At least three different operating systems were developed for the BESM-6. The multiplicity of BESM-6 system projects is partially the result of the nontransferability of any one system to all installations, and a lack of communication between installations. Some of these efforts appear to have been "crash" projects that did not permit the full utilization of the software development talent available. All of these systems are primitive by Western standards and did not appear until long after hardware deliveries had begun. We do not know how widely they are used or how well they are supported. Maintenance of even centrally developed systems was largely the responsibility of the user installation.

As might be expected, Soviet attempts to develop time-sharing systems were severely constrained by hardware. The USSR was deficient in every aspect of hardware needed for this mode of computing. A further handicap was the poor state of supporting technology such as ground and satellite communications. Data transmission by telegraph line at 50-150 bits/sec is still common in the Soviet Union (Leonov, 1966; Kudryavsteva, 1976a).
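To give a rough sense of what such line speeds meant in practice, the sketch below estimates raw transfer times over a telegraph-grade link. Only the 50-150 bit/s rate is taken from the paragraph above; the file sizes are hypothetical illustrations.

```python
# Rough arithmetic on the 50-150 bit/s telegraph links mentioned above.
# The file sizes are invented examples; overhead, retransmission, and
# character framing are ignored, so real transfers would be even slower.

def transfer_hours(size_bytes: int, bits_per_sec: float) -> float:
    """Hours needed to move size_bytes over a line running at bits_per_sec."""
    return size_bytes * 8 / bits_per_sec / 3600

examples = [("10 KB card-deck image", 10_000), ("250 KB load module", 250_000)]
for label, size in examples:
    for rate in (50, 150):
        print(f"{label:>22} at {rate:>3} bit/s: {transfer_hours(size, rate):5.1f} h")
```

Even a modest program image ties up such a line for hours, which helps explain why remote time sharing and networked planning systems remained largely aspirational.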
There were a few pre-Ryad time-sharing projects (Doncov, 1971). The best known of these are the AIST project in Novosibirsk and the Sirena airline passenger reservation system. Neither has done well (Doncov, 1971; Drexhage, 1976; Aviation Week, 1972). The BESM-6 operating system developed by the Institute of Applied Mathematics supported time sharing at the Academy of Sciences' computer center in Moscow (Bakharev et al., 1970; Zadykhaylo et al., 1970). It does not seem to have amounted to much either. Some strange little "time-sharing" systems (e.g., Bezhanova, 1970) were so limited as to be unworthy of the name.

There have been a few experimental multimachine configurations. The best known of these were the aforementioned AIST system and the Minsk-222, which was based on an assortment of Minsk-2 and Minsk-22 computers (Barsamian, 1968; Evreinov and Kosarev, 1970). Both projects were characterized by what could only be described as naive optimism in the form of unwarranted extrapolations and fatal underestimations.

With the exception of work in the area of scientific and technical computing, the open literature was notably lacking in descriptions of significant, implemented, and working applications software systems. No doubt some existed in security sensitive areas, and there is evidence that software was available to help control certain transportation networks, such as the national railway system (Petrov, 1969; Kharlonovich, 1971). However, one gets the strong impression that computers in the USSR were not being used to do much beyond straightforward, highly localized tasks. The literature contained papers on the theoretical aspects of such applications as information systems, but this work was generally of a formal mathematical nature and contributed little to the actual implementation of major systems. But things would soon change.

The 1960s was a period of political and economic reevaluation with respect to the need for expanding the general purpose computing capability of the USSR. Soviet economic planners were distressed by falling growth rates and the rising percentage of nonproductive (e.g., clerical) workers. They were also having trouble controlling the sheer immensity and complexity of the economy. The Soviets were becoming increasingly aware of the economic and industrial potential of computing, and they were not oblivious to what was being done in the West. Public discussion of the use of computers, which had been widespread since the late 1950s, began to be supplemented by very high level Party endorsements (Holland, 1971b) and practical measures. Attention was directed toward such esthetically unexciting, but practically important, problems as the standardization of report forms, the elimination of human errors in data reporting, etc. The national economic planning process itself became a prime candidate for computerization (e.g., Glushkov, 1971a). Unlike the United States, which got into data processing
through an established business machines industry characterized by a dynamic, fairly low-level, customer-vendor feedback relationship, most of the driving force behind the entry of the USSR came via push from the top of the economic hierarchy.

2.2 Soviet Software Since 1972
The most important necessary condition for upgrading the state of general purpose computing in the USSR was the creation of a modern upward compatible family of computers with adequate quantities of primary memory and a suitable assortment of peripherals. The first public announcement of what was to become the Unified System of Computers (ES EVM) came in 1967 (Kazansky, 1967). Within two years, the Soviet Union had enlisted the aid of its CEMA partners, and the decision was made to try to duplicate functionally the IBM S/360 by using the same architecture, instruction set, and channel interfaces.

The first production units of the Soviet-Bulgarian ES-1020 (20K operations/sec) were announced in early 1972. By the end of 1974, the Hungarian ES-1010 minicomputer, the Czech ES-1021 (40K operations/sec), the Soviet ES-1030 (100K operations/sec; a Polish version never went into serial production), and the GDR ES-1040 (320K operations/sec) were in production, providing the USSR and most of Eastern Europe with about 1000 small- and medium-scale machines per year as of late 1975. The two largest computers in the series were to suffer considerable delays. The ES-1050 (500K operations/sec) would not go into production until 1975-1976; the ES-1060 (1.5M operations/sec) would not appear until late 1977 (Khatsenkov, 1977; Trud, 1978a). The 1010 and 1021 are not based on the S/360 architecture and are not program compatible with the other models. In addition to the basic CPU models, the CEMA countries have been producing a reasonable range of peripheral devices. Although most of this equipment is at the level of IBM products that existed during the second half of the 1960s, it represents a tremendous improvement over what was formerly available. A much more extensive discussion of Ryad can be found in Davis and Goodman (1978).

The policy to use the IBM instruction set and interfaces was clearly based on software considerations. This was perceived to be the safest and most expedient way to meet the high-priority national objective of getting an upward compatible family of general purpose computers into productive use in the national economy. The Soviets had failed in two previous attempts to produce such a family, and they must have been aware of, and frightened by, the major problems IBM had with S/360 software. There was no serious interest in, or perceived need for, pushing
the frontiers of the world state-of-the-art in computer technology. An obvious course of action was to use a tried and proven system from abroad. The clear choice was the IBM S/360. By appropriating the S/360 operating systems, they would be in a position to borrow the huge quantities of systems and applications programs that had been developed by IBM and its customers over many years. This would do much to circumvent the poor state of Soviet software and permit immediate utilization of the hardware. Although it seems that the Soviets greatly underestimated the technical difficulties of this plan, it has been followed with considerable success and represents one of the most impressive technology acquisitions in Soviet history.

There are several S/360 operating systems (e.g., IBM S/360, 1974), the two most important of which are the disk-oriented system DOS/360 and the much larger OS/360, which consists of several versions that together contain a few million instructions in a bewildering array of modules. A tremendous volume and variety of documentation and training aids are available for these systems. There was no effective way to deny either the software itself or the documentation to the CEMA countries. Much of this is in the public domain and can be sent anywhere without license. Sources of information include IBM itself, tens of thousands of user installations all over the world, and the open literature. Several CEMA countries have legally purchased some of the small- and medium-scale S/360 systems, which include the software and the opportunity to participate in SHARE, the major IBM user group. Soviet and East European computer scientists could also legitimately talk to Western counterparts at meetings, by using Western consultants, through exchange visits, etc. Furthermore, the Soviets have demonstrated that they can illegally obtain entire IBM computer systems if they are willing to try hard enough.

DOS/ES is the Ryad adaptation of the IBM S/360 DOS disk-oriented operating system. From the available literature, we cannot identify any major DOS/ES features that are not part of DOS/360 (IBM DOS, 1971; ISOTIMPEX, 1973; IBM S/360, 1974; Drozdov et al., 1976; GDR, 1976; Vasyuchkova et al., 1977). Both systems are subdivided into control and processing programs. These further subdivide into supervisor, job control, initial program loader, linkage editor, librarian, sort/merge, utilities, and autotest modules. The DOS/360 system librarian includes a source statement library, a relocatable library, and a core image library, as does DOS/ES. Both will support up to one "background" partition in which programs are executed in stacked-job fashion, and two "foreground" partitions in which programs are operator initiated. Both support the same basic telecommunications access methods (BTAM and QTAM) and the same translators (assembler, FORTRAN, COBOL, PL/1, and RPG).
SOFTWARE IN THE SOVIET UNION
241
DOS/360 uses OLTEP (On Line Test Executive Program) to test input/output units; DOS/ES also uses OLTEP. The level of DOS/ES appears to be at or near the level of the final DOS/360 Release 26 of December 1971.

Similarly, OS/ES appears to be an adaptation of OS/360. It has three basic modes: PCP (Primary Control Program with no multiprogramming capability), MFT (Multiprogramming with a Fixed Number of Tasks), and MVT (Multiprogramming with a Variable Number of Tasks) (Larionov et al., 1973; Peledov and Raykov, 1975, 1977; GDR, 1976). All handle up to 15 independent tasks. OS/ES supports translators for FORTRAN levels G and H and ALGOL 60. The levels of OS/ES seem to be around the IBM MFT and MVT Release 21 of August 1972. OS/ES MFT requires a minimum of 128K bytes of primary storage; OS/ES MVT needs at least 256K bytes (Naumov et al., 1975). OS/ES is mentioned much less frequently in the literature than DOS/ES. No doubt this reflects the fact that the great majority of Ryads are at the lower end of the line. It may also indicate serious problems in adapting OS/360 to the ES hardware and problems with the supply of adequate quantities of core storage (many ES systems were delivered with about half of the planned core memory capacity). It is possible that DOS/ES may have been the only Ryad operating system operationally available for a couple of years.

The ES assembly language, job control language, and operating system macros are identical with those of S/360 (references in last two paragraphs; Larionov, 1974; Mitrofanov and Odintsov, 1977). The Soviet literature preserves the style of IBM software documentation. Assorted error codes, messages, console commands, and software diagnostics were originally in English and identical to those used by IBM. Such things have since become available in Cyrillic, but we do not know if these are standard options. English error codes, etc., still seem to prevail. Several observers who were very familiar with IBM S/360 systems software have been able to identify fine details in ES software that leave little doubt as to the source of the product and to the degree to which it was copied.

It is as yet unclear exactly how program-compatible the Ryad family members are with each other or with IBM products. Some serious testing by CDC of their purchased ES-1040 indicates a high level of IBM compatibility (Koenig, 1976). IBM systems software could be loaded and run on the 1040 without much trouble. It is not known if the Soviet-made Ryad hardware is as directly compatible with IBM software. The Soviets are investing literally thousands of man-years in the development of the Ryad operating systems (Rakovsky, 1978b), but we really do not know what all these people are doing. Hardware differences between the S/360 and Unified System, and between the different models of the Unified
System, may have made it necessary to adapt the IBM operating systems to each of the ES models. Now that IBM no longer supports either DOS/360 or OS/360, the socialist countries are on their own as far as the maintenance and enhancement of the two systems is concerned.

A recent "new version" is not especially impressive. The Scientific-Research Institute for Electronic Computers in Minsk, the institute that probably adapted DOS/360 to the ES-1020, came out with DOS-2/ES in 1976 (Kudryavsteva, 1976a). The most notable additions to DOS are an emulator for the Minsk-32 and some performance monitoring software. We do not know to what extent these enhancements are built into the operating system. More generally, all of the ES operating systems have gone through several releases since they were introduced. We cannot really tell to what extent this reflects the addition of significant capability enhancements, academic (i.e., non-cost-effective) design optimizing perturbations, or simple accumulations of fixes. We suspect that the Soviets try not to tamper with the operating systems unless they have to in order to get them to function adequately. This may have been the case with an announced real-time supervisor known as SRV, an OS/ES coresident program for providing fast response in a real-time environment. SRV seems to be an adaptation of the IBM S/360 Real-Time Monitor (IBM RTM, 1970; Naumov, 1976), but, unlike the situations with DOS and OS, there are substantial differences.

The first USSR State Prize to be awarded for practical software work was announced at the end of 1978 (Trud, 1978b). In some ways it is remarkable that it took this long for the Soviet scientific and technical community to recognize the importance of software. The award was made for the Ryad operating systems. Not surprisingly, neither IBM nor people like F. P. Brooks, Jr., were named as co-winners.

It is important not to underestimate the achievements of the CEMA computer scientists in functionally duplicating S/360. They have mastered the quantity production of reasonably modern hardware and they did succeed in the formidable task of adapting the S/360 operating systems to this hardware. This is not to say that they did not have considerable help from external sources, or that they did a good, or fast, or imaginative job. In fact, the effort took them about as long as it took IBM in the first place, and they have yet to achieve S/360 quality and reliability standards across the Unified System product line. Nevertheless, they had the talent and resources to achieve the basic goals and, relative to their own past, they have acquired a much enhanced indigenous computing capability.

Between 1975 and 1977, the CEMA countries came out with several "interim" Ryad models that are essentially evolutionary upgrades of
some of the earlier machines. These include the Hungarian ES-1012 (another mini), the Soviet-Bulgarian ES-1022 (80K operations/sec), the Polish ES-1032 (200K operations/sec; the "real" Polish 1030), and the Soviet ES-1033 (200K operations/sec). In addition to these new CPU models, the CEMA countries have been producing a small, but steady, stream of new peripheral equipment (CSTAC II, 1978). Although the current peripheral situation is much improved over the pre-Ryad era, complaints about shortages of peripheral devices and their associated paper products are still common (Lapshin, 1976; Ashastin, 1977; SovMold, 1978; Zhimerin, 1978).

The best evidence that the CEMA nations are basically satisfied with the policy of copying the IBM product line is the current effort to develop a new group of Ryad-2 models that are clearly intended to be a functional duplication of the IBM S/370 family (IBM S/370, 1976; Bratukhin et al., 1976; CSTAC II, 1978; Davis and Goodman, 1978). By early 1977 most of the new models were well into the design stage. By the end of 1978, the Soviet ES-1035 was claimed to be in production (Sarapkin, 1978) and prototypes for at least the GDR ES-1055 (Robotron, 1978) and the Soviet ES-1045 (Kommunist, 1978) existed. The appearance of other prototypes and the initiation of serial production will probably be scattered over 1979-1982. A Ryad-3 project was recently announced (Pleshakov, 1978), but almost no details are available.

S/370-like features to be made available in the new Ryad-2 models include larger primary memory, semiconductor primary memory, virtual-storage capabilities, block-multiplexor channels, relocatable control storage, improved peripherals, and expanded timing and protection facilities. There are also plans for dual-processor systems and greatly expanded teleprocessing capabilities. It is not clear if the Soviets intend to use the IBM S/370 operating systems to the same extent as they did those for S/360, or if they plan to build the Ryad-2 operating systems on the Ryad-1 OS/ES base. M. E. Rakovsky, a vice-chairman of the USSR State Planning Committee (Gosplan) and one of the highest ranking Soviet officials to be directly involved in the Ryad project on a continuing basis, has stated that "developing the Unified System's Ryad-1 operating software to the point where it will handle all the functional capabilities of the Unified System's higher-level Ryad-2 system will take between 1600 and 2000 man-years." He goes on to say that this effort will be carried out at "two institutes that employ a total of about 450 programmers" (Rakovsky, 1978b). There is also some reason to believe that GDR Robotron's new virtual operating system OS/ES 6.0 for the ES-1055 may be more of an original effort than was the ES-1040 operating system. Emulators have been announced as
part of the initial offerings for the two most advanced of the Ryad-2 models: one for running DOS/ES programs on the 1055 (Dittert, 1978), and one for Minsk-32 programs on the 1035 (Kudryavsteva, 1976b).

The Unified System project has by no means absorbed the entire Soviet computer industry, although this may seem to be the case since most of what appears in the Communist literature relates to Ryad. The joint CEMA effort has forced the Soviets to be more open about computer developments. The focus is on Ryad because it is by far the largest project and many of the others are officially classified. With respect to mainframe computers, the Unified System has roughly the same relative standing in the USSR as the IBM 360/370 series has in the United States, although most of the Soviet non-Ryad mainframes are smaller second-generation computers, whereas in the United States most of the non-IBM mainframes are technically competitive CDC, UNIVAC, Burroughs, etc., models.

The most extensive, openly announced non-Ryad production is primarily in the form of assorted machines built by the Ministry of Instrument Construction, Means of Automation, and Control Systems (Minpribor). Many are part of the ASVT series: M-4030, M-5000, M-6000, M-7000, M-400, M-40, and, most recently, the M-4030-1 (Naroditskaya, 1977). The medium-scale M-4030 is compatible with the Ryad family at the operating system level (Betelin et al., 1975). The other models are minicomputers, the first of which, the M-5000 and M-6000, appeared in 1972-1973. The USSR also relies on imports from Hungary, Poland, and the United States to meet some of its minicomputer needs. The ASVT line is widely used in Soviet industry and the literature indicates that a considerable amount of software has been developed for these machines. A great deal of substantive minicomputer related R&D is done in the Baltic states (e.g., SovEst, 1978). A joint CEMA effort is currently in progress to consolidate the scattered member nation minicomputer activities by establishing a new SM (Sistema Malykh, Small System) family (Naumov, 1977). Of the four announced machines, the SM-1, -2, -3, and -4 (SM-5 and -6 announcements are expected in 1979), at least the first three were in production by mid-1978. Early indications are that a substantial amount of general purpose SM software is available, and that some form of ASVT program compatibility is possible (Filinov and Semik, 1977; Rezanov and Kostelyansky, 1977; TECHMASHEXPORT, 1978a,b). These minis can be used with much of the peripheral equipment that has been developed for Ryad and ASVT.
Large scientific computers are under advanced development at the Institute of Precise Mechanics and Computer Engineering in Moscow, the developers of the BESM machines. Recently announced were the El'brus-1 and -2 (named after the highest mountain in Europe) (Burtsev,
1978; MosPrav, 1978). The El'brus-1 is thought to be based on the Burroughs architecture (Burtsev, 1975). This architecture is particularly well suited for ALGOL programming, the language greatly favored by Soviet computer scientists and scientific programmers. The El'brus-2 may be a loosely coupled collection of El'brus-1 machines. Past experience makes it likely that the Institute of Applied Mathematics in Moscow will participate in the development of its systems software. The new large computers will probably be produced in small numbers and many of these will be used at military and other restricted installations. The majority will eventually displace BESM-6s, so a BESM-6 emulator is likely to be an important element in early El'brus software offerings. By the time El'brus deliveries start, the receiving installations will have been using their BESM-6s for up to 15 yr. There will be considerable resistance to program conversion.

In addition to these large projects, there are a number of scattered smaller efforts that we know about. These include a few complete computer systems like the new line of RUTA models (Kasyukov, 1977) and the Nairi-4 (Meliksetyan, 1976), work on microcomputers [e.g., the SS-11 being built in Armenia (Kommunist, 1977)], and some hand-held "programmed keyboard computers" [e.g., the Electronika B3-21 being built in Kiev (Trud, 1977)]. We do not know anything about the software that is being developed for these relatively unimportant machines, but it would not be surprising if the software offerings to early purchasers were very meager. Work on highly modular recursive machines is currently in a rudimentary stage in both the US and USSR (Glushkov et al., 1974; EE Times, 1977). We have essentially no information on Soviet efforts to develop software for machines with this architecture.

Relative to their pre-Ryad past, the Soviets have clearly come a long way in correcting hardware and systems software deficiencies. There are now 25,000-30,000 computers in the USSR and at least half of them are respectably modern systems. The Unified System and the ASVT-4030, in particular, provide a large, common hardware and systems software base. But how productive have these machines been, and how well have they been integrated into the fiber of the national economy?

There is no question that the Soviets and their CEMA partners have given high priority to the use of computing as an important means to help modernize the economy and increase factor productivity. Indeed, the production of a large number of industrially useful programs began with the delivery of the first ES units. There are visions of great efficiencies to be achieved from the partition of this activity among the member countries (Rakovsky, 1978a), but since the various Eastern European economies differ considerably at the microeconomic level, one might well entertain doubts as to how well this will work out.

The availability of ES and ASVT hardware has resulted in something of
a minor software explosion. But this hardware is still backward by world standards. More important, the experience and personnel base necessary for the development of either large world-standard state-of-the-art software systems or large numbers of low-level everyday data processing programs is not something that can be put together in a short period. And perhaps, in the light of past Western practices, Soviet institutional structure tends to inhibit the customer-oriented design, development, and diffusion of software (see Section 3).

By far, the most extensive and prominent software activity in the USSR relates to what are collectively called automated control/management systems (ASU). The ASU spectrum runs from the simple no-direct-control monitoring of a small production process to a grand national automated data system for planning and controlling the economy of the Soviet Union. A broad range of economic/industrial ASUs is listed in Pevnev (1976). The creation of ASUs has become a major nationwide undertaking (e.g., Ekongaz, 1976; Zhimerin, 1978) and there are now literally hundreds of articles and books on ASUs appearing in the Soviet literature. A small sample of recent, general books includes Kuzin and Shohukin (1976), Pevnev (1976, which gives the best overall perspective), Pirmukhamedov (1976), Liberman (1978), and Mamikonov et al. (1978). Descriptions of specific ASUs under development and more general articles on the subject often appear in the periodicals Sotsialisticheskaya industriya, Ekonomicheskaya gazeta, and Pribory i sistemy upravleniya. A large number of industry-specific publications and the public press media also frequently carry articles on ASUs.

Although a great many articles describing a great many ASUs have appeared, by US standards these articles give little substantive information. It is thus difficult to do much more than list a lot of specific ASUs (the reader will be spared this) or present some tentative general observations.

The Soviet interest in ASUs at all levels is genuine and serious. ASUs are being pushed vigorously from above, and there is a certain amount of desire at every level of the economic hierarchy to be part of the movement. Two major obstacles to the successful infusion of ASUs into the economy are the resistance of management, who are comfortable in their preautomation environment, and the inexperience of Soviet computer scientists and programmers. The Soviets have been making steady progress in overcoming both problems. Industrial managers are beginning to appreciate the potential of computers for doing tasks that people do not enjoy, but which need to be done, and the software specialists are beginning to think more realistically about simple, useful systems that are within their capabilities to build. This gradual convergence seems to be getting a lot of small systems built and used. With few exceptions (Myasnikov, 1974),
it appears that most of this software is not widely disseminated, but used only locally (e.g., Zhimerin, 1978). None of this work is particularly imaginative by US standards, but there is no reason to expect it to be. As we shall discuss at greater length in the next section, the Soviet economic environment is conservative and introverted. The Soviets are cautiously and independently repeating much of the learning experience that took place in the US in the late 1950s and 1960s. It would be surprising if they were doing anything else.

The Soviets continue to expend considerable local effort on software for second-generation machines. Much of what is reported in the open literature is for the Minsk-32 (e.g., Kulakovskaya et al., 1973; Zhukov, 1976; Vodnyy transport, 1977), but this must be true more generally since almost half of the computers in use in the USSR are of pre-Ryad manufacture.

The appropriation of most of S/360's software has eroded the past ALGOL orientation of high-level programming in the USSR. FORTRAN and PL/1 are now widely used. The government has pushed COBOL since 1969 and, given the emphasis on economic applications, it is not inconceivable that it could become the most widely used nonscientific language in the Soviet Union. Assorted CEMA computer centers have used LISP, SNOBOL, PASCAL, etc., and these languages will find their local advocates at Ryad installations. SIMULA-67 is an important simulation language (Shnayderman et al., 1977). So far, we have seen little of the Soviet-designed high-level languages on ES systems, although Ryad translators for some of these do exist. Most of what is done with regard to these languages may be intended to prolong the usefulness of programs written for second-generation computers, or to permit users to remain in the familiar and comfortable environment of these older machines. This would explain why ALGAMS, an ALGOL-60 variant explicitly intended for slow machines with small primary memories, has been made available as an option with DOS/ES (Borodich et al., 1977).

Although frequent allusions to time-sharing systems appear in the Soviet literature (e.g., Bratukhin et al., 1976; Drozdov et al., 1976; SovRoss, 1976), it is not clear what is readily available and used. None of the Ryad-1 or interim models has virtual storage, and storage capacities are marginal. Much of the telephone system in the USSR is not up to supporting the reliable transmission of large volumes of information beyond a few kilometers. We have seen no explicit mention of the TSO (time-sharing option) extension of OS/360 MVT, which IBM announced in November 1969. Not one of the 20 large "time-sharing centers" scheduled for completion in 1975 was fully operational by early 1977 (Rakovsky, 1977). Now the goal is to have six by 1980 (Zhimerin, 1978). User demand for time
sharing has only recently become serious enough to motivate more than academic exercises. The development of suitable hardware and software is currently being pursued (e.g., Bespalov and Strizhkov, 1978; Pervyshin, 1978), but most of this seems to be in rudimentary stages of development. Several experimental systems appear to be operational, and the ES-1033 with time-sharing capabilities has been advertised for sale in India (Elorg-Komputronics, 1978) using OS/ES. However, widespread time-sharing use seems unlikely as long as most Ryad installations are equipped to use only DOS/ES. The enhanced capabilities expected with the Ryad-2 models should bring further progress.

There is considerable interest in database management systems (DBMS) in the USSR. Much of the work that is described as operational seems to be in the form of very low level, and localized, information retrieval systems. In the past, Soviet work in this area was severely constrained by a lack of disk and other secondary storage equipment, and by the poor state of I/O technology. Ryad and other developments have eased this situation somewhat, but there are still serious limitations. For example, most Soviet installations are still equipped with only 6-8 7.25 Mbyte IBM 2311-like disk configurations that do not allow the interleaving of data transfers. IBM 3330-like disk drives are expected to be available in moderate quantities for nonspecial (i.e., nonmilitary or non-Party) users in 1979-1980. The new capabilities expected with Ryad-2 models, especially block-multiplexor channels, should also be helpful.

Poland, the GDR, and the USSR are developing several DBMS based on Codasyl. The Soviet system is called OKA and was developed at the Institute of Cybernetics in Kiev (Andon et al., 1977). OKA runs on OS/ES 4.0 (MFT and MVT) and has both a batch and a time-sharing mode. OKA is currently being field tested at at least one unknown installation. There is an All-Union working group following Codasyl in the USSR. The Institute of Cybernetics in Kiev is working on two systems patterned after IBM IMS-2 and the experimental IBM relational DBMS System-R. The Soviet relational DBMS is called PALMA. The Soviets have been developing several specialized DBMS. Most of the publicly acknowledged work is oriented toward economic planning, including a system that is being field-tested by Gosplan.

Soviet journals are filled with descriptions of experimental programming systems of various sorts. The relatively new (1975) journal Programmirovanie has become one of the most academically prestigious outlets for this work. It also seems to be the only major, regularly published, openly available, Soviet journal devoted exclusively to research in programming and software, although other journals, e.g., Upravlyayushchiye sistemy i mashiny, often contain informative articles. Few of these
articles are at the world state-of-the-art in software research (articles on Minsk-32 software appear with some regularity), and the theoretical work being done and the experimental systems being described seem consistent with the overall level of Soviet computing compared to that of the West and Japan. As far as we can tell, none of these products of Soviet research were offered as standard options with the early Ryad computers. Although many of these programming systems are being built to run on Ryads, it is not clear to what extent they are intended to become standard software options.

It is important to emphasize that we currently have a rather poor overall picture of how well or how extensively the Soviets have been using the software they have announced, or even what they have had for a long time. The lack of publications like Datamation, the very limited access we have had to Soviet installations, etc., make it difficult to say much more than we have.

3. Systemic Factors
In spite of the real progress and future promise offered by improved hardware availability and official recognition and support, there are some deeply rooted systemic problems that will continue to constrain severely the development of the Soviet software industry.

3.1 Software in the Context of the Soviet Economic System*
* Some general background references for this subsection include Granick (1961), Nove (1969), Bornstein and Fusfeld (1974), Kaiser (1976), Smith (1977), Berliner (1976), and Amann et al. (1977).

To a first approximation, the Soviet government/economy is organized in a hierarchical, treelike structure. The highest level node in the tree is the Council of Ministers (COM). The next levels represent a few score ministries, state committees, and other high administrative agencies. Then there are intermediate levels of Republic, branch, and department administration and management. Finally, the lower levels contain the institutes and enterprises that are responsible for R&D and the production and distribution of goods and services. This is a large bureaucratic hierarchy that encompasses every economic aspect of Soviet society. As a result of this vertical structure, and a very long and strong Russian bureaucratic tradition, much of the Soviet economy is unofficially partitioned into assorted domains or fiefdoms. These exist along ministerial, geographical, and personality divisions. People and institutions in this structure generally
develop behavior patterns that please the higher level nodes in their domains. This behavior may or may not coincide with the goal of providing high-quality service or products to customers.

[Footnote: Of course, this behavior is not unique to the Soviet bureaucracy. It is characteristic of many bureaucracies, including most (if not all) of the US Government. However, in the USSR it is much more pervasive and there is no alternative to being part of this system.]

Superimposed over this vertical hierarchy are a variety of horizontal relationships. The domains are not self-sufficient. In addition to directions from above, they get supplies and services from units in other domains and they, in turn, supply goods and services elsewhere. The centralized planning apparatus, in collaboration with other levels in the hierarchy, establishes suppliers and customers for almost every Soviet institute and enterprise. Although there is some flexibility in establishing these horizontal relationships, they are for the most part beyond the control of lower level management. One of the most important of the self-assigned tasks of the Communist Party is to expedite all sorts of government and economic activity. The Party intercedes to get things done. Although the Party organization is also subdivided into fiefdoms, it is more tightly controlled and operates freely across government/economic domains. Finally, there are the unofficial, sometimes illegal, horizontal arrangements that are often created to enable an enterprise to function successfully in spite of everything else.

In the centrally planned Soviet economy, there is no market or quasimarket mechanism to determine prices, product/service mixes, rewards, etc. For the most part, all of this is worked out at high levels and by a centrally controlled haggling process, although lower level management has been granted some degree of flexibility by gradual reforms since 1965. In this system quantity is stressed over quality, and production is stressed over service. Enterprises are told what to do. Failure to meet these imposed commitments can bring stiff penalties. Success is rewarded, but there is little opportunity for the high-risk, big-payoff, innovative entrepreneurial activity that is common in the US. The central planners do not like much activity of this sort because it is difficult to control.

The business practices that have evolved in this environment are not surprising. Enterprises are oriented toward the basic goal of fulfilling the performance indices that are given to them. These are usually narrowly defined quantitative quotas. Thus, for example, a computer producer's most important index may be the number of CPUs manufactured and a less important index may be the number of peripheral devices built. Rewards are paid for meeting the basic goals and for overfulfillment. Lists of suppliers and customers are provided by the planners. Plant management
will obviously give first priority to meeting the CPU production norm, then priority goes to the peripherals. They do not want to overdo things, because this year's successes may become next year's quotas. Furthermore, it is clearly in their own best interests to haggle with the planners for low quotas. Since customer satisfaction is of relatively minor importance (particularly if the customer is far away or under another ministry), management is not going to divert its resources to installation and maintenance unless it absolutely has to. There is also an obvious incentive to try to retain the status quo. Once a plant operation has started to function smoothly, there is no market pressure to force innovation, improved service, and new products, All these things mean finding new suppliers, changing equipment, and retraining personnel. They involve serious risk, and local management cannot control prices or suppliers to balance the risk. There are strengths in this system. Central control and the powerful expediting role of the Party allow national resources to be concentrated in high-priority areas. The differences between the Minsk machines and Ryad show that much can be done on a respectably large scale once the high-level decisions have been made. Apathy disappears and labor quality improves on priority undertakings. Of course, the government and Party do not have the resources and cannot maintain enough pressure to do this sort of thing across the entire economy. Furthermore, it can be argued that some of this high-priority success occurs because these projects are really removed from the economic mainstream. Software development would seem to circumvent some of the systemic difficulties that plague other products. Once the basic hardware exists at an installation, software work does not depend to any great extent on a continuing and timely flow of material supply from outside sources. Not surprisingly, Soviet enterprises have a tendency to avoid intercourse with and dependence on the outside. It would seem easier to develop an inhouse software capability than one for spare parts or raw materials. It would also seem that commercial software houses would be able to provide better service than, say, a hardware maintenance group. The software house is not in the middle of a supply chain, the hardware maintenance group is. Since the software industry does not involve the distribution of material products, more casual horizontal vendorcustomer relationships would be expected to be less troublesome for the central planners. Finally, the problem of the mass production of copies of a finished product is reduced almost to the point of nonexistence. It would thus seem that software has been singularly blessed at both the macro- and microeconomic levels in the USSR. But high-level policy statements are not always easy to translate into practice, and the firm-
level advantages just described may be less advantageous than they appear. The development of a broad national software capability is not like the development of a capability to build computing hardware or armored personnel carriers. The nature of software development places considerable emphasis on traditional Soviet economic weaknesses and is not well suited to the "annual plan" form of management that is dominant in the USSR.

Before Ryad, hardware manufacturers did little to produce, upgrade, or distribute software. Few models existed in sufficient numbers to make possible a common software base of real economic importance. Repeated attempts to form user groups produced limited successes. Soviet security constraints restricted participation in sharing software for some models. Enterprises rarely exchanged programs. Contracts with research institutes to produce software products were often frustrating for the customer (e.g., Novikov, 1978). The research institute staff would be content with a prototype system that was not well tailored to the customer's needs. Most users had little recourse but to modify and maintain the programs on their own.

Conditions are gradually improving, but changes take time even where they are possible. One promising reform has been the establishment of the corporation-like production associations (Berliner, 1976; Gorlin, 1976). These support the creation of relatively large and efficient computer centers that should be able to better serve the needs of the association and its component enterprises. The association may contain a research institute with its own software group. On the surface, at least, an association appears to be a more viable unit for the production and utilization of software, and one that might be able to deal more effectively with other firms. However, seemingly reasonable reforms in the past have actually produced results opposite those that were intended (e.g., Parrott, 1977). It is as yet too early to evaluate the impact of this reorganization, either in general or with respect to software development.

[Footnote: It is worth noting that enterprises engaged in the development of computer hardware were organized in loose research-production associations before they became generally fashionable.]

In the US there are a large number of companies that provide professional software services to customers. They range in size from giants like IBM to one-man firms. Some build systems and then convince users to buy them. Others ascertain customer needs, and then arrange to satisfy them. A variety of other services are also offered. Basically they are all trying to make a profit by showing their customers how to better utilize computers. To a considerable extent, the software vendors and service bureaus have created a market for themselves through aggressive selling and the competitive, customer-oriented development of general purpose
and tailor-made products. There is probably no other sector of the American economy with such a rapid rate of incremental innovation. The best firms make fortunes, the worst go out of business. Adam Smith would have been overjoyed with this industry.

[Footnote: Unfortunately, there appears to be no study of the US software industry that would enable us to be more specific.]

The Soviets appear to have no real counterpart to these firms for the customer-oriented design, development, diffusion, and maintenance of software. One enterprise, the Tsentroprogrammsistem Scientific-Production Association in Kalinin, has been publicly identified as a producer of ES user software (Izmaylov, 1976; Ashastin, 1977; Myasnikov, 1977). This organization is under Minpribor. We assume that the Ministry of the Radio Industry, the manufacturer of Ryad in the USSR, has some central software facilities available because of legal responsibilities. Some research institutes, computer factories, and local organizations develop and service software, but complaints about their work are common (e.g., Zhimerin, 1978) and praise is rare. We know little about what any of these places are doing or how they function. The average Soviet computer user does not seem to have many places it can turn to for help. This is particularly true of installations that are not near major metropolitan areas (e.g., Davidzon, 1971; Letov, 1975; ZarVos, 1976).

The mere fact that we know so little about Soviet software firms is strong evidence that the volume and pace of their activities must be much below that of the American companies, or at least that benefits to users are limited by a lack of readily available information. Most American computer users are not very sophisticated and need to have their hands held by vendors and service companies. Most Soviet users are less sophisticated. It is inconceivable that the USSR has anything comparable to the American software companies that we do not know about, because then there would be no way for the thousands of computer users in the Soviet Union to know about such services either. It is simply not the sort of thing that can be successfully carried on in secret. It must advertise in some way or it will not reach its customers.

Soviet installations are now pretty much on their own with regard to applications software. The open literature seems to confirm this with articles on how "Such-and-Such Production Enterprise" built an applications system for itself. There are few articles on how some research institute built something like a database management system that is now being used at scores of installations in a variety of ways. Currently, Soviet installations are building lots of fairly obvious local systems. This pace may actually slow down once these are up and running because there are few effective mechanisms for showing users what they might do next.
Considerable potential for improvement exists. Although there do not seem to be many commercially developed software products in widespread, operational use, there have been quite a few articles on ASUs that are being developed with this goal (e.g., Bobko, 1977). Many of these are for management information systems intended for general or industry-specific users. There is high-level push for standardization of ASUs and the increased commercialization of software (Myasnikov, 1976; Zhimerin, 1978). Sooner or later, as they gain experience, some of the industrial and academic institutes that are doing software work will evolve into viable software houses. There are other possibilities. Right now computer installations are building up in-house software capabilities to meet their own needs. After a while there are bound to be some local surpluses of various kinds. We might see the gradual development of an unplanned trade in software products and programmers among enterprises. This sort of trading goes on all over the economy, and there is substantial opportunity for software. Finally, it is not inconceivable that a little unofficial free enterprise might evolve, as it does in plumbing and medicine. Small groups of bright young programmers might start soliciting moonlighting tasks.

The extent of the software service problem may go beyond applications software. We know little about how new operating system releases are maintained or distributed to users, although in 1976 the All-Union Association Soyuz EVM Komplex was established, along with local affiliates like Zapad EVM Komplex in the Ukraine and Moldavia, to service centrally both hardware and software (Trofimchuk, 1977). We do not know who produces the new releases or how changes are made. The Soviets are not in the habit of soliciting or seriously considering a broad spectrum of customer feedback. The research institutes that maintain the ES operating systems may only communicate with a few prestigious computer centers. New releases are probably sent on tape to users, who are not likely to get much help should local problems arise. New releases may well necessitate considerable local reprogramming, particularly if the users modify the systems software to their own needs. Once an installation gets an operating system to work, there is a tendency to freeze it forever (Reifer, 1978).

[Footnote: This is actually an optimistic assumption. There is no evidence that new releases are not sent in a printed form that might require a major effort by users to put up on their machines.]

There is a widespread users' attitude that accepts the software service situation and is thus a major obstacle to progress. The legendary tradition for endurance of the Russian people, and the vertical structure and shortage of resources that strongly favor the vendor's position, make poor service a chronic and pervasive feature of life in the USSR. Improvements in the service aspects of the computer industry are taking place more slowly than are improvements in production. Most Soviet users can do
SOFTWARE IN THE SOVIET UNION
255
little more than complain (complaints that would get at the core of the problem are politically unacceptable), and wait until the leadership perceives that the problem is serious enough to do something constructive. The Soviet Union has no counterparts to the market power of the average consumer and the flexibility for creating mutually desirable business arrangements that have built up the impressive commercial software industry in the United States.

The introduction of computers into Soviet management practice has been coming along slowly. Conservative applications, like accounting systems, seem to be the rule. The use of simple management information and process control systems is gradually increasing. Although there is some Soviet research on the utilization of computer techniques for decision analysis and modeling management problems (Ivanenko, 1977), little seems to be put into practice. Soviet managers tend to be older and more inhibited than their American counterparts. The system in which they work stresses straightforward production rather than innovation and marketing decisions. Soviet economic modeling and simulation activity stresses the necessity of reaching a "correct socialist solution," and is not oriented toward being alert for general and unexpected possibilities in a problem situation. Furthermore, Soviet industry has learned not to trust its own statistics, and there may be a big difference between "official" and actual business practice. What does one do with a computer system for the "official" operational management of an enterprise when actual practice is different? Does one dare use the computer to help manage "expediter" slush funds and under-the-counter deals with other firms? A recent case indicates that these are serious problems (Novikov, 1978; WashPost, 1978).

Soviet programmers may be in an odd position with respect to industrial management. It is not clear that the managers know what to do with them. Firms are oriented toward plan fulfillment; they are not as information oriented as their American counterparts. The work of a programmer is often not directly related to the enterprise's plan, nor is his function as readily perceived as that of, say, a secretary or janitor. Management has to figure out what to do with these people and somehow measure their value to the enterprise. This is a big burden, and many of the older, highly politicized industrial managers are probably not up to doing this well. It will take the Soviets at least as long to learn to use their machines effectively as it took us.

[Footnote: Americans should be reminded that some US management groups behaved similarly during the 1950s. The insurance industry, now among the largest and most committed computer users, is a notable case in point.]

The USSR can claim what is potentially the world's largest management application: an ASU for planning the entire Soviet economy
(OGAS). The Soviets have been talking about a network of computer centers for this purpose since the late 1950s. An often cited plan calls for a hierarchy consisting of a main Gosplan center in Moscow, 80 regional centers, and 4000 local centers (Chevignard, 1975). Data will be consolidated upward and plans will be passed downward in this treelike structure. The literature on the subject is large, and this is neither the place to review nor to analyze the project except to comment briefly on some software-related aspects.

On the surface, of course, it is ridiculous for the Soviets to talk about such an undertaking when data communication between computer centers often takes the form of someone carrying a deck of cards crosstown on a bus. The Soviets do not understand the operation of their own planning practices well enough to write down a useful set of specifications for the super software system that would be necessary to support such a large, highly integrated, and comprehensive network. The system is primarily a political football that is being struggled over by Gosplan and the Central Statistical Administration. From a software standpoint, it has helped them to start thinking, in some detail, about important problems like standardization, documentation, data-reporting procedures and formats, and the usefulness of their own statistics (Ekongaz, 1977). It has also spurred considerable investment in an assortment of data-processing systems. These products are useful and the experience is desperately needed.
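The upward consolidation described for the OGAS hierarchy can be pictured as a simple bottom-up tree walk. The sketch below is purely schematic: the node names, the single reported figure per center, and the summation rule are invented for illustration and do not come from any published OGAS specification.

```python
# Schematic picture of the proposed OGAS hierarchy: local centers report data,
# each higher level consolidates its children's reports on the way up to the
# main Gosplan center. All names and numbers are illustrative only.

class Center:
    def __init__(self, name, reported=0, children=None):
        self.name = name
        self.reported = reported      # data originating at this center
        self.children = children or []

    def consolidate(self):
        """Return this center's data plus everything consolidated from below."""
        return self.reported + sum(c.consolidate() for c in self.children)

gosplan = Center("Gosplan (Moscow)", children=[
    Center("Regional center A", children=[
        Center("Local center A1", reported=100),
        Center("Local center A2", reported=120),
    ]),
    Center("Regional center B", children=[
        Center("Local center B1", reported=90),
    ]),
])

print(gosplan.consolidate())  # 310 -- the figure that would reach the top
```

Plans flowing back down would traverse the same tree in the opposite direction; the point of the sketch is only the shape of the data flow, not its content.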
3.2 Internal Diffusion

Before Ryad, the dissemination of software products and services was accomplished through a variety of mechanisms including national and regional program libraries, user groups, and informal trades. None of this was particularly effective or well organized [see references listed on p. 112 of Davis and Goodman (1978)]. For example, some libraries were little more than mail-in depositories that were not properly staffed, indexed, or quality controlled (Dyachenko, 1970; Galeev, 1973). The development of the Unified System was accompanied by a greater appreciation of the limitations of past practices. Ryad hardware would be pitifully underutilized if each user installation were left with an almost empty machine and expected to do all its own programming. This would have defeated the whole purpose of the new system.

The creation of the Unified System, with its common hardware and software base, is a major step in the alleviation of the technical difficulties of portability, that is, the transfer of software from one installation to another. The hardware mixes and self-maintenance practices of the pre-Ryad days were severe limitations to portability. It should be noted, however, that this
in itself does not guarantee portability of systems. Programs developed at one IBM 360 installation in the West are not necessarily trivially transferable to another. Local differences in hardware and software, including differences in operating systems, may make this difficult. Ryad marks a singular development in Soviet computing history: Its vendors are providing complete and modern operating systems and utility programs to all users. We do not know what the vendors are doing beyond this to promote standardization and diffusion. Standardization is an important form of diffusion since it facilitates portability and centralized maintenance. In the US, software standards exist primarily through the activities of important vendors; government efforts have had some success (notably with COBOL) but tend to be less effective (White, 1977). With their hierarchical system, one would think that the Soviets are in a particularly strong position to promote standardization and diffusion. For example, the detailed specifications for a programming language can be incorporated in an official State Standard (GOST) that has the force of law. Compilers that conform to this GOST could then be built for widely used computer models by centralized software groups and distributed to the users of these models. It would literally be against the law to change the syntax at an installation. Such a standard exists for the ALGAMS language (GOST, 1976). We do not know to what extent the Soviets are trying to standardize software in this way. We do not even know how this has affected the use of ALGAMS, a language that has been in use since the mid-1960s. Many programs must have been written in a lot of local variants of ALGAMS during this time. Are they being rewritten to run on compilers for the standardized version? Does the State Standard effectively encourage future programming, on the new computers, in this language that was specifically designed against the limitations of Soviet hardware of the mid-1960s? The Ministry of the Radio Industry, which has a legal near-monopoly over the production of mainframe computers, is in a strong position to push this kind of standardization and diffusion, but seems to have little motivation to work very hard at it. To some extent Minpribor acts as a competitive and mitigating influence. The Minpribor Minister, K. N. Rudnev, has been a dynamic force in promoting standards and customer service, and Minpribor has established the only publicly announced national customer software service. Since the Soviets currently seem to be doing better with hardware than software, perhaps one way to gauge software service is to see what is happening with hardware service. In 1977 the Council of Ministers "obliged" all ministries and departments to provide for centralized technical service for computers (Trofimchuk, 1977). Although it is not clear what these obligations are, it is clear that the extent and quality of this service
leaves much to be desired (Fadeev, 1977; Perlov, 1977; Taranenko, 1977; Izvestiya, 1978). We find situations where a Ministry X has a geographically centralized service facility for its own enterprises using certain computer models. An enterprise in that area with that model, but under a different ministry, cannot use the service. This kind of bureaucratic fragmentation pervades all computing services and is a major obstacle to diffusion. In addition to the software services provided by the hardware vendors, diffusion in the US is greatly facilitated by independent software outlets. We would conjecture that relatively few of the successful independent software ventures in the US were started and principally staffed by people with only an academic background. IBM and other computer companies have been the real training grounds for these entrepreneurs, not the universities or government facilities like the National Bureau of Standards. It is, however, primarily the academics that the Soviets seem to turn to for help with software problems. This does not appear to have done them much good, and it is difficult to see where, in the Soviet institutional structure, they will be able to create an effective substitute for the American computer companies to train and diffuse aggressive and imaginative software specialists. As we noted earlier, the Soviets are in the early stages of developing their own counterparts to these firms, but it is as yet too early to do much more than speculate on the possibilities and their chances for success. User groups are also vehicles for software diffusion. Before Ryad, the Soviets tried several user groups. Lack of interest, the lack of sufficiently large user bases, poor communications, large geographical distances, a lack of hardware vendor support, and assorted bureaucratic aggravations severely hampered these efforts. Furthermore, the existence of many installations was secret, membership in some groups required security clearances, and lists of centers using the same models were probably not readily available. The BESM-6 and M-20/220/222 user groups seem to have been the most successful. These machines were particularly favored by the military and other high-priority users, and the importance of the clientele and their applications had to be a significant factor in these relative successes. These two groups hold regular technical meetings and have built up respectable libraries over the last 10-20 yr. It is likely that both had active support from the hardware developers and manufacturers. Most of the other user groups do not seem to have worked out as well. There is a Ryad-user group, but current indications are that it is not much more effective than the others (Taranenko, 1977). To be really successful, the Ryad users would have to be broken down into specific model
groups and each of these would have to be supported by the specific enterprises that developed that model's hardware and systems software. Even then, a group's effectiveness might be geographically confined. The Soviets have a respectable number of conferences and publications on computing, although efforts in this direction are handicapped by a lack of professional societies that are as active as the ACM, SIAM, and the IEEE. The Soviet Popov Society for electrical engineers does not engage in the same level of activity. In the USSR, the ministries and some particularly active institutes, such as the Institute of Cybernetics in Kiev, sponsor conferences and publications. Each year, they hold a few large national-level conferences and perhaps a couple dozen small, thematic conferences. Occasionally, the Soviet Union hosts an international meeting. Conference proceedings are neither rapidly published nor widely disseminated. Until 1975, with the publication of Programmirovanie, there was no generally available software journal in the USSR. Articles on software were rare, theoretically oriented, and distributed over an assortment of other professional journals. Few journals are widely circulated or timely. At least two relatively substantive journals, Elektronnaya Tekhnika Ser. 9 and Voprosy Radioelektroniki Ser. EVT, are restricted. In the West, some of the most timely information appears in periodicals like Datamation that are sustained by vendor advertisements. Soviet vendors do not have the motivation, outlets, or funds for advertising. They seem to have little interest in letting anyone know what they are doing. The Soviets claim to have "socialized knowledge" and it is thus easier to diffuse scientific and technical information in the USSR than it is in the capitalist countries. "Soviet enterprises are all public organizations, and their technological attainments are open and available to all members of society, with the exception of course of information classified for military or political reasons. The public nature of technological knowledge contrasts with the commercial secrecy that is part of the tradition of private property in capitalist countries. Soviet enterprises are obliged not only to make their attainments available to other enterprises that may wish to employ them but also actively to disseminate to other enterprises knowledge gained from their own innovation experience. The State itself subsidizes and promotes the dissemination of technological knowledge through the massive publication services of the All-Union Institute for Scientific and Technical Information [VINITI]" (Berliner, 1976).14 This sounds better in theory than it works in practice. While services like those provided by VINITI and efforts to establish national programming libraries (Tolstosheev, 1976) are unquestionably useful, they do not provide the
14 Not surprisingly, VINITI is at the forefront of Soviet work in large information retrieval systems.
much broader range of diffusion services available in the US. Capitalistic commercial secrecy is overstated; very little remains secret for very long. The Soviets have no real counterpart for the volume and level of Western marketing activity. By comparison, lists of abstracts of products that have not been properly quality controlled for commercial conditions, and that have no real guarantees or back-up service, cannot be expected to be as effective a vehicle for diffusion. The Soviet incentive structure not only does not encourage dissemination of innovation particularly well, but it also often promotes the concealment of an enterprise's true capabilities from its superiors. The vertical structuring of the Soviet ministerial system works against software diffusion. Responsibility is primarily to one's ministry and communication is up and down ministerial lines. It is much easier to draw up economic plans for this kind of structure than it is for those with uncontrolled horizontal communication. Furthermore, each ministry appears determined to retain full control of the computing facilities used by its enterprises. In the West, software diffusion is a decidedly horizontal activity. Data processing and computing personnel and management talk to each other directly across company and industry lines, and people are mobile in a wide-open job market. This communication is facilitated by active professional organizations. Such arrangements do not exist to anywhere near the same extent in the USSR. It is not only the ministerial system that militates against the really effective encouragement of direct producer-customer horizontal economic activity. Often the various layers of local Communist Party organizations perform the role of facilitating horizontal exchanges. The Party needs visible activities that justify its existence and authority, and this is one of the most important. No serious erosion of this prerogative is possible. However, it is much easier for a local Party secretary to get a carload of lumber shipped than it is for him to expedite the delivery of a special purpose real-time software system. He can take the lumber away from a lower priority enterprise, but what can he do to get the bugs out of the software? He can throw extra people on the job, but that will probably only make matters worse. Software projects tend to react badly to the "Mongolian horde" approach often favored by the Soviets. The detailed enterprise level software transactions cannot be managed by politicians. This problem affects the diffusion of technical R&D to production enterprises in general. Software is an extreme case because it is so difficult to manage under any circumstances. One mechanism that has evolved to facilitate technical work is the emergence of very large enterprises and research institutes that are capable of handling most of their own needs in-house. Thus one finds many enterprises that own and operate computing
facilities entirely on their own.15 This is basically a defensive reaction that improves local viability in a highly constrained environment. Globally, the wide distribution, limited use, and hoarding of scarce resources, particularly personnel, in bloated organizations is counterproductive. The Party and government do recognize this and have shown themselves prepared to give up some control to obtain increased efficiency in innovation. Most of these changes have related to highly technical R&D matters over which they have had little effective control anyway. Changes include the already discussed corporationlike associations and R&D contract work, and also reforms in innovation incentives and prices for new products (Berliner, 1976). This represents progress and will help the development and diffusion of software.
3.3 Stages in the Software Development Process
The Soviet literature is missing the detailed articles on software engineering that are so abundant in the Western literature. This would seem to indicate a lack of widespread appreciation of and serious common concern about the technical, economic, and management problems that plague the stages of development of large software systems. As they gain more experience, this situation is likely to change. Articles on programming methodology are beginning to appear in East European publications (e.g., InforElek, 1977), and the Soviets should soon follow. Such articles will become more common and, in time, there will be papers on case studies, programming productivity experiments, chief-programmer teams, etc. Until such studies are published, we have to content ourselves with a cursory description of some of the problems they are probably having with the various phases of the software development process. There are several nearly equivalent breakdowns of these stages. We will use the following list: producer-client get-together; requirements specification; system design; implementation; testing; maintenance; and documentation. Of course, the software development process is not a single pass through this list. There are assorted feedback loops, iterations, and overlaps. In particular, documentation is not a distinct stage, but an activity that should pervade every stage. Nevertheless, the list suits our purposes.
Producer-client get-together. This can obviously happen in one of two ways. Either the software producer seeks out the client or vice versa. The Soviets have trouble both ways. Producers in the USSR are not in the
15 Computer rental seems to be nonexistent. Rental arrangements would complicate service obligations for the hardware manufacturers. There is a serious effort to establish large, "collective-use" computer centers, and these may eventually prove successful.
habit of seeking out customers. On the other hand, most Soviet enterprises are still naive customers for software. They do not know what they want or need or what is available. We know almost nothing about how Soviet firms negotiate software work, but they must be having even greater difficulties than we have in the US in negotiating price, time, and manpower needs. In general, the Soviets themselves do not know how they determine prices for new products (Berliner, 1976).16 The annual plans of both the producer and client must limit the flexibility of the arrangements that can be made, and there is a serious shortage of experienced software specialists.
Requirements specification. This refers to the translation of customer needs into a statement of the functions to be performed by the software system. The specifications should be at a level of detail that will make it possible to test the product unambiguously to see if they have been met. They serve the producer by making its task clear. This stage clearly demands good communications between the producer and client, something Soviet enterprises are not noted for in general. This stage also requires a great deal of patience and sympathy on the part of the software firm, something that is in short supply at most Soviet research institutes. Experience shows that software specifications change almost continuously as a result of the changing needs, better perception on the part of the customer, or because of problems encountered by the producer. It is important that the client regularly monitor system development progress and that the producer be receptive to client input. If not, then it is almost inevitable that the wrong product will be built. Given their highly centralized economic and political structure, the Soviets are in a position to take requirements specifications quite a bit further than any of the developed noncommunist countries. As we noted earlier, they can specify national (or lower level) standards that would be legally binding. Some serious effort to do this has been undertaken by the State Committee for Science and Technology and other agencies for ASUs (Myasnikov, 1976; Zhimerin, 1978). However, the rigidity of these requirements is being resisted by the enterprises, who want systems that are tailored to their individual desires (Bobko, 1977). As time goes on, and more and more individually tailored systems are built by the enterprises themselves and outside contractors, it will become more difficult and disruptive to impose requirements specifications from above. One can
16 The Polish ELWRO-Service firm uses simple formulas based on unit prices for assembly language instructions. Price appears to be determined primarily by the number and type of instructions in the object code of the software (Mijalski, 1976). The USSR has been slow to appreciate the economic aspects of software development. It came as something of an initial shock to the Soviets when they learned that Western companies expected to be paid more than simple service fees for the software that they had built.
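To make the pricing scheme described in the footnote concrete, the following is a minimal, purely illustrative sketch (in Python) of a price computed from per-instruction unit prices; the instruction classes, counts, and prices shown are invented for illustration and are not taken from Mijalski (1976) or any ELWRO-Service price list.

# Hypothetical illustration of pricing software by the number and type of
# instructions in its object code; every class and price below is an assumption.
unit_price = {"arithmetic": 1.0, "branch": 1.5, "io": 3.0}          # price per instruction, by type
instruction_counts = {"arithmetic": 12000, "branch": 4000, "io": 900}

price = sum(unit_price[kind] * count for kind, count in instruction_counts.items())
print(price)  # 12000*1.0 + 4000*1.5 + 900*3.0 = 20700.0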
easily imagine the attractiveness of such uniform standards to the central planners and the opportunities they provide to overcome some of the systemic difficulties that affect Soviet software development and diffusion. However, it is one thing to have the power to impose standards, but quite another to do it well. The technical problems are enormous. It will be very interesting to see what becomes of these efforts.
System design. A good design is usually put together by a few talented people. The Soviet Union does produce such people. Right now, for the reasons discussed earlier and others yet to be noted, they lack experience and numbers. Their design options are also more restricted than those of their American counterparts since they have far fewer software and hardware tools and building blocks available.
Implementation. This generally refers to coding the design and getting the system up to the point where there are no obvious errors or where such errors are unimportant and can be patched. It is the most straightforward of the stages. However, it can be made unpleasant by a lack of hardware availability and reliability. Ryad has eased both of these problems considerably. It can also suffer from a lack of well-trained programmers and of available installation user services. These problems are not deeply systemic and we should see a steady improvement in the Soviet ability to handle this phase of software development.
Testing. This is the verification that the coded programs and other components of the system satisfy the requirements specification. This stage generally ends with customer acceptance of a supposedly error-free or error-tolerant system. It involves program testing and consultation with the client as to the satisfaction of his needs. Testing often accounts for almost half of the total preacceptance development effort of large software projects. Soviet strength in mathematics and their interest in programming theory may eventually place them among world leaders in the field of formal proofs of program correctness. However, this is an abstract area that currently has little practical impact. Testing large complicated systems or real-time software is a completely different matter. We have seen little in the Soviet literature that realistically and specifically comes to grips with these problems. They do use a commission to approve computers for production and use, but we do not know if there is a counterpart for software. Software testing is also not the sort of activity that would be expected to show up on any of their measures of institute or enterprise productivity and is thus likely to suffer accordingly. Good system testing is a difficult and complex activity that requires highly skilled people. However, it is a frustrating and low profile thing to do. In light of common Soviet personnel utilization practices, it is likely to be assigned to the lowest ranking neophytes. To a considerable extent, Soviet problems with this stage are basically a
matter of acquiring experience in building large software systems. It has taken the US a long time to learn to struggle with these difficulties, and the Soviets will have to go through the same painful learning experiences. One place where systemic considerations might be important again relates to customer docility. If the software developers can get away with not taking responsibility for the errors that are passed on to the user, then this is what will happen. The effort devoted to checkout is directly related to customer power.
Maintenance. This refers to the continued support of a program after its initial delivery to the user. It includes the correction of errors that slipped through the checkout phase, the addition of new capabilities, modification to accommodate new hardware or a new operating system, etc. Good maintenance clearly extends the lifetime and economic value of software. Maintenance costs in the West are now running around 40-60% of the total life cycle cost of major software systems (Boehm, 1977). As one extreme example, some Air Force avionics software cost about $75 per instruction to develop, but the maintenance of the software cost almost $4000 per instruction (Trainor, 1973). Maintenance can be done by the original developer, the customer, or a third party. Extensive third-party arrangements currently seem out of the picture in the USSR, but could become important if software standardization becomes a reality to any appreciable extent. Vendor/producer maintenance requires a high quality of customer service and will be slow to develop there. It appears that the usual procedure has been for the customer to do its own maintenance. This could result in local modifications that would eliminate compatibility and lead to resistance to centrally supplied updates or improvements.
Documentation. Documentation encompasses design documents, comments in the code itself, user manuals, changes and updates, records of tests, etc. To be most effective and accurate, it should be done concurrently with all the other stages. This is not a particularly interesting activity, and is often slighted unless there exists pressure on the software development group to do it. Good documentation can make checkout and maintenance much easier; poor documentation can cause terrible problems. It is difficult to see where serious pressure for the documentation of ordinary software would come from in the USSR. It is another activity that does not show up well in the measures of productivity. Customer pressure is not likely to be effective. Pressure in the form of State Standards will get software documented, but without strong customer involvement there is really no way to control quality, and poor documentation can be a lot worse than none at all. This is likely to remain a long-term problem.
The almost total lack of convenient Xerox-like services in the USSR is a factor that adversely affects all the stages of the software development process. This is a means to quickly and reliably record and distribute changes in specifications, documentation, test data, etc. This capability is particularly important for large projects involving frequent updates that need to be seen by many people. The absence of fast photocopying facilities can lead to unnecessary delays and costly and dangerous loss of synchronization among the project subgroups. In a similar vein, there is a shortage of good quality user terminals.
3.4 Manpower Development
The training of technically competent software personnel and raising the computer consciousness of management is an important task in the development of a national software capacity. This diffuses and enhances the capability to produce and utilize software effectively, and is the ultimate source of products and services. The USSR trains more mathematicians and engineers than any other country. Both the quantity and quality of mathematical education in the Soviet Union, from the elementary school level (Goldberg and Vogeli, 1976) through postgraduate training, are at least as good as in the US. For the most part, Soviet managers have engineering rather than business degrees (Granick, 1961). One might think that, with this personnel base, they would be in an unusually good position to rapidly develop a large-scale national software capacity. However, it is one thing to develop a strong national mathematics curriculum. It is quite another to train and utilize, say, a quarter million professional-quality programmers and systems analysts (about half the number in the US) and a couple million scientists, engineers, administrators, and businessmen who do applications programming as part of their professional activities. This requires equipment. One does not become a skilled programmer unless one spends a lot of time programming. Schools and industrial training centers are generally low on the priority list for computer allocation. By 1976, Moscow State University, a school comparable in size to UC Berkeley, but with a curriculum much more oriented toward science and engineering, had among the best central computing facilities of any university in the USSR. This consisted of two BESM-6 machines, one of which was to be used in a new time-sharing system with 25 terminals. They were expecting to augment this with two ES-1020s by early 1977. The first ES-1030 to go to a higher educational institution went to Leningrad State, another large prominent university, in 1975 (Solomenko, 1975). A major engineering school, the Moscow Aviation Institute, was
still limited to a Minsk-22, a BESM-2, and two Minsk-32 computers in its computing center as of early 1976. These three universities are at the top of the educational hierarchy. The vast majority do much worse. As a result of this situation, there are many students still spending time learning to write small applications and utility programs in machine language for the medium-scale Minsk and Razdan computers and a host of small second- and third-generation computers such as the Mir, Nairi, and Dnepr lines. This may not be as fruitless as it seems, since a lot of these models are still in use in the general economy. The situation is currently changing. The important objective should be to get respectable numbers of the smaller Ryad models into the educational system. Once this is done, students will be trained on the dominant national hardware/systems software base, and their immediate postgraduation value will be increased considerably. Ryad production capacity is such that this is likely to happen by the early 1980s. The software side of computing as an academic discipline went through an extended infancy that started in 1952 with A. A. Lyapunov's first course in programming at Moscow State University (an interesting account of the early days can be found in Ershov and Shura-Bura, 1976), and lasted until the end of the 1960s. Not surprisingly, the new Soviet perspective on computing that emerged by the late 1960s included an appreciation of the need to train a much larger number of programmers and systems analysts. To help meet this need, separate faculties in "applied mathematics" were established around 1970 at universities in Moscow, Leningrad, Novosibirsk, and Voronezh (Novozhilov, 1971). In addition to these, and other more recent (e.g., Sabirov, 1978), separate faculties, computer science is also taught under the auspices of mathematics and electrical engineering departments. The Soviet academic community has a strong theoretical tradition. Peer group status considerations, and a shortage of hardware, tend to reinforce this bias. Thus there is considerable pressure to do esoteric computer science to maintain respectability among colleagues (Novozhilov, 1971). Many instructors have had little practical training of their own. So, for example, computer science under a mathematics faculty would be strongly oriented toward numerical analysis, formal logic, and automata theory. There was essentially no opportunity for a student to learn about such things as practical database management systems. Industrial cooperation programs have had only limited success in establishing a better theory/practice balance. Soviet university students getting on-the-job training at research institutes and industrial enterprises are often given menial tasks. The quality of university-level education in the USSR varies considerably
across subject lines. Outstanding centers of learning in mathematics exist at many places. Training in mathematics and in some of the mathematically oriented science and engineering fields is as good there as anywhere in the world. On the other hand, the academic study of history and politics is severely circumscribed, rigid, and pervasive (the degree requirements for all technical fields include heavy course loads and examinations on Soviet ideology). Education in the range of subjects that lie between mathematics and the ideologically sensitive areas, including all of the engineering disciplines, seems to be more narrowly focused and rigid than it is in the US [see Granick (1961) for some interesting first-hand observations]. We do not have a good picture of how CS education is evolving in the Soviet Union, but it is likely that it is some kind of hybrid between mathematics and engineering. By US standards, it is probably heavy on mathematics and light on practical programming work. As more hardware becomes available at schools, as instructors gain more practical experience themselves, and as Soviet industry pushes to have its needs met, we can expect to see CS education move closer to US models. Although there are frequent complaints about the shortage of programmers and software specialists, there is little quantitative information on the output from the higher educational institutions or the shortfall that is perceived to exist. In addition to university-level training, there is also substantial activity in the large number of vocational institutes and night school programs. One thing is certain: there is currently an unprecedented effort under way to expand the base of people who can make use of the new computers. Where once 10,000 copies of a programming or software text was a large printing, now books on the ES system are appearing in quantities of 50,000 (Khusainov, 1978), 80,000 (Naumov et al., 1975), and 100,000 (Agafonov et al., 1976). Considerable efforts continue to be expended on software for second-generation machines, especially for the Minsk-32 (Zhukov, 1976; 43,000 copies). The problem of raising the computer consciousness of management is only part of the more general task of modernizing Soviet management structure, training, and practice. The magnitude of the problem is enormous. "Soviet sociologists have estimated that 60% of all administrative personnel in industry (including directors, deputy directors, chief engineers, heads of service departments, and shop foremen) are in their 50s and 60s. It is estimated that in the next 5-10 yr, when 30- and 40-yr-olds will move into responsible positions, approximately four million people will have to be trained for administration. This will amount to 40% of all such positions in industry. The number of managerial specialists (presumably above the shop level) to be brought into industry is estimated at 1.5 million" (Hardt and Frankel, 1971). In spite of much talk about improving
managerial training along the lines of American models, little is apparently being done in practice (Holland, 1971a) and certainly nothing is being done on the scale just described. It is difficult to imagine how the American models would be effective in the context of Soviet economic institutional structure. Most consciousness raising will have to evolve on the job.
4. Software Technology Transfer
For the most part, the influence of the West on Soviet software development by the mid-1960s was via the open literature. Although this influence was very important (Ershov and Shura-Bura, 1976), the level of technology transfer was weak and there was not much product transfer. The reasons for this include the lack of suitable hardware, an underdeveloped interest in nonnumeric computing, the theoretical orientation of Soviet computer scientists, and the weak position of computer users.18 With the change of perception of computing that led to the Ryad undertaking, there came a commitment to produce and install complex general purpose computer systems in large enough numbers to make it necessary to upgrade general software capabilities. During the last decade, the rather low-key, localized, almost academic, Soviet software community has evolved into a serious industry with a long-term and intensive program to acquire software products and know-how from abroad. There are several reasons to think that software technology would be particularly easy for the USSR to obtain from the rest of the world. This is an extraordinarily open technology. Most of the basic ideas and many of the details necessary to produce a functionally equivalent product are available in open sources. It is much more difficult to hide "secrets" in the product itself than is the case with hardware, and the distinction between
17 Parts of this section are adapted from Goodman (1978). A more complete discussion of the nature and control of this problem is in preparation (CTEG, 1979).
18 On rare occasions, influential users would take matters into their own hands. An important use of FORTRAN in the USSR stemmed from interest in Western applications programs on the part of physicists at the Joint Institute for Nuclear Research in Dubna and the Institute of High Energy Physics in Serpukhov. They had had considerable exposure to the CDC applications programs at CERN in Switzerland and other research centers. Their interest and influence led to the purchase of a CDC 1604, including software, that was installed at Dubna in 1968 (Holland, 1971c). The CDC FORTRAN compiler was translated, line by line, into the machine language of the Soviet BESM-6 so that the applications programs could be run on this machine [the result has become known as "Dubna FORTRAN" (Saltykov and Makarenko, 1976)]. Here is an instance where active contact with the West produced a real stimulus to go out and get some useful software. However, this was a transfer that was not diffused much beyond BESM-6 users.
product and technology transfer is often blurred. Relatively little software is proprietary and much that is can still be obtained. Sources of information are abundant: conferences, journals, books, manuals, discussion panels, program listings, software libraries, consulting groups, and vendors. The Soviets have a large trained scientific/engineering manpower base19 that should be capable of absorbing the contents of foreign work and putting together similar products of their own. The successful appropriation of the complex IBM S/360 operating systems is proof that they can do this on a large scale. On the other hand, there are reasons why software technology transfer may not be as easy as it appears. Direct product transfers often run into problems at hardware interfaces. Even small differences in donor and borrower hardware can make conversion difficult. The Ryad hardware is effectively a functional duplication of S/360, but it is not identical to it. It may have taken the Soviets and their CEMA partners almost as long to adapt the DOS/360 and OS/360 operating systems to their Unified System hardware as it took IBM to build these systems in the first place. Furthermore, it is possible for an unwilling donor to make it painful and time consuming to copy its products, e.g., by only releasing software in object code form or by inserting "time bombs" (code that destroys critical portions of the system after the passage of a certain amount of time or after a preset number of uses). Some of our most advanced software products cannot be transferred because the Soviets lack appropriate hardware. Most importantly, it is extremely difficult to effectively master the techniques and skills of software engineering and management.
4.1 Mechanisms for Software Technology Transfer
This subsection describes the active and passive mechanisms by which software technology is transferred. We adopt the definitions used in the Bucy Report (Bucy, 1976): Active relationships involve frequent and specific communications between donor and receiver. These usually transfer proprietary or restricted information. They are directed toward a specific goal of improving the technical capability of the receiving nation. Typically, this is an iterative process: The receiver requests specific information, applies it, develops new findings, and then requests further information. This process is normally continued for several years, until the receiver demonstrates the desired capability. Passive relationships imply the transfer of information or products that the donor has already made widely available to the public.
The term “passive” is used primarily in reference to donor activity. The receiver may be very active in its use of passive mechanisms. 19
19 They claim 25% of the world's total of "scientific workers" (Ovchinnikov, 1977).
An illustration of how the terms "active" and "passive" will be used in the context of software transfers might be helpful. There are two kinds of proprietary software: that which is generally available to the public and that which is not. The purchase of a publicly available system, perhaps with some basic training and maintenance service, is passive, even though the buyer might become very active in distributing or duplicating the software. The sale of software that is not publicly available would be considered a more active relationship. The donor is clearly contributing more than what is normally and widely available. If sale is accompanied by advanced training, then the donor relationship is that much more active. "How to build it yourself" lessons from the donor will be considered very active even if such services are publicly available. Listed below are a sample of mechanisms that can be used to transfer software products and know-how. They are roughly ranked by the level of donor activity, from the most active donor involvement at the top of the list to the most passive at the bottom. (One can easily imagine specific examples that might suggest some reordering, but this list is adequate for our purposes.)
Joint ventures
Sophisticated training (e.g., professional-level apprenticeships)
Licenses with extensive teaching effort
Consulting
Education of programmers and systems analysts
Sale of computing equipment with software training
Detailed technical documents and program listings
Membership in Western user groups
Documented proposals
Conferences
Academic quality literature
Licenses and sale of products without know-how
Commercial and exchange visits
Undocumented proposals
Commercial literature and exhibits
The term “license” needs to be defined here since normal patent considerations do not apply to software (Mooers, 1977). We will take it to mean the provision of a copy of the software to a receiver who then has the recognized right to distribute it extensively within some domain. The distinction between this and a simple product sale may be a matter of a paragraph in a contract, but the distinction is worth making. It is easy to produce multiple copies of software products and the Soviets have control of a large, and economically isolated, domain of computer installations. Of course, some categories of software are more transferable than others. The following four rough (partially overlapping) categories are listed in order of decreasing ease of transferability:
(1) Applications programs written in higher-level languages.
(2) Systems and utility programs in machine or higher-level language form.
(3) Large, highly integrated systems (e.g., multiprogramming operating systems, real-time air traffic control systems).
(4) Microprograms and other forms of "software" that are very closely interfaced with and dependent on the hardware on which they are run and which they control.
Although it is difficult to quantitatively merge our two lists because the effectiveness of software transfer is so strongly dependent on such highly variable factors as local programmer talent, there is a clear qualitative merge. As one goes down the list of transfer mechanisms, their effectiveness decreases for all software categories. For any given mechanism, its effectiveness decreases as one goes down the list of software categories. If any of the listed mechanisms should be candidates for US Government control, they should be the top four listed. An example will illustrate the third mechanism. In their efforts to adapt DOS/360 and OS/360 to the Ryad-1 models, it would have been of considerable help to the CEMA countries if they had had a deal with IBM, or with a company that had considerable experience in the business of making non-IBM hardware compatible with IBM software, which would have included a license for the software and a teaching effort that would have shown them how to adapt it to the Ryad hardware.20 This effort might have gone further and included help in designing the hardware with the compatibility goal in mind. Such an arrangement could conceivably have substantially reduced the time it took the Soviet Bloc to acquire and adapt the systems on their own, and it could have provided a tremendously valuable transfer of know-how. Simple product transfer should be of much less concern than know-how transfers that will enable the Soviets to build up their indigenous software capabilities. The top four mechanisms transfer considerable know-how and short-circuit the painful experience of learning through time-consuming and costly trial and error. The delay of the acquisition of indigenous capability is a major goal of antitransfer measures. The lesser forms of licensing and product sale on our list are not as important. For example, IBM might have sold the Soviets a "subscription" to the S/360 operating systems. This could have taken the form of supplying one copy of each of the operating systems on tape plus informa-
20 No such arrangement actually existed.
tion on new releases, etc., and a license for distribution to all Ryad users. They would have had to adapt the software to the Ryad hardware themselves. This would have saved them the effort of obtaining it through other legal channels or by covert means, and IBM would have been able to cultivate good will and get some compensation for the use of its products. There was no effective way to deny the CEMA countries access to copies of this software; it was simply available from too many sources. The time that the Soviets could have saved through such an arrangement would not have been great. The time it took to adapt the software to Ryad must have been much greater than the time it took to acquire copies of it. But the importance of the passive mechanisms to software technology transfer to the USSR should not be underestimated. We think they contributed significantly to the massive appropriation of IBM S/360 software for the Unified System. They also affect training programs at all levels. Much written and oral material is available on subjects that relate to the management of software projects and on software engineering. These are areas where the Soviets are particularly weak. Passive material is publicly available in huge quantities. The Soviets have been using these sources for almost three decades and their influence is obvious in almost all Soviet software work. Before Ryad, hardware problems limited the use of direct product transfer. Now, of course, direct product transfer is an important source of useful software. However, it is important to point out that passive sources are of limited value for several of the most important phases of the software development process. These include the customer/developer relationship, certain aspects of specification and design, the higher levels of testing and integration, and maintenance. All of these stages become particularly important for the construction and support of large, highly integrated systems. Active sources are also abundantly available in the West. In contrast to a hardware area, such as disk manufacturing technology where there are only a few really valuable potential donors, there are literally thousands of places in the US alone that have something to offer the Soviets in software products and know-how. The Soviets do not use these active mechanisms to the extent that they could (but there has been substantial improvement since the mid-1960s). USSR restrictions on foreign travel by its citizens are a severe constraint. The people they send out are helpful to their effort, but they are too few. They would have to send several hundred software specialists to the West each year, and most of these for extended study, to affect their software capabilities continuously and broadly. The leadership is very unlikely to do this. It might be politically and economically more acceptable for them to import Western experts who would spend extended periods showing
them how to manage large software projects and how to upgrade computer science education. They might also buy full or part ownership in Western software firms, and use the Western talent employed there to develop software for their use. The ELORG centers in Finland and Belgium represent moves in this direction. A more unlikely form of long-term joint venture would be to permit partial Western ownership and management of a Soviet enterprise. Some of the other CEMA countries allow this, but so far the USSR has not. On the other hand, the internal political situation in the USSR may change to militate against both the import and export of computer scientists after the death or retirement of Brezhnev (Yanov, 1977).
4.2 External Sources
The S/360-Ryad software transfer was facilitated with considerable help from Eastern Europe, particularly the GDR. It is hard to avoid the impression that the "per capita" software capabilities of the GDR, Hungary, Poland, and Czechoslovakia exceed those of the USSR. This is probably the result of many factors, not the least important of which is the greater contact these countries have with the West European computing community. They have also had much more direct and indirect experience with IBM products. We would not go so far as to conjecture that the indigenous capacity of the USSR may have been such that the S/360-Ryad software transfer would have failed without help from Eastern Europe, but the role of these countries should not be underestimated. Hungary, the GDR, Poland, and Czechoslovakia are not only important conduits for facilitating software technology transfer from the West to the USSR, but they are also valuable sources of products and know-how in their own right. They have potentials for providing active mechanisms for personnel training, consulting, etc. As communist countries using a common hardware base, they are the best external source the Soviets have for many industrial- and management-related software products. They are also external sources that can be used directly in the development of military software systems, such as those used for command, control, and communications, for the Warsaw Pact. Problems that inhibit active involvement with the West, such as travel restrictions and a lack of hard currency, are much less important. Perhaps the greatest value of the Eastern Europeans to the USSR is as models for institutional arrangements and economic practices. In particular, Hungary and the GDR seem to be much more effective in the areas of software customer service and systems software support than the Soviets. Marxist theory may be opposed to an uncontrolled gaggle of profit-
hungry, privately owned firms operating outside of a central plan, but it is hardly opposed to the development and maintenance of products that benefit the economy. The Hungarians and East Germans are showing that it is possible for communist economies to provide minimum basic software services to general users. The Soviet Union might learn much from them. Western Europe is both a conduit for US software technology and a source of innovation in its own right. Not surprisingly, CEMA has easier access to US multinational corporations through their European companies than through US-based enterprises. The shared culture and language across East and West Germany make for a particularly low barrier. Notable West European developments of direct value to the USSR include: the Europe-based ALGOL project, CERN in Geneva, SIMULA-67 (Norway), the Aeroflot airlines reservation system (France), and the International Institute of Applied Systems Analysis located in Austria.21 The most important sources are West Germany, England, and France. Others are Belgium, Denmark, Holland, Norway, and the politically neutral Austria, Finland, Sweden, and Switzerland. Joint ventures with firms in these countries may become an important transfer mechanism. The US remains the ultimate source of software technology. In addition to the IBM-Ryad connection, Soviet interest stems from the facts that more R&D is done here than anywhere else and that we are the largest repository of software products and information. The US is clearly the world leader in the development of large military-related software systems. English is an effective second language for almost all Soviet computer scientists. Finally, there is the nontrivial matter of prestige and the "Big Two" psychology. From the standpoint of career enhancement, it is more desirable for a Soviet citizen to come here than to travel anywhere else. Russian pride also seems to suffer less when they borrow from us than when they have to go to the Hungarians or Germans for help. The Soviets make less extensive use of the Japanese as a source of software technology transfer. This is partially because Japan has not developed as much software, although their potential is high. However, Japanese software institutional arrangements and development/maintenance practices may be even less suitable for Soviet emulation than those of the US. In general, it would appear that cultural and language barriers make Japan a less attractive source than the West. A distinction should be made between commercial software, which is produced for sale, and noncommercial software, which is used only by its developers or distributed free or at a nominal cost. The latter is usually
21 It should be noted that all five of these important examples involve substantial US participation.
produced by nonprofit organizations (e.g., universities, government labs) and may be of high quality, but most of it is not tested, maintained, or protected to the same extent as commercial software. Commercial software has become a multibillion dollar business in the West. Over the last 10-15 yr, the companies in this industry have become increasingly aware of protecting the proprietary value of their products. The protective mechanisms include a variety of legal and technical options that appear to be reasonably effective, although in such a dynamic industry it is usually only a matter of time before a competitor comes up with an equivalent or better product. We do not know how well people who have been trained in the West, or in jointly operated facilities in Eastern Europe, are actually used. It is not clear if they are used in any particularly effective way to promote the internal diffusion of know-how. It is important to recognize that technology transfer will not solve the most basic Soviet software problem. The Soviets may be able to import enough turnkey plants for manufacturing automobiles to satisfy their perceived need for cars, but they are going to have to develop the internal capacity to produce most of their own software. There are thousands of computer centers in the USSR and they all need large quantities of software. Contacts with foreign sources are limited to only a very small fraction of the Soviet computing community. The orifice is too small to import the volume of software technology required, and internal systemic problems prevent its effective diffusion and use. Finally, these computer installations have their own special software needs that reflect their way of doing business, and Western commercial applications software products may be unsuitable for these needs.
4.3 The Control of Software Technology Transfer
In terms of in-depth understanding and the avoidance of repetition of mistakes, the Soviets do not seem to have profited much, so far, from the Western experience. They consistently make the same mistakes and suffer from the same growing pains as we did. These are often exacerbated by difficulties of their own making. The Soviets have been making extensive use of Western software technology, but they currently seem satisfied with the short-term goals of recreating selected Western systems at a rate that may actually be slower than that with which the West built these systems originally. It is inevitable that the Soviets will significantly improve their software capabilities as they acquire more experience and as their perception of the role of software matures. Their interest in software technology transfer as
a means of acquiring both products and know-how is likely to continue indefinitely. Furthermore, as their own indigenous capabilities improve, they can be expected to make more extensive and more effective use of transfer mechanisms and opportunities.22 We could make life more difficult for them through various forms of control. Unfortunately, software control is more complex than the control of the kinds of technology that were used as examples in the Bucy Report (1976). The range of software products and know-how is enormous. Some of it, such as microprograms and sealed-in software (Mooers, 1977), can be controlled in much the same way as hardware. Some of it, such as numerical and combinatorial algorithms, is essentially mathematics and beyond any effective control (although the translation from algorithm to program is often nontrivial). Most software lies somewhere between hardware and mathematics, and we do not know how to protect this part of the spectrum. There are several different ways to try to control software. We could try to focus on those categories that are most amenable to control. For example, we might attempt to control the large, highly integrated systems, and give up on the applications programs in high-level languages and the small systems routines. Another approach would be to try to control the mechanisms of transfer. Thus we might regulate licenses with extensive teaching effort, joint ventures, etc., and ignore the mechanisms at the lower end of the list. A third approach would be to base controls on the potential military uses of the software. We could try to regulate software for pattern recognition, communications networks, test and diagnostic systems, command and control. Finally, we might use some form of "time-release" control over many products. All four approaches have serious definitional and enforcement problems. For example, where does technology transfer for management information systems end and transfer for command and control uses begin? Not the least of the problems faced by efforts to regulate software transfer is its huge number of sources. There is nothing that can be done to seal up all the ways to obtain noncommercial products and know-how from universities, laboratories, and the open literature. One of the largest single sources of readily obtainable software is the US Government, including the Department of Defense. Assorted US Government
22 We should not forget that transfers can go both ways. The Soviets will someday develop software products and ideas that American firms or the US Government would want to use. Systemically, we are capable of more effectively exploiting and diffusing software advances than are the Soviets. There is potential for a two-way flow of software technology transfer. Although the flow into the US would be much smaller than the outflow, we would probably make better use of what we get.
agencies literally give it away to the Warsaw Pact countries (CSTAC TT, 1978).

It is more realistic to try to control commercial software. Commercial software houses distribute products that are usually better tested, maintained, and documented than noncommercial products. Regulation may delay acquisition or discourage the Soviets from obtaining as much software as they might if there were no regulations. The best specific forms of control might be the protective mechanisms the commercial software producers use against their market competitors. With their growing appreciation of the cost and value of software has come the desire and effort to protect it more effectively. The trend with the IBM operating systems is a case in point. With S/360, almost all the software was available to anyone who wanted to take it. With S/370 and the 303X models there is a continuing tendency to collapse the "free" software around the nucleus of the operating system and give the user an option to purchase the rest. Unfortunately, some marginal US companies might be willing to let the Soviets have more than they would their market competitors. Thus government regulation would be necessary to supplement company practices.

One of the best forms of control of software transfer is the control of hardware. Sophisticated software systems often require sophisticated hardware. Soviet general purpose hardware has reached a 360-level plateau, and it will not be easy for them to develop advanced telecommunications and real-time processing hardware for widespread use. Software is basically an evolutionary technology; the closest it comes to revolutionary development is in response to opportunities presented by major advances in hardware availability. Control over hardware technology transfer may therefore be an effective way to delay acquisition of advanced capabilities.

A basic problem in the formulation of controls is that we really do not understand what benefits past transfers have given the Soviets or how well they utilize transfer mechanisms. Did the CEMA countries learn more by adapting the S/360 operating systems to Ryad than they would have if they had built new operating systems? Would the latter have taken the same time as the former? Did they use fewer people than it would have taken them to do something more innovative? They devoted many man-years of many of their best people to the piecemeal debugging of the huge S/360 operating systems on the Ryad hardware. This time might have had a higher payoff, from the standpoint of enhancing their indigenous software capabilities, if these people had invested the effort in acquiring experience in large system design, integrated test design, and planning for maintenance.
Perhaps the best statement on software technology transfer was made by Edward Teller:

"The Russians know all of our secrets; they know what secrets we will develop two years in advance. We are still ahead in electronic computers because there are no secrets. Without secrets we are advancing so rapidly that the Russians can't keep up."
Although this statement was made in reference to computer hardware, and in that context it may be a bit exaggerated, there is no better short appraisal of the software situation. Ultimately, the diversity, openness, and high rate of incremental innovation of the American software industry are its best protection.
5. A Summary
By and large, the development of Soviet computing has tended to follow closely the US technical pattern, but it has differed considerably in terms of timescale, philosophy, institutional arrangements, capital decisions, and applications. In particular, the USSR was slow to appreciate data processing, and to develop the technology to support the widespread use of digital computers for such applications. It is only within the last ten years that the Soviets have given the priority and resources necessary to the production and installation of complex general purpose computer systems in large enough numbers to make it necessary to improve greatly their software capabilities.

Prior to this, computer use in the USSR was limited primarily to small- and medium-scale scientific and engineering computations. There was no well-developed business machines industry, nor was there an important clientele with a perceived need for such equipment. The Soviet military and technical communities were less enamoured with computers than their US counterparts, and the Soviet computer industry developed only to the extent that it could meet the relatively limited needs of these users. As a result, Soviet computing went through an extended infancy, with its first-generation hardware/software period lasting to the mid-1960s, and the second generation continuing into the early 1970s. Very few machines large enough to necessitate a real operating system were built. Storage and peripheral limitations restricted the use of high-level languages. The Soviets did not build the software that allowed computers to be used by many people who had not had much technical training.

The shift to the production of large numbers of general purpose computers was forced by internal economic pressures and, most likely, by the greater needs of the military. A substantial commitment necessitated the
development of much improved hardware capabilities, most importantly the creation of an upward compatible family of computers with a respectable assortment of peripherals. The Ryad-1 family, an effective functional duplication of the IBM S/360, provides the Soviets and their CEMA partners with a reasonably modern mainframe capability. The computers of this family have been produced in considerable quantities and give Soviet users an unprecedented assortment of peripherals and level of reliability. Soviet satisfaction with this hardware can be inferred from their continued development of evolutionary upgrades of the early Ryad models, and their further commitment to the development of the Ryad-2 series, based on the IBM S/370. There has been a parallel, although somewhat smaller, major effort devoted to the development of minicomputers: first to the ASVT models, and more recently to the CEMA SM family. This new, and substantial, base of mainframe, minicomputer, and peripheral hardware has done much to give the Soviets a broad general purpose national computing capability.

Although this base is backward by the current US state of the art, it seems clear that it was never the intention of the Soviets to try to push the frontiers of either hardware or software technology. The overall plan was to put a large number of respectable, compatible computers into productive use as expeditiously as possible. To this end, it was not surprising that the Soviets decided to use an already proven technology in the form of the IBM S/360. Although they seriously underestimated many of the difficulties of trying to duplicate a sophisticated foreign technology, they felt that the appropriation of the S/360 systems and applications software was the safest and quickest way to achieve their primary goal.

The Soviets have been making extensive use of Western software products, particularly in the area of systems software. They currently seem satisfied with the goal of recreating selected Western software systems at a rate that may actually be slower than the rate at which the West built them in the first place. In terms of in-depth understanding and the avoidance of repetition of mistakes in their own work, the Soviets do not seem to have profited much from the Western experience. They consistently make the same mistakes and suffer from the same growing pains as we did, and these are often exacerbated by difficulties of their own making.

The Soviet economic system, with its vertical hierarchical structure and lack of opportunity for developing flexible horizontal relationships, seems ill-structured to support many of the software development practices that have worked well in the US. A strong hierarchical bureaucratic environment and a conservative incentive system effectively discourage entrepreneurial innovation. Enterprises are severely constrained with respect to
finding both suppliers and customers. By US standards, there is very little consumer pressure exerted on vendors, except in the case of special (e.g., military or Party) customers. The net result is that most Soviet computer installations have to rely on their own internal assets for most of their software needs. It is not even clear if they get much outside help with the systems software supplied by the hardware vendors. There is a long-standing users' attitude that accepts this situation and is thus a major obstacle to progress. These difficulties exist in many other sectors of the Soviet economy, but they appear to be especially serious in the sophisticated service-oriented software industry.

In spite of these problems, Soviet software has come a long way during the last decade. The appropriation of IBM software for the Unified System was a substantial technological achievement. The volume, level, and intensity of software development and use have risen greatly over this period. The indigenous software capacity of the USSR has become respectable by world standards. Furthermore, as their own capabilities improve, they can be expected to make more extensive and more effective use of technology transfer mechanisms and opportunities.

The Soviet software industry will need some systemic changes to function more effectively. It is not clear to what extent such reforms will be allowed to take place. As the Soviets gain more experience, and as their perception of the value and problems of software matures, we can expect to see considerable improvement take place within the present economic structure. Past reforms, such as the establishment of the corporation-like associations and the expansion of contracting arrangements, seem likely to benefit software development. But improvements within the existing economic environment would still appear to leave the Soviet software development/user community well short of the systemic advantages enjoyed by its US counterpart. Since software is such a widely dispersed and pervasive technology, it would seem impossible to permit major reforms here without also permitting them elsewhere in the economy. It is doubtful that the needs of computing alone could build up enough pressure to bring about broad reforms in the economic system.

The USSR has lots of potential software talent and lots of need. The two have to be brought together in some effective way. Various forms of technology transfer from the West might serve as catalysts to help bring this about. However, the changes that will come will take time and have to fit in with the way things are done in the Soviet Union. Simple foreign transplants will not work. No reforms in a country that is as self-conscious as the USSR can be successful if they are divorced from Russian and Soviet traditions. But the history of Soviet computing shows a strong dependence on Western, and particularly US, technology and social/
economic practices. Effective solutions to Soviet software problems will have to have a hybrid character.
ACKNOWLEDGMENTS AND DISCLAIMER

Various forms of support are gratefully acknowledged. These include an NSF Science Faculty Fellowship, a Sesquicentennial Associateship from the University of Virginia, and a research fellowship from the Center for International Studies at Princeton University. Other support has come from the US Army Foreign Science and Technology Center, Department of Defense, and FIO/ERADCOM, Ft. Monmouth, New Jersey. Continued collaboration with N. C. Davis of the CIA has been particularly valuable.

A couple dozen scattered paragraphs have been excerpted from Davis and Goodman (1978) and Goodman (1979). Permission has been granted by the ACM and the Princeton University Press. Some duplication was necessary to keep this article reasonably self-contained. Permission to use the quotations from Berliner (1976) in Section 3.2 and from Hardt and Frankel (1971) in Section 3.4 was granted by the MIT and Princeton University Presses.

The views expressed in this paper are those of the author. They do not necessarily reflect official opinion or policy of the United States Government.
REFERENCES*

* Foreign publication titles translated.

Agafonov, V. N. et al. (1976). "Generator of Data Input Programs for ES Computers." Statistika, Moscow.
Amann, R., Cooper, J. M., and Davies, R. W., eds. (1977). "The Technological Level of Soviet Industry." Yale Univ. Press, New Haven, Connecticut.
Andon, F. I. et al. (1977). Basic features of data base management system OKA. Upr. Sist. Mash. (2).
Ashastin, R. (1977). On the efficiency with which computer equipment is used in the economy. Plan. Khoz. May (5), 48-53.
Aviation Week (1972). July 31, 14.
Babenko, L. P., Romanovskaya, L. M., Stolyarov, G. K., and Yushchenko, E. L. (1968). A compatible minimal COBOL for domestic serial computers. Presented at AU Conf. Prog., 1st, 1968.
Bakharev, I. A. et al. (1970). Organization of teletype operations and debugging in the IAM operating system. Presented at AU Conf. Prog., 2nd, 1970.
Barsamian, H. (1968). Soviet cybernetics technology: XI. Homogeneous, general-purpose, high-productivity computer systems: a review. Rand Corporation, RM-5551-PR.
Bauer, F. L., ed. (1975). "Advanced Course on Software Engineering" (Munich, 1972). Springer-Verlag, Berlin and New York.
Belyakov, V. (1970). How much does a computer need? Izvestiya March 1, 3.
Berenyi, I. (1970). Computers in Eastern Europe. Sci. Am. Oct., 102-108.
Berliner, J. S. (1976). "The Innovation Decision in Soviet Industry." MIT Press, Cambridge, Massachusetts.
Bespalov, V. B., and Strizhkov, G. M. (1978). The equipment complex of the Unified System for teleprocessing of data. Prib. Sist. Upr. (6), 9-12.
Betelin, V. B., Bazaeva, S. E., and Levitin, V. V. (1975). "The ES-ASVT Small Operating System." Order of Lenin Institute of Applied Mathematics, Academy of Sciences USSR, Moscow.
Bezhanova, M. M. (1970). The Tenzor system program. Presented at AU Conf. Prog., 2nd, 1970.
Bobko, I. (1977). Testing. Sov. Rossiya July 12, 2.
Boehm, B. W. (1975). The high cost of software. In (Horowitz, 1975), 3-14.
Boehm, B. W. (1977). Software engineering: R&D trends and defense needs. In (Wegner, 1977), 1.1-1.43.
Bornstein, M., and Fusfeld, D. R., eds. (1974). "The Soviet Economy: A Book of Readings" (4th ed.). Irwin, Homewood, Illinois.
Borodich, L. I. et al. (1977). "ALGAMS-DOS ES Computers." Statistika, Moscow.
Bratukhin, P. I., Kvasnitskiy, V. N., Lisitsyn, V. G., Maksimenko, V. I., Mikheyev, Yu. A., Cherkasov, Yu. N., and Shohers, A. L. (1976). "Fundamentals of Construction of Large-Scale Information-Computer Networks" (D. G. Zhimerin and V. I. Maksimenko, eds.). Statistika, Moscow.
Brich, Z. S., Voyush, V. I., Deztyareva, G. S., and Kovalevich, E. V. (1975). "Programming ES Computers in Assembly Language." Statistika, Moscow.
Bucy, J. F. (1976). An analysis of export control of U.S. technology: a DoD perspective. Defense Science Board Task Force Report (Feb. 4) on Export of U.S. Technology, ODDR&E, Washington, D.C.
Burtsev, V. S. (1975). Prospect for creating high-productivity computers. Sov. Sci. 45.
Burtsev, V. S. (1978). Computers: relay-race of generations. Pravda April 4, 3.
Buxton, J. M., Naur, P., and Randell, B. (1976). Software Engineering: Concepts and Techniques. Proc. NATO Conferences, Garmisch, West Germany, Oct. 7-11, 1968; Rome, Oct. 27-31, 1969. Petrocelli/Charter, New York.
Campbell, H. (1976). Organization of research, development and production in the Soviet computer industry. RAND Corporation, R-1617-PR, Santa Monica, California.
Cave, M. (1977). Computer technology. In (Amann et al., 1977), 377-406.
Chevignard, D. (1975). Soviet automation and computerization effort. Def. Nat. (Paris) Feb., 117-128.
CSTAC TT (1978). Transfer of computer software technology. Jan. 20 Report of the Technology Transfer Subcommittee of the Computer Systems Technical Advisory Committee (CSTAC), U.S. Dept. of Commerce.
CSTAC II (1978). COMECON Ryad-II Report (Rev. 1, Feb. 22). Foreign Availability Subcommittee (CSTAC), U.S. Dept. of Commerce.
CTEG (1979). Computer Networks: An Assessment of the Critical Technologies and Recommendations for Controls on the Exports of Such Technologies. Computer Network Critical Technology Expert Group (CTEG), U.S. Dept. of Defense (May).
Davidzon, M. (1971). Computers wait for specialists. Sots. Ind. Dec. 25, 2.
Davis, N. C., and Goodman, S. E. (1978). The Soviet Bloc's Unified System of computers. ACM Comp. Surv. 10 (2), 93-122.
Del Rio, B. (1971). Cybernetics: To a common denominator. Pravda Jan. 5.
Dittert, W. (1978). ES-1055 computer. Szamitastechnika (Hung.) Jan.
Doncov, B. (1971). Soviet cybernetics technology: XII. Time-sharing in the Soviet Union. Rand Corporation, R-522-PR, Santa Monica, California.
Drexhage, K. A. (1976). A survey of Soviet programming. SRI Tech. Rep. Proj. 3226.
Drozdov, E. A., Komarnitskiy, V. A., and Pyatibratov, A. P. (1976). "Electronic Computers of the Unified System." Mashinostroyenie, Moscow.
Dyachenko, A. I. (1970). Ukrainian Republic fund of algorithms and programs. Mekh. Avtom. Kontrola (1), 61.
Efimov, S. (1970). Horizontals and verticals of control. Izvestiya March 8, 3.
Ekonomicheskaya Gazeta (1976). Sept. 1.
Ekonomicheskaya Gazeta (1977). April 15.
Electrical Engineering Times (1977). Nov. 28.
Elorg/Komputronics (1978). Growth of Soviet computers and Indo-Soviet cooperation: new high rate performance third generation computer ES-1033 from the USSR. May-June advertisement by V/O Elektronorgtekhnika, a Soviet foreign trade organization, and by Computronics, India, its marketing agent in India.
Ershov, A. P. (1966). ALPHA: An automatic programming system of high efficiency. J. ACM 13, 17-24.
Ershov, A. P. (1969). Programming 1968. Avtomat. Program. (Kiev), 3-19.
Ershov, A. P. (1970). Problems of programming. Vestn. Akad. Nauk SSSR (6), 113-115.
Ershov, A. P. (1975). A history of computing in the USSR. Datamation Sept., 80-88.
Ershov, A. P., and Shura-Bura, M. R. (1976). Directions of development of programming in the USSR. Kibernetika 12 (6), 141-160.
Ershov, A. P., and Yushchenko, E. L. (1969). The first All-Union conference on programming. Kibernetika 5 (3), 101-102.
Evreinov, E. V., and Kosarev, Yu. G., eds. (1970). "Computer Systems." Akademiya Nauk, Novosibirsk; translated and published for the National Science Foundation by Amerind Publ., New Delhi, 1975.
Fadeev, V. (1977). Who is to answer for computer servicing? Sots. Ind. Sept. 4, 2.
Filinov, E. N., and Semik, V. P. (1977). Software for the SM-3 UVK. Prib. Sist. Upr. (10), 15-17.
First AU Conf. Prog. (1968). "First All-Union Conference on Programming" (11 vols.), Kiev. Excerpts translated in Sov. Cybern. Rev., July 1969, pp. 20-65.
Galeev, V. (1973). The collection is large but the benefit is small. Pravda Jan. 8.
GDR (German Democratic Republic) (1976). Ryad Overview. In "Rechentechnik Datenverarbeitung." Memorex, McLean, Virginia (distr.).
Gladkov, N. (1970). A help or a burden? Pravda Oct. 16 (2).
Glushkov, V. M. (1971a). The computer advises, the specialist solves. Izvestiya Dec. 15, 3.
Glushkov, V. M. et al. (1971b). ANALITIK (Algorithmic language for the description of computational processes with the application of analytical transformations). Kibernetika 7 (3), 102-134.
Glushkov, V. M., Ignatyev, M. B., Myasnikov, V. M., and Torgashev, V. A. (1974). Recursive machines and computing technology. Proc. AFIPS Conf., pp. 65-70. North Holland, Amsterdam.
Godliba, O., and Skovorodin, V. (1967). Unreliable by tradition. Pravda Aug. 27, 3.
Goldberg, J. G., and Vogeli, B. R. (1976). A decade of reform in Soviet school mathematics. CBMS Newsletter Oct.-Nov.
Goodman, S. E. (1978). The transfer of software technology to the Soviet Union. Presented at "Integrating National Security and Trade Policy: The United States and the Soviet Union," a conference held June 15-17 at the U.S. Military Academy, West Point, New York.
Goodman, S. E. (1979). Soviet Computing and Technology Transfer: An Overview. World Politics 31 (4).
Gorlin, A. C. (1976). Industrial reorganization: the associations. In (Hardt, 1976), 162-188.
GOST 21551-76 (1976). "USSR State Standard for the Programming Language ALGAMS." Standartov, Moscow.
Granick, D. (1961). "The Red Executive." Anchor Books, Garden City, New York.
Hardt, J. P., and Frankel, T. (1971). The industrial managers. In (Skilling and Griffiths, 1971), 171-208.
Hardt, J. P., ed. (1976). "The Soviet Economy in a New Perspective." Joint Economic Committee, U.S. Congress, Washington, D.C.
Holland, W. B. (1971a). Kosygin greets first class at management institute. Sov. Cybern. Rev. May, 7-11.
Holland, W. B. (1971b). Party congress emphasizes computer technology. Sov. Cybern. Rev. July, 7-14.
Holland, W. B. (1971c). CDC machine at Dubna Institute. Sov. Cybern. Rev. July, 19-20.
Holland, W. B. (1971d). Comments on an article by M. Rakovsky. Sov. Cybern. Rev. Nov., 33.
Horowitz, E., ed. (1975). "Practical Strategies for Developing Large Software Systems." Addison-Wesley, Reading, Massachusetts.
IBM RTM (1970). "Introduction to the Real-Time Monitor (RTM)." GH20-0824-0, IBM Systems Reference Library.
IBM DOS (1971). "Concepts and Facilities for DOS and TOS." DOS Release 25, GC 24-5030-10, IBM Systems Reference Library.
IBM S/360 (1974). IBM System/360 Models 22-195. In "Datapro Reports" (70C-491-03). Datapro Research, Delran, New Jersey.
IBM S/370 (1976). IBM System/370. In "Datapro Reports" (70C-491-04). Datapro Research, Delran, New Jersey.
Informacio Elektronika (Hung.) (1977). Three articles on structured programming and program correctness verification. 12 (4).
Infotech Information Ltd. (1972). "Software Engineering." International Computer State of the Art Report. Maidenhead, Berkshire, England.
Infotech Information Ltd. (1976). "Real-Time Software." International Computer State of the Art Report. Maidenhead, Berkshire, England.
ISOTIMPEX (1973). English language description of the ES-1020. Bulgarian State Trade Enterprise ISOTIMPEX, Sofia. (Untitled, undated, assume issued 1973.)
Ivanenko, L. N. (1977). Imitation and game simulation of human behavior in technological and socioeconomic processes. Report on a conference held in Zvenigorod, May 27-June 1, 1977. Kibernetika 13 (3), 150.
Izmaylov, A. V. (1976). Software system for the 'Tver-ES' automated control system. Ref. Zh. Kibern. (8), Abstract No. 86603.
Izvestiya (1978). March 14, 2.
Judy, R. W. (1967). Appendix: Characteristics of some contemporary Soviet computers. In "Mathematics and Computers in Soviet Economic Planning" (J. Hardt et al., eds.), pp. 261-265. Yale Univ. Press, New Haven.
Kaiser, R. G. (1976). "Russia: The People and The Power." Atheneum, New York.
Kasynkov, I. (1977). Izvestiya March 4, 2.
Kazansky, G. (1967). Moscow Nedelya Dec. 4 (7).
Kharlonovich, I. V. (1971). Automated system for controlling railroad transport. Avtom. Telemekh. Svyaz (8), 1-3.
Khatsenkov, G. (1977). Instantaneously subject to computers. Sots. Ind. April 24, 1.
Khusainov, B. S. (1978). "Macrostatements in the Assembler Language of the ES EVM." Statistika, Moscow.
Kitov, A. I., Mazeev, M. Ya., and Shiller, F. F. (1968). The ALGOL-COBOL algorithmic language. In AU Conf. Prog., 1st, 1968.
Kmety, A. (1974). Demonstration of the R-20 at the capital city office for construction operations and administration. Szamitastechnika (Budapest) April-May, 1-2.
Koenig, R. A. (1976). An evaluation of the East German Ryad 1040 system. Proc. AFIPS Conf., pp. 337-340.
Kommunist (Yerevan) (1977). Nov. 29, 4.
Kommunist (Yerevan) (1978). Dec. 31, 1.
Kryuchkov, V., and Cheshenko, N. (1973). At one-third of capacity: Why computer efficiency is low. Izvestiya June 14, 3.
Kudryavsteva, V. (1976a). Sov. Beloruss. April 25, 2.
Kudryavsteva, V. (1976b). Sov. Beloruss. July 18, 4.
Kulakovskaya, V. P. et al. (1973). "Minsk-32 Computer COBOL." Statistika, Moscow.
Kuzin, L. T., and Shohukin, B. A. (1976). "Five Lectures on Automated Control Systems." Energiya, Moscow.
Lapshin, Yu. (1976). Maximizing the effectiveness of computer technology. Sots. Ind. Sept. 1.
Larionov, A. M., Levin, V. K., Raykov, L. D., and Fateyev, A. E. (1973). The basic principles of the construction of the system of software for the Yes EVM. Upr. Sist. Mash. May-June (3), 129-138.
Larionov, A. M., ed. (1974). "Software Support for ES Computers." Statistika, Moscow.
Leonov, O. I. (1966). Connecting a digital computer to telegraph communication lines in a computer center. Mekh. Avtom. Proiz. (8), 40-42.
Letov, V. (1975). Computer in the basement. Izvestiya Aug. 22, 3.
Liberman, V. B. (1978). "Information in Enterprise ASU." Statistika, Moscow.
Mamikonov, A. G. et al. (1978). "Models and Methods for Designing the Software of an ASU." Statistika, Moscow.
Meliksetyan, R. (1976). Nedelya Dec. 27, 3.
Mijalski, Czeslaw (1976). The principles, production and distribution of the software of MERA-ELWRO computers. Informatyka (Warsaw) Nov., 27.
Mitrofanov, V. V., and Odintsov, B. V. (1977). "Utilities in OS/ES." Statistika, Moscow.
Mooers, C. N. (1977). Preventing software piracy. In "Microprocessors and Microcomputers" (selected reprints from Computer), pp. 67-68. IEEE Computer Society.
Moskovskaya Pravda (1978). April 8, 3.
Myasnikov, V. A. (1972). Need for improved computer technology. Izvestiya May 27, 2.
Myasnikov, V. A. (1974). Automated Management Systems Today. Ekon. Organ. Promyshl. Proizv. (6), 87-96.
Myasnikov, V. A. (1976). Sov. Ross. Dec. 24, 2.
Myasnikov, V. A. (1977). Results and priority tasks in the field of automation of control processes in the national economy of the USSR. Upr. Sist. Mash. (Kiev) Jan.-Feb. (1), 3-6.
Myers, G. J. (1976). "Software Reliability." Wiley, New York.
Naroditskaya, L. (1977). New computers are running . . . We audit fulfillment of Socialist pledges. Pravda Ukr. Nov. 18, 2.
NASA (1977). Standardization, certification, maintenance, and dissemination of large scale engineering software systems. NASA Conference Publication No. 2015.
Naumov, B. N. (1977). International small computer system. Prib. Sist. Upr. (10), 3-5.
Naumov, V. V. (1976). Real-Time Supervisor (SRV). Programmirovanie May-June, 54-60.
Naumov, V. V., Peledov, G. V., Timofeyev, Yu. A., and Chekalov, A. G. (1975). "Supervisor of Operating System ES Computers." Statistika, Moscow.
Nove, Alec (1969). "The Soviet Economy" (2nd ed.). Praeger, New York.
Novikov, I. (1978). They put their AMS up for sale. Pravda March 13, 2.
Novikov, N. (1972). Idle computers. Pravda Aug. 21.
Novozhilov, V. (1971). The levels of mathematics. Izvestiya Jan. 17, 3.
OECD Report (1969). Gaps in technology: Electronic computers. Organization for Economic Cooperation and Development, Paris.
Ovchinnikov, Yu. (1977). Science in a nation of developed socialism. Izvestiya Nov. 18, 2.
Parrott, Bruce B. (1977). Technological progress and Soviet politics. In (Thomas and Kruse-Vaucienne, 1977), 305-328.
Peledov, G. V., and Raykov, L. D. (1975). The composition and functional characteristics of the software system for ES computers. Programmirovanie Sept.-Oct. (5), 46-55.
Peledov, G. V., and Raykov, L. D. (1977). "Introduction to OS/ES." Statistika, Moscow.
Perlov, I. (1977). The ASU: Its use and return. Ekon. Zhizn (Tashkent) (6), 83-86.
Pervyskin, E. K. (1978). Technical Means for the Transmission of Data. Ekon. Gaz. June (25), 7.
Petrov, A. P. (1969). "The Operation of Railroads Utilizing Computer Technology." Transport, Moscow.
Pevnev, N. I. (1976). "Automated Control Systems in Moscow and Its Suburbs." Moskovsky Rabochy, Moscow.
Pirmukhamedov, A. N. (1976). "Territorial ASU." Ekonomika, Moscow.
Pleshakov, P. S. (1978). Utilizing Automated Management Systems Efficiently: Computer Hardware. Ekonomicheskaya Gazeta July 31, 15.
Rakovsky, M. (1977). Computers' surprises. Pravda March 2, 2.
Rakovsky, M. (1978a). According to a single plan. Pravda Feb. 3, 4.
Rakovsky, M. (1978b). On a planned and balanced basis. Ekon. Gaz. June (23), 14. (Quotations from a translation in CDSP Vol. XXX, No. 24, p. 24.)
Reifer, D. J. (1978). Snapshots of Soviet computing. Datamation Feb., 133-138.
Rezanov, V. V., and Kostelyansky, V. M. (1977). Software for the SM-1 and SM-2 UVK. Prib. Sist. Upr. (10), 9-12.
Robotron (1978). EC-1055 electronic data processing system. VEB Kombinat Robotron Brochure, May 25.
Rudins, George (1970). Soviet computers: A historical survey. Sov. Cybern. Rev. Jan., 6-44.
Sabirov, A. (1978). Specialty: cybernetics. Izvestiya March 12, 4.
Saltykov, A. I., and Makarenko, G. I. (1976). "Programming in the FORTRAN Language" (Dubna FORTRAN for the BESM-6). Nauka, Moscow.
Sarapkin, A. (1978). To new victories. Sov. Beloruss. Jan. 4, 1.
Second AU Conf. Prog. (1970). Second All-Union Conference on Programming, Novosibirsk. (Translated abstracts in Sov. Cybern. Rev. May, 9-16.)
Shnayderman, I. B., Kosarev, V. P., Mynichenko, A. P., and Surkov, E. M. (1977). "Computers and Programming." Statistika, Moscow.
Skilling, H. G., and Griffiths, F., eds. (1971). "Interest Groups in Soviet Politics." Princeton Univ. Press, Princeton, New Jersey.
Smith, H. (1977). "The Russians." Ballantine, New York.
Solomenko, E. (1975). Machines of the Unified System. Leningradskaya Pravda May 15.
Sovetskaya Estoniya (1978). March 15, 2.
Sovetskaya Moldavia (1978). Jan. 1, 2.
Sovetskaya Rossiya (1976). Sept. 11, 4.
Tallin (1976). First IFAC/IFIP Symposium on Computer Software Control, Estonia. Paper titles published in Programmirovaniye (Moscow) (3), 100-102, and Vestn. Akad. Nauk SSSR (11), 1976, 93-94.
Taranenko, Yu. (1977). How to service computers. Sots. Ind. July 19, 2.
TECHMASHEXPORT (1978a). SM EVM Minicomputer Family: SM-1, SM-2. Marketing Brochure, Moscow.
TECHMASHEXPORT (1978b). SM EVM Minicomputer Family: SM-2. Marketing Brochure, Moscow.
Thomas, J. R., and Kruse-Vaucienne, U. M., eds. (1977). "Soviet Science and Technology," pp. 305-328. National Science Foundation, Washington, D.C.
Tolstosheev, V. V. (1976). "The Organizational and Legal Problems of Automatic Systems of Control." Ekonomika, Moscow, pp. 49-50.
Trainor, W. L. (1973). "Software: From Satan to Savior." USAF Avionics Laboratory, Wright-Patterson AFB, Ohio. Referenced in (Boehm, 1975).
Trofimchuk, M. (1977). How do you work, computer? Pravda Ukrainy Sept. 7.
Trud (1977). Jan. 14, 2.
Trud (1978a). Jan. 4, 1.
Trud (1978b). Nov. 7.
Vasyuchkova, T. D., Zaguzoba, L. K., Itkina, O. G., and Savchenko, T. A. (1977). "Programming Languages with DOS ES EVM." Statistika, Moscow.
Vodnyy Transport (1977). Riga ship repair plant to use ASU with 'Tver' software system. Sept. 24, 4.
Ware, W. H., ed. (1960). Soviet computer technology, 1959. Commun. ACM 3 (3), 131-166.
Washington Post (1978). The battle of Minsk, or socialist man beats computer. March 28.
Wegner, P., ed. (1977). Proc. Conf. on Research Directions in Software Technology. Final version to be published 1978, MIT Press, Cambridge, Massachusetts.
White, H. (1977). Standards and documentation. In (NASA, 1977), 20-26.
Yanov, A. (1977). Detente after Brezhnev: The domestic roots of Soviet foreign policy. Policy Papers in International Affairs, No. 2. Institute of International Studies, University of California, Berkeley.
Zadykhaylo, I. B. et al. (1970). The BESM-6 operating system of the USSR Academy of Sciences' Institute of Applied Mathematics. In AU Conf. Prog., 2nd, 1970.
Zarya Vostoka (1976). July 28, 2.
Zhimerin, D. G. (1978). Qualitatively new stage. Ekon. Gaz. May 22, 7.
Zhimerin, D. G., and Maksimenko, V. I., eds. (1976). "Fundamentals of Building Large Information Computer Networks." Statistika, Moscow.
Zhukov, O. V. (1976). "Generation of Programs For Data Processing." Statistika, Moscow.
Zhuravlev, V. (1973). Translators for computers. Pravda Feb. 20.
Author Index

Numbers in italics refer to the pages on which the complete references are listed.
A Adelson-Velskiy, G. M., 61, 98, 114 Adkins, K., 183,225, 227 Agafonov, V. N., 267,281 Akiyama, F., 137, 138, 168 Akl, S.. 97. 115 Alderman, D. L., 189,225 Allen, J., 204, 225 Amann, R., 249,281 Andon, F. I., 248,281 Andrews, H. C., 28.55 Arbuckle, T., 60, 115 Arlazarov, V. L., 61, 98. 106, 114. 115 Ashastin, R., 243, 253,281 Ashiba, N., 185,225 Atal, B. S., 202,225 Atkin. L. R., 61, 98. 117 Atkinson, R. C., 198, 203,225, 226, 227
Babenko, L. P., 237,281 Bailey, D., I I5 Baker, A. L., I 6 8 Bakharev, 1. A., 238,281 Bakker, I., 73, 115 Ballaben, G., 198,225 Barr. A., 198, 203,225, 226 Barsamian, H., 238,281 Bass, L. J., 171 Bass, R., 184,229 Baudet, G. M., 95, 115 Bauer, F. L., 232,281 Bayer, R., 168. 170 Bazeava, S. E., 244,282 Beard, M., 183, 193, 198,225, 226. 229 Bell, A. G.. 219,226 Bell, D. E., 138, 168 Bellman. R.. 106, 115 Belsky, M. S.. 60, 115 Belyakov, V., 281 Benbassat, G. V., 202,228
Benko, P., 115 Berenyi, I., 235,281 Berliner, H., 62, 70. 74, 77, 93, 94, I15 Berliner, J. S., 249, 252, 259, 261, 262,281 Bernstein, A., 60,115 Bespalov, V. B., 248,282 Betelin, V. B., 244,282 Bezhanova, M. M., 238,282 Bitzer, D., 176,226 Bjorkman, M., 159, 168 Blaine, L. H., 208, 211, 212,226, 228 Blum, H., 48,56 Bobko, I . , 254, 262,282 Bobrow, R., 218,226 Boehm, B. W., 232, 264,282 Bohrer, R., 168 Bork. A , , 190,226 Bornstein, M., 249,282 Borodich, L. I., 247,282 Borst, M. A,, 172 Botvinnik, M. M., 60, 115 Bowman, A. B., 171 Bratko, I., 106, 116 Bratukhin, P. I., 243, 247,282 Brian, D., 184, 222,229 Brich, Z. S., 282 Brown, J. S., 215, 218, 219, 220,226 Brownell, W. A., 215,226 Brudno, A. L., 95, 115 Bruning, R., 170 Bucy, J. F., 269,282 Bulut, N., 168 Bunderson, C. V., 189,226 Burton, R. R., 215, 218, 219, 220,226 Burtsev, V. S., 244, 245,282 Buxton, J. M.,232,282 Byrne, R., I15
C Cahlander, D., 73, 115 Campbell, H., 282 289
Campbell, J. 0.. 203,225 Carr, B.. 217,227 Cave, M.,235,282 Chase, W. G., 109, 117 Chazal, C. B., 215,226 Chekalov, A.G., 241, 267,285 Cherkasov, Yu. N., 243, 247,282 Cheshenko, N., 285 Chevignard, D., 256,282 Church, K. W., 109, 115 Church, R. M.,109, 115 Clark, M. R. B., 106.115 Clinton, J. P. M.,229 Collins, A., 216,228 Comer, D., 162, 163, 169 Cooper, J. M.,249,281 Cornell, L., 137, 169 Crocker, S. D., 61, 98, 116 Cronbach, L. J., 213,227 Curtis, B., 172
D Davidzon, M.,253,282 Davies, R. W., 249,281 Davis, L. S., 28, 40,56, 57 Davis, N . C., 234, 235, 239, 243, 256, 281, 282 Davis, R. B., 176,227 de Groot, A. D., 60, 115 deKleer, J., 218, 226 Del Rio, B., 282 Dewey, J., 227 Deztyareva, G. S., 282 Dittert, W., 244,282 Dixon, J. K., 95, 117 Doncov, B., 237, 238,282 Donskoy, M. V., 61, 98, 114 Douglas, J. R., 85, 115 Drexhage, K. A., 233, 236, 238,283 Drozdov, E. A., 240, 247,283 Duda, R. O., 56 Dugdale, S., 176,227 Dyachenko. A. I., 256,283
E Eastlake, D. E., 61. 98,116 Edwards, D. J., 95, 116
Efimov, S . , 283 Elci, A., 169 Elshoff, J. L., 128, 142, 143, 169, 172 Emam, A. E., 198,228 Ercoli, P., 198,225 Ershov, A. P., 233, 234, 236, 266, 268.283 Euwe, M.,60, 115 Evreinov, E. V., 238,283
F Fadeev, V., 258,283 Fateyev, A. E., 241,285 Faust, G. W., 189,226 Felix, C. P., 131, 172 Filinov, E. N., 244,283 Fine, R., 105, 115 Fitzsimmons, A,, 133, 134, 169 Fletcher, J. D., 183, 184, 203,225, 227, 229 Frankel, T., 267, 281,284 Freeman, H., 48.56 Freeman, J., 55.56 Friedman. E. A., 155, 171 Friend, J., 220,228 Fu, K. S., 55.56 Fuller, S. H., 95, 96,115 Funami, Y.,137, 169 Fusfeld, D. R., 249,282 Futer, A. V., 106, 115
G Galeev, V., 256,283 Gaschnig, J. G., 95, %, 115 Gielow, K. R., 143, 170 Gillogly, J. J., 95, 96, 97, 115, 116 Gladkov, N., 237,283 Glushkov, V. M.,236, 238, 245,283 Godliba. O., 234,283 Goldberg, J. G., 265,283 Goldberg, A., 193, 195,227 Goldin, S., 216,228 Goldman, N . , 201,228 Goldstein, I. P., 201, 215, 217,226, 227 Goldwater, W., 76, 116 Gonzalez, R. C.. 7, 16, 27. 28,40,48,55,56 Good, I. J., 97, 116 Goodman, S. E., 234, 235, 239, 243, 256, 268, 281,282, 283
AUTHOR INDEX Gordon, R. D., 28,56, 131, 133, 16Y Gorlin, A. C., 252,284 Graham, L. R., 284 Graham, R. E., 28,56 Gramlich, C., 203,228 Granick, D., 249, 265, 267,284 Graves, W. H., 208, 212,228 Gray, A. H., 202,228 Green, W. B., 20, 24, 27,56 Greenblatt, R. D.. 61, 98, 116 Grey, F., 8, 56 Griffith. A. K., %, 116 Griffiths, F., 286
J Jamison, D., 184, 190,227 Jerman, J., 184, 222,229 Joseph, H. M., 28.56 Judy, R. W., 234,284
K
Kaiser, R. G . , 249,284 Kak, A. C., 7, 16, 27, 28, 40, 48, 55.57 Kanz, G., 229 Kaplan, J., I I6 Kastner, C. S., 173,227 Kasynkov, I., 245,284 H Kazansky, G., 239,284 Kennedy, D., 170 Habibi, A., 16, 56 Halstead, M. H., 131, 137, 143, 150, 156, Kharlonovich, I. V., 238,284 162, 163, 166, 169, 170, 171, 172 Khatsenkov, G., 239,284 Hamilton, M., 183,225, 227 Khusainov, B. S., 284 Hanauer, S. L., 202,225 Kibbey, D., 176,227 Hansen, D., 203,225 Kirnura, S., 185,227 Kister, J., 60, 116 Haralick, R. M., 55.56 Kitov, A. I . , 236,285 Hardt, J . P., 267, 281,284 Hart, P. E.. 56 Klatt, D., 204,227 Klobert, R. K., 170 Hart, T. P.,95, 116 Krnety, A., 285 Harvill, J. B., 170 Hausmann. C., 215, 218,226 Knuth, D. E., 94, 95, 98, 116 Hawkins, C. A., 227 Koenig, R. A., 241,285 Komarnitskiy, V. A., 240, 247,283 Hayes, J . , 61, I16 Kosarev, V. P., 247,286 Herman, G. T., 28,56 Holland, W. B . , 238, 268,284 Kosarev, Yu. G., 238,283 Horowitz, E., 232,284 Kostelyansky, V. M.,244,286 Huang, T. S., 8, 11, 15, 16.56 Kotok, A., 60, 116 Hubermann, B. J., 105, 116 Kovalevich, E. V., 282 Kovasznay, L. S. G., 28.56 Huggins, B., 215,226 Hunka, S . , 185,227 Kruse-Vaucienne, U. M., 287 Hunt, B. R., 28,55 Kryuchkov, V., 285 Kudryausteva, V., 237, 242, 244,285 Hunter, B., 173,227 Kulakovskaya, V. P., 247, 285 Hunter, L., 170 Kulm, G., 154, 170 Kuzin, L. T., 246,285 Kvasnitskiy, V. N., 243, 247,282
I
Ignatyev, M. B., 245,283 Ingojo. J . C., 170 Itkina, 0. G., 240, 287 Ivanenko, L. N., 255,284 Izmaylov, A . V., 253,284
L Laddaga, R.,203,227 Laemrnel, A.. 171, 172 Lapshin, Yu.,243,285
AUTHOR INDEX Larionov, A. M., 24 I , 285 Larsen, I., 193,227 Lasker, E., 76, 116 Laymon. R., 192,227 Leben, W.R., 203,227 Lecarme, 0.. 173,228 Legault, R., 5 , 8,56 Lekan, H. A., 173,228 Leonov, 0. I., 237,285 Letov, V., 253.285 Levein, R. E., 173,228 Levin, V. K., 241,285 Levine, A., 202, 203,227, 228 Levitin, V. V., 244,282 Levy, D., 61,64, 71, 77, 106, 107, 116 Lewis, R., 173,228 Liberman, V. B., 246.285 Limb, J. O., 16,56 Lindsay, J., 203,225 Lipow, M., 171 Lisitsyn, V. G., 243, 247,282 Lloyd, T., 192,227 Lorton, P. V.,Jr., 184,229 Love, L. T., 133, 134, 169, 171, 172
McCabe, T. J., 141, 171 McDonald, J., 21 I , 226 McGlamery, B. L., 27.56 Macken, E., 184, 201,228, 229 Magidin, M., 171 Makarenko, G. I., 268,286 Makhoul, J., 202,228 Maksimenko. V. I . , 243, 247,282, 287 Mamikonov, A. G. 246,285 Marasco, J., 190,226 Marinov, V. G., 208, 212,228 Markel, J. D., 202,228 Markosian, L. Z., 193,227 Marsland, T. A., 69, 73, 116, 117 Max, J., 8,56 Mazeev, M. Ya., 236.285 Meliksetyan, R., 245. 285 Mertz, P., 8,56 Michalski, R., 106, 116 Michie, D., 77, 106, 116 Middleton, D., 8 , 5 7 Mijalski, C., 262,285 Mikheyev, Yu. A.. 243, 247,282
Millan, M. R., 198,228 Miller, G. A., 155, 171 Miller, J., 171 Miller, M., 215, 218,226 Mitrofanov, V. V.,24 I , 285 Mittman, B., 61, 116, 117 Mooers, C. N., 270, 276,285 Moore, R. N., 94, 95, 116 Morningstar, M., 222,229 Morrison, M. E., 70, 116, 117 Moser, H. E., 215,226 Myasnikov, V. A., 234, 237, 245, 247, 253, 254, 262,285 Myers, G. J., 232,285 Mynichenko, A. P., 247.286
N Naroditskaya, L., 244,285 Naumov, B. N., 244,285 Naumov, V. V., 241, 242, 267,285 Naur, P., 232,282 Negri, P.,106, 116 Newborn, M. M., 61, 62, 95, 97, 105, 109, 115, 117 Newell, A., 60, 95, 117 Newman, E. B., 155, I71 Nilsson, N., 95, 117 Nove, Alec, 249,285 Novikov, I., 252, 255,286 Novikov, N., 237,286 Novozhilov, V., 266,286 Nylin, W.C., Jr., 170
0 Odintsov, B. V., 241,285 O’Handley, D. A,, 20, 24, 27,56 Oldehoeft, R. R.. 171. 172 Ostapko, D. L., 171 Ottenstein, K. J., 152, 169. 171 Ottenstein, L. M., 137, 171 Ovchinnikov, Yu.,269, 286
P Panda, D. P..40,56 Papert, S., 201, 217,227
AUTHOR INDEX Parrott, B., 252,286 Partee, B., 201,228 Pavlidis, T., 16, 55,56 Peledov, G. V., 241, 267,285, 286 Penrod, D., 84, 117 Perlov, I., 258,286 Pervyskin, E. K., 286 Peterson, D. P., 8.57 Petrov, A. P., 238,286 Pevnev. N. I., 246,286 Piasetski, L., 106, 117 Pirmukhamedov, A. N., 246,286 Pleshakov, P. S., 243,286 Poulsen, G., 184,228 Prasada, B., 11.56 Pratt. W. K., 16, 27, 28, 40, 48, 55.57 Purcell, S., 218,226 Pyatibratov, A. P., 240, 247,283
Rakovsky, M.,241, 243, 245, 247,286 Randell, B., 232,282 Raykov, L. D., 241,285, 286 Reifer, D. J., 254,286 Rezanov, V. V., 244,286 Rice, J. R., 136, 171 Richter, H., 69, 117 Rieger, C.. 201,228 Riesbeck, C., 201,228 Roberts, M. De V., 60, 115 Robinson, S. K., 171 Romanovskaya, L. M.,237,281 Rosenfeld, A., 7. 16. 27. 28, 40, 48, 55.56, 57 Rubin, M. L., 173,227 Rubin, Z. Z., 161, 171 Rubinstein, C. B., 16.56 Rubinstein. R., 215, 219,226 Rudins, George, 234,286 Russell, B., 228 Ruston, H., 150, 171
S Sabirov, A., 266,286 Sakamoto, T., 185,228 Saltykov, A. L., 268,286 Sanders, W. R., 202, 203,227, 228 Santos, S. M. dos, 198,228
293
Sarapkin, A., 243,286 Savchenko, T. A., 240,287 Schacter, B. J., 40,57 Schank, R. C., 201,228 Schneider, V. B., 137, 169, 171 Scott, J. J., 98, 117 Searle, B. W., 184. 220,228, 229 Seidel, R. J . , 173,227 Semik, V. P., 244,283 Shannon, C. E., 59, 117, 124, 172 Shaw, J . , 60, 95, 117 Shen, V. Y., 131, 151, 161, 172 Sheppard, S. B., 172 Sherwood, B. A., 176,228 Shiller, F. F., 236', 285 Shnayderman, I. B., 247,286 Shohers, A. L., 243, 247, 282 Shohukin, B. A., 246,285 Shooman, M. L., 171, 172 Shura-Bura, M. R., 233, 234, 236, 266. 268, 283 Simon, H. A., 60,95. 109, 117 Skilling, H. G., 286 Sklansky, J., 55,57 Skovorodin, V.,234,283 Slagle, J. R., 95, 117 Slate, D. J., 61, 98, 117 Smith, H., 249,286 Smith, R. L., 183, 193, 198, 202, 208, 212. 228, 229 Smith, S . T., 176, 228 Snow, R. E., 213,227 Solomenko, E., 265,286 Soule, S . , 69, 117 Stein, P., 60, 116 Stevens, A. L., 216,228 Stockham, T. G., Jr., 28, 57 Stolyarov, G. K., 237,281 Strizhkov, G. M., 248.282 Stroud, J. M., 129, 172 Su, S. Y. W., 198,228 Sullivan, J. E., 138, 168 Suppes,P., 183. 184, 190. 193, 195, 197, 198, 201,203, 207, 208, 220, 222,227, 228. 229 Surkov, E. M.,247,286 Symes, L. R., 172
T Tan, S. T., 106, 117 Taranenko, Yu., 258,286
294
AUTHOR INDEX
Tarig, M. A., 172 Thayer, T. A., 138, 171, 172 Thomas, J. R., 287 Thompson, J. E., 16.56 Timofeyev, Yu. A., 241, 267,285 Tolstosheev. V. V., 259,287 Torgashev, V. A,, 245,283 Torsun, I. S., 171 Trainor, W. L., 264,287 Tretiak, 0. J., I I , 56 Trofimchuk, M., 254, 257,287 Turing, A. M., 59, 117
U Uber, G. T., 143, 170 Ulam, S., 60,116
V Vasyuchkova, T. D., 240,287 Vinsonhaler, J., 184,229 Viso, E . , 171 Vogeli, B. R., 265,283 Voyush, V. I . , 282
W Walden. W., 60, 116 Walker, M. A., 172 Walston, C., 131, 132, 172 Wang, A. C., 173,229 Ware, W. H., 234,287 Wegner, P., 232,287
Weiss, D. J., 186,229 Wells, M., 60, 116 Wells, S., 184, 190, 227 Weszka, J. S . , 40.57 White, H., 257,287 Wiener, N., 59, 117 Wilkins, L. C., 16,57 Winograd, T., 229 Winston, P. M., 57 Wintz, P. A., 7, 14, 16,27,28,40,48,55,56, 57 Woodfield, S. N., 148. 172 Woods, W., 201,229 Wu, E-Shi, 199,229
Y Yamaguchi, Y., 11.56 Yanov, A,, 273,287 Yob, G., 217,229 Yushchenko, E. L., 237,281, 283
Z Zadykhaylo, I. B., 238,287 Zaguzoba, L. K., 240,287 Zanotti, M., 184,229 Zhimerin. D. G., 243, 246, 247, 253, 254, 262,287 Zhukov, 0. V., 247, 267.287 Zhuravlev. V.. 287 Zipf, G. K., 172 Zislis, P. M., 153, 172 Zucker, S. W., 40,57 Zweben, S. H., 143, 168, 172
Subject Index
A AIST project, 238 ALGAMS, in Soviet Union, 247, 257 ALGOL, in Soviet Union, 245, 247 ALGOL-COBOL, in Soviet Union, 236 ALGOL-60, in Soviet Union, 236 Algorithm, potential volume and, 122-123 Algorithm generator, 143-144 Aliasing, 4-5 All-Union Association Soyuz EVM Komplex, 254 All-Union Institute for Scientific and Technical Information, Soviet Union, 259 Alpha-beta algorithm, 94-97 Alpha-beta window, in computer chess, 97-98 American Geophysical Union, 156 ANALITIK, in Soviet software, 236 Analogies, method of, in computer chess, 98 Approximation techniques, 10-12 Arbitrary constants, freedom from, 166-168 Area, in digital picture, 49 ASU (automated controllmanagement systems), in Soviet Union, 246-247, 255256 Audio, choice of by university students, 206-207 Autocorrelation, of picture, 50
B BASIC language, in computer chess, I I I BELL chess opening library, 99 BELLE program, 78 Bernay's enumeration theorem, in computer-assisted information. 208-21 I BESM-6 computer, in Soviet Union, 237238, 265 Binary images, coding of, 15-16 Blitz chess, 106-1 10 Border curves, representation of, 43-44 BORIS chess macroprocessor, 110-1 1 1
Boundary volume, in software science, 146-147 BTAM system, in Soviet Union, 240 BUGGY instructional game, 220 Bugs classification and counting of, 137 total vs. validation, 137 Busyness measurement of, 52-53 in pixel classification, 32
C CAI, see Computer-assisted instruction California, University of, 190 CDC CYBER 170 series, 63 CDC CYBER 176 series, 71 CDC 6400 computer, 66 CDC 6600 computer, 61-62 CEMA (Council for Economic Mutual Assistance) countries, 233, 239-240, 244245, 269, 271-272, 277 Chain code, 43-44 curve segmentation and, 46 defined, 45 Change detection, 37 CHAOS program, 71, 73, 87, 98-99 Chess, computer, see Computer chess CHESS CHALLENGER program, 84, 1 1 0 - 1 13 Chess information, differential updating of, 98-99 Chess programs, chess-specific information in, 99-100 see also Computer chess CHESS 2.0 program, 61 CHESS 3.0 program, 68 CHESS 4.0 program, 61-63 CHESS 4.4 program, 64-65, 67 CHESS 4.5 program, 70-72, 74-78, 107 CHESS 4.6 program, 70, 73, 77-80, 82, 84-89, 98-99, 101-103, 105, 108 CHESS 4.7, 61, 90-91, 101, 110-111
295
296
SUBJECT INDEX
“Chunking” concept, in debugging, 137-138 COBOL, in Soviet Union, 257 Codasyl system, in Soviet Union, 248 Coding exact, 9-10 in image processing and recognition, 8-16 types of, 9-16 COKO program, 72, 100 College physics, computer-assisted instruction in, 190-191 Color edges, detection of, 34 Color picture, pixel color components in, 30-3 1
Communist Party, computer industry and, 260-26 1
Community college courses, computerassisted instruction in, 186-190 Component labeling, in representation process, 41-43 COMPU-CHESS microcomputer, 110 Computer-assisted instruction, 173-225 audio uses in, 201-207 in college physics, 190-191 in community college courses, 186-190 in computer programming, 198 current research in, 199-222 in elementary and secondary education, 175- I85
evaluation of, 183-185 future of, 222-225 informal mathematical proofs in, 207-212 in letter recognition by school children, 203-206
in logic and set theory, 191-198 in natural-language processing, 200-201 PLAT0 system in. 176-179 in postsecondary education, 185-199 student modeling in. 212-222 videodisks in, 223 Computer chess, 59-1 14 endgame play in, 100-106 finite depth minimax search in, 93 forward pruning in, 94 future expectations in, 113-1 14 horizon effect in, 93 Jet Age of, 62 mating tree in, 68-69 with microcomputers, 110-1 13 minimax search algorithm in, 92-93 opening libraries in, 99-100
Paul Masson Chess Classic and, 70 principal continuation in, 93 programming in, 110-1 1 I scoring function in, 93 speed chess and, 106-1 10 tree-searching techniques in, 92-99 Computer Curricular Corporation, CAI courses of, 179-185 Computer graphics, defined, 3 Computer program “completely debugged,” 135 comprehensibility of, 134 e.m.d. count for, 136-138 implementation level in, 123-125 machine language in, 120-121 vocabulary of, I 2 I volume concept in, 122 Computer programming, see Programming CONDUIT State of the Art Reports, 185 Connectedness, in representation, 41-42 Contour coding, 10 Contrast enhancement, in image enhancement, 17 Contrast stretching, 18 Control Data Corporation, 63 see also CDC-6600 computer Convex hull, defined, 49 Counting, representation and, 42-43 Critique of Judgment (Kant), 216 Curve detection, 35-36 iterative, 38-40 Curve segmentation, 46-48 Curve tracking, in sequential segmentation, 37-38
CYBER 176, 76
D Database management systems, in Soviet Union, 248 Debugging in “completely debugged program,” 135 error rates and, 136-141 Democracy and Education (Dewey), 213 Difference coding, 13 Digital picture, 2 Digitation defined, 2-3 work involved in, 3-4 Directionality spectrum, 46
297
SUBJECT INDEX Distortion, in pattern matching, 36-37 Dither coding, I5 DOWES system, Soviet Union, 240 DUCHESS chess program, 78, 82, 87-90, 98-99,104,109
E Edge detection in image processing, 29, 33-34 in picture segmentation, 33-34 Education, computer-assisted instruction in, 175- I85 see also Computer-assisted instruction Educational Testing Service, 189 8080 CHESS program, 1 I 1 Elementary education, computer-assisted instruction in, 175-185 Elementary mathematics, PLAT0 system in, 176-177 Elementary reading, computer-assisted instruction in, 178-179 Elementary-school children, letter recognition by, 203-206 Elongatedness, 49 ELORG centers, Finland, 273 ELWRO-Service, Poland, 262 Endgame play, in computer chess, 100-106 Error rates, in software technology, 136-141 ES- 1030 minicomputer, 239 ’I)*, relation with ql,146-148 Q*, use of in prediction, 148-150 EXCHECK system, 212
Fourier power spectrum, 50-51 Fourier transform coding, 14 Fourier transforms, in deblurring process, 23 Fuzzy segmentation, 38-40 Fuzzy techniques, 29, 38-40
G Geometric distortion, correction for, 20 Geometric normalization, 50 Geometric properties, in image processing, 49-5 I Geometric transformation, in image enhancement, 18-19 German Computer Chess Championship match, 69 GOST, in Soviet Union, 257 Gray-level-dependent properties, in image description, 51-54 Grayscale modification, 17-18
H H a u s d o a maximal principle, 21 1-212 High-emphasis frequency filtering, 24-25 Histogram flattening, 17-18 Hole border, defined, 44 Homomorphic filtering, in noise cleaning, 21 Honeywell 6050 computer, 62, 66 Horizon effect, in computer chess, 93-94 “How the West Was Won” game, 218
F
I
FIDE (Federation Internationale des cchecs), 61 n. Finite depth minimax search, in computer chess, 93 Foreign languages, computer-assisted instruction in, 199 FORTRAN algorithm in, 120 in computer chess, 1 1 I uf in. 148-150 problems in, 151-152 in software science, 121 in Soviet Union, 236 “well documented program” in, 153-154
IBM S/360 system, in Soviet Union, 239-240 IBM S/370 system, in Soviet Union, 243 ICAI, see Intelligent computer-assisted instruction Image see also Picture busyness of, 32, 52-53 enhancement of, see Image enhancement Fourier transform of, 12 projection of, 26 restoration of, see Image enhancement transforming of, 12-15 Image fpproximation, 10-12 Image coding, 8-9
SUBJECT INDEX Image enhancement, 16-28 deblumng in, 23-26 defined, 2 geometric transformations in, 18-19 grayscale modification in, 17-18 high-emphasis frequency filtering in, 24-25 inverse filtering in, 23 noise cleaning in, 19-23 reconstruction from projections in, 26-27 tomography as, 26-28 Wiener filtering in, 23 Image processing, 1-55 chain codes in, 45-46 definitions in, 1-8 description in, 48-55 pattern matching in, 34-37 representation in, 40-48 Image recognition, defined, 2-3 Image reconstruction, defined, 2 Institute of Cybernetics (Kiev), 248 Institute of Precise Mechanics and Computer Engineering (Moscow), 244 Institute of Theoretical and Experimental Physics program, 60, 100 Instruction, computer-assisted, see Computer-assisted instruction Intelligent computer-assisted instruction, 2 13-2 I4 see also Computer-assisted instruction examples of, 217-221 research in, 214-217 specialists’ knowledge in, 2 16-217 weaknesses of, 221-222 Interframe coding, 16 Inverse filtering, 23 Inverse Fourier transforming, 23 in pattern matching, 36 ITEP program, 60, 100 Iterative curve detection, 38 Iterative deepening, in computer chess, 97
J Japan, educational technology in, 185 Japanese software, in Soviet economy, 274 Jet Age of computer chess, 62-69
K KAISSA chess opening library, 99 KAISSA program, 61 Kalman filtering, in noise cleaning, 23 Killer heuristic, in computer chess, 97 Kotok-McCarthy program, in computer chess, 94, 100
L Language arts, computer-assisted instruction in, 182-183 Language level, defined, 125 Learning, in software science, 158-161 Leningrad State University, 265 Letter recognition, by elementary-school children, 203-206 Lines of code, extension of program language to, 130-132 LISP program, in Soviet Union, 237 Logic and set theory, computer-assisted instruction in, 191-198
M Mac Hack VI program, in computer chess, 60, 84, 94, 98 MASTER program, 73 Mastery, time required for, in software science learning, 158-160 MAT, see Medial axis transfer Matched filter theorem, 35 Mathematical information theory, “information content” of, 124 Mathematical proofs, computer-assisted instruction in, 207-212 Mathematics strands, in computer-assisted instruction, I8 1-182 Mating tree, 83 Medial axis transform, 45 Mental effort hypothesis, in software science, 129-130, 163 MESM computer, in Soviet Union, 233 Microcomputers, in computer chess, 110113 Minimax search algorithm, in computer chess, 92-93 Moire patterns, 4-5 Moscow Aviation Institute, 265
SUBJECT INDEX Moscow State University, 265-266 Motion detection, 37
N National Science Foundation, 189 Natural-language processing, CAI in, 20020 1 Net vocabulary ratio, in software science, 156-158 Nichomachean Ethics (Aristotle), 216 Noise cleaning, in image enhancement, 19-22 North American Computer Chess Championship, 71
0 Ohio State University, computer-assisted instruction at, 191-193 OLTEP program, Soviet Union, 241 On Liberty (Mill), 175 Operators, rank-ordered frequency of, 143 ORWELL program, 73 OSTRICH chess program, 73, 87, 11 1-1 12 Outer border, defined, 44
P PASCAL machine language, 151-152 Pattern matching distortion in, 36 in image processing, 29, 34-37 as segmentation, 34-37 Paul Masson Chess Classic, 70 PDP-I0 computer, 68 Perimeter, defined, 49 Phaedrus (Plato), 174, 225 Picture see also Image autocorrelation in, 50 brightness or color variation in, 2 busyness of, 32, 52-53 detection of lines and curves in, 35-36 digital, see Digital picture moments of, 52 relationships among regions of, 54 Picture segmentation, 28-40 local property values in, 33
299
PIONEER chess program, 81, 83 Pixel(s) in area measurement, 49 classification and clustering of, 28-33 in connectedness, 41-42 defined, 2 in difference coding, 13 in noise cleaning, 19, 21-23 in sequential segmentation, 37 in “skeletons,” 45 in spurious resolution, 6 thresholding method for, 29-30 PLATO biology course, 186-189, 202 PLATO “How the West Was Won” game, 218 PLATO mathematics course, 176-179 Poland, ELWRO-Service firm in, 262 Postsecondary education, computerassisted instruction in, 185-199 Potential language, as concept, 123 Potential volume, 122 Principal continuation, in computer chess, 93 Program clarity, measures of, 133-136 Program maintenance, 134-135 Programming see also Computer program clarity in, 133-136 computer-assisted instruction in, 198 in English prose, 154-158 in Soviet Union, 248 Programming rates vs. project size, 132-133 Properties, geometric, see Geometric Properties Pseudocolor enhancement, 17
Q Quantization defined, 6 false contours and, 6-7 tapered, 6
R Radio Mathematics Project, 220 Reading, computer-assisted instruction in, 182 Region properties, measurement of, 48-55
300
SUBJECT INDEX
Regions, relationships among, 54 Registration, in pattern matching, 37 Relaxation methods, curve detection in, 38-40 Representation, connectedness in, 4 1-42 Resolution, spurious, 6 RIBBIT program, 62-64 Run length coding, 10 Runs, representation of, 42-43 Ryad computer system, Soviet Union, 236-237, 240-244, 251, 256-258, 269271, 273-274, 279
S Sampling, 4-6 defined, 4 Sampling theorem, 4 School children, letter recognition by, 203206 Scientific Research Institute for Electronic Computers (Minsk), 242 Secondary education, computer-assisted instruction in, 175-185 Segmentation edge detection in, 33-34 fuzzy. 38-40 in image processing, 28-40 pattern matching in, 34-37 pixel classification in, 29-33 sequential, 37-38 Semantic partitioning, in software science, 153-154 Sequential segmentation, curve tracking in, 37-38 Sequential techniques, 29, 37-38 Set theory, computer-assisted instruction in, 191-198 Shannon-Fano-Huffman coding, 9-10, I5 Shape complexity, measurement of, 49 SIMULA 67, in Soviet Union, 237, 247, 274 Skeletons, representation by, 45-46 SNOBOL, in Soviet Union, 237, 247 Software, Soviet, see Soviet software Software analyzer, 154 Software science see ulso Computer program; Programming advances in, 119-168 basic metrics in, 120-121
  boundary volume in, 146
  clarity in, 133-136
  defined, 119-120
  error rates in, 136-141
  extension of to "lines of code," 130-132
  grading student programs in, 150-153
  implementation level in, 123-125
  lack of arbitrary constants in, 166-168
  language level in, 125
  learning and mastery in, 158-161
  measurement techniques in, 141-143
  mental effort and, 129-130, 163-164
  modularity hypothesis in, 162-165
  net vocabulary ratio in, 156-158
  operators and operands in, 141
  potential volume in, 122-123
  programming rates vs. project size in, 132-133
  rank-ordered frequency of operators in, 143-146
  relation between η1 and η2 in, 146-148
  semantic partitioning in, 153-154
  technical English in, 154-158
  text file compression in, 161-162
  "top down" design of prose in, 162-166
  United States vs. Soviet Union in, 252-254
  vocabulary-length equation in, 126-128
  volume in, 122
SOPHIE system, 219
Soviet bureaucracy, computer and, 249-250
Soviet computers
  models of, 234-235
  shortcomings of, 235
  software in context of, 249-256
Soviet hardware
  see also Soviet computers; Soviet Union
  deficiency correction in, 245
  marketing activity and, 260
  since 1972, 239-249
Soviet software, 231-281
  automated control/management systems in, 246-247, 255-256
  autonomy in, 253
  Communist Party and, 260-261
  computer models and, 234
  control of technology transfer in, 275-278
  Cyrillic vs. English in, 241
  deficiency correction in, 247
  development process in, 261-265
  documentation in, 264-265
  economic system and, 279-280
  European sources of, 273-274
  external sources for, 273-275
  improvements needed in, 254, 280-281
  internal diffusion and, 256-261
  Japanese sources of, 274
  maintenance of, 264
  manpower development for, 265-268
  MESM computer and, 233
  producer-client get-together in, 261-262
  programming languages in, 236
  regulation and control of, 276-277
  requirements specification in, 262
  since 1972, 239-249
  Soviet economic system and, 249-256
  survey of, 233-249
  system design and implementation in, 263
  systemic difficulties and, 251
  technology transfer in, 268-278
  testing of, 263-264
  university education and, 265-267
  United States sources of, 271-274
  upgrading and distribution of, 252
  Western influence on, 268-275, 279
Soviet Union
  ALGOL in, 247
  ALGOL-60 in, 236, 245
  ASUs in, 246-247, 255-256
  central control in, 251
  COBOL in, 257
  computer consciousness level in, 267
  computer models available in, 234-235, 239-249, 260
  computer publications and conferences in, 259-260
  computer use in, 234-235, 255-256, 278
  database management systems in, 248-249
  FORTRAN in, 236
  IBM products in, 239-240
  Japanese software and, 274
  management use of computers in, 255-256
  minicomputers in, 279
  programming in, 235, 255
  Ryad project in, 236, 240-244, 251, 253, 256-258, 269-271, 273-274, 279
  software in, see Soviet software
  software technology transfer in, 268-278
  S/360 computer system in, 271
  time-sharing in, 237, 247
  Unified System in, 256-257
Stanford University, computer-assisted instruction at, 193-198
Stereomapping, 37
Strands strategy, in computer-assisted instruction, 179-181
Strips, in skeleton representations, 46
Student, modeling of in computer-assisted instruction, 212-222
Student errors, analysis of, 215
T

TECH chess program, 97
TECHMASHEXPORT program, Soviet Union, 244
Technical English, in software science, 154-158
TELL program, 73
Template matching, see Pattern matching
TENEX operating system, 202
Text file compression, in software science, 161-162
Texture edge detection, 35
Thin bridges, erasure of, 42
Thinning process, in representation, 45-46
TICCIT project, in computer-assisted instruction, 189-190, 202
Tomography, 26-28
Transform coding, 14
Transformations, 14-15
Transposition tables, in computer chess, 98
TREEFROG chess program, 64, 66, 68, 71-72
Tree-searching techniques
  alpha-beta algorithm in, 94-97
  alpha-beta window in, 97-98
  forward pruning and, 94
  iterative deepening in, 97
  killer heuristic in, 97
  method of analysis in, 98
  minimax search algorithm and, 92-93
  transposition tables in, 98
Tsentroprogrammsistem Scientific Production Association (Kalinin), 253
U

Undersampling, 4
Unified System of Computers (Soviet Union), 239
United States
  software science in, 252
  and Soviet software technology transfer, 271-274, 280-281
V

Venn diagrams, 192
Videodisks, in computer-assisted instruction, 223
VINITI program, Soviet Union, 259
Vocabulary-length relation
  relative errors and, 165
  text file compression and, 161-162
Volume, concept of in computer program, 122
VOTRAX system, 204
W

Wiener filtering, 23
WITA chess program, 68
Wumpus game, computer approach to, 217-218
X

X chess program, 68
Contents of Previous Volumes

Volume 1
General-Purpose Programming for Business Applications
CALVIN C. GOTLIEB
Numerical Weather Prediction
NORMAN A. PHILLIPS
The Present Status of Automatic Translation of Languages
YEHOSHUA BAR-HILLEL
Programming Computers to Play Games
ARTHUR L. SAMUEL
Machine Recognition of Spoken Words
RICHARD FATEHCHAND
Binary Arithmetic
GEORGE W. REITWIESNER

Volume 2
A Survey of Numerical Methods for Parabolic Differential Equations
JIM DOUGLAS, JR.
Advances in Orthonormalizing Computation
PHILIP J. DAVIS AND PHILIP RABINOWITZ
Microelectronics Using Electron-Beam-Activated Machining Techniques
KENNETH R. SHOULDERS
Recent Developments in Linear Programming
SAUL I. GASS
The Theory of Automata, a Survey
ROBERT MCNAUGHTON

Volume 3
The Computation of Satellite Orbit Trajectories
SAMUEL D. CONTE
Multiprogramming
E. F. CODD
Recent Developments of Nonlinear Programming
PHILIP WOLFE
Alternating Direction Implicit Methods
GARRETT BIRKHOFF, RICHARD S. VARGA, AND DAVID YOUNG
Combined Analog-Digital Techniques in Simulation
HAROLD F. SKRAMSTAD
Information Technology and the Law
REED C. LAWLOR

Volume 4
The Formulation of Data Processing Problems for Computers
WILLIAM C. MCGEE
All-Magnetic Circuit Techniques
DAVID R. BENNION AND HEWITT D. CRANE
Computer Education
HOWARD E. TOMPKINS
Digital Fluid Logic Elements
H. H. GLAETTLI
Multiple Computer Systems
WILLIAM A. CURTIN

Volume 5
The Role of Computers in Election Night Broadcasting
JACK MOSHMAN
Some Results of Research on Automatic Programming in Eastern Europe
WLADYSLAW TURSKI
A Discussion of Artificial Intelligence and Self-Organization
GORDON PASK
Automatic Optical Design
ORESTES N. STAVROUDIS
Computing Problems and Methods in X-Ray Crystallography
CHARLES L. COULTER
Digital Computers in Nuclear Reactor Design
ELIZABETH CUTHILL
An Introduction to Procedure-Oriented Languages
HARRY D. HUSKEY

Volume 6
Information Retrieval
CLAUDE E. WALSTON
Speculations Concerning the First Ultraintelligent Machine
IRVING JOHN GOOD
Digital Training Devices
CHARLES R. WICKMAN
Number Systems and Arithmetic
HARVEY L. GARNER
Considerations on Man versus Machine for Space Probing
P. L. BARGELLINI
Data Collection and Reduction for Nuclear Particle Trace Detectors
HERBERT GELERNTER

Volume 7
Highly Parallel Information Processing Systems
JOHN C. MURTHA
Programming Language Processors
RUTH M. DAVIS
The Man-Machine Combination for Computer-Assisted Copy Editing
WAYNE A. DANIELSON
Computer-Aided Typesetting
WILLIAM R. BOZMAN
Programming Languages for Computational Linguistics
ARNOLD C. SATTERTHWAIT
Computer Driven Displays and Their Use in Man/Machine Interaction
ANDRIES VAN DAM

Volume 8
Time-shared Computer Systems
THOMAS N. PYKE, JR.
Formula Manipulation by Computer
JEAN E. SAMMET
Standards for Computers and Information Processing
T. B. STEEL, JR.
Syntactic Analysis of Natural Language
NAOMI SAGER
Programming Languages and Computers: A Unified Metatheory
R. NARASIMHAN
Incremental Computation
LIONELLO A. LOMBARDI

Volume 9
What Next in Computer Technology
W. J. POPPELBAUM
Advances in Simulation
JOHN MCLEOD
Symbol Manipulation Languages
PAUL W. ABRAHAMS
Legal Information Retrieval
AVIEZRI S. FRAENKEL
Large Scale Integration-an Appraisal
L. M. SPANDORFER
Aerospace Computers
A. S. BUCHMAN
The Distributed Processor Organization
L. J. KOCZELA

Volume 10
Humanism, Technology, and Language
CHARLES DECARLO
Three Computer Cultures: Computer Technology, Computer Mathematics, and Computer Science
PETER WEGNER
Mathematics in 1984-The Impact of Computers
BRYAN THWAITES
Computing from the Communication Point of View
E. E. DAVID, JR.
Computer-Man Communication: Using Computer Graphics in the Instructional Process
FREDERICK P. BROOKS, JR.
Computers and Publishing: Writing, Editing, and Printing
ANDRIES VAN DAM AND DAVID E. RICE
A Unified Approach to Pattern Analysis
ULF GRENANDER
Use of Computers in Biomedical Pattern Recognition
ROBERT S. LEDLEY
Numerical Methods of Stress Analysis
WILLIAM PRAGER
Spline Approximation and Computer-Aided Design
J. H. AHLBERG
Logic per Track Devices
D. L. SLOTNICK

Volume 11
Automatic Translation of Languages Since 1960: A Linguist's View
HARRY H. JOSSELSON
Classification, Relevance, and Information Retrieval
D. M. JACKSON
Approaches to the Machine Recognition of Conversational Speech
KLAUS W. OTTEN
Man-Machine Interaction Using Speech
DAVID R. HILL
Balanced Magnetic Circuits for Logic and Memory Devices
R. B. KIEBURTZ AND E. E. NEWHALL
Command and Control: Technology and Social Impact
ANTHONY DEBONS

Volume 12
Information Security in a Multi-User Computer Environment
JAMES P. ANDERSON
Managers, Deterministic Models, and Computers
G. M. FERRERO DIROCCAFERRERA
Uses of the Computer in Music Composition and Research
HARRY B. LINCOLN
File Organization Techniques
DAVID C. ROBERTS
Systems Programming Languages
R. D. BERGERON, J. D. GANNON, D. P. SHECHTER, F. W. TOMPA, AND A. VAN DAM
Parametric and Nonparametric Recognition by Computer: An Application to Leukocyte Image Processing
JUDITH M. S. PREWITT

Volume 13
Programmed Control of Asynchronous Program Interrupts
RICHARD L. WEXELBLAT
Poetry Generation and Analysis
JAMES JOYCE
Mapping and Computers
PATRICIA FULTON
Practical Natural Language Processing: The REL System as Prototype
FREDERICK B. THOMPSON AND BOZENA HENISZ THOMPSON
Artificial Intelligence-The Past Decade
B. CHANDRASEKARAN

Volume 14
On the Structure of Feasible Computations
J. HARTMANIS AND J. SIMON
A Look at Programming and Programming Systems
T. E. CHEATHAM, JR., AND JUDY A. TOWNELY
Parsing of General Context-Free Languages
SUSAN L. GRAHAM AND MICHAEL A. HARRISON
Statistical Processors
W. J. POPPELBAUM
Information Secure Systems
DAVID K. HSIAO AND RICHARD I. BAUM

Volume 15
Approaches to Automatic Programming
ALAN W. BIERMANN
The Algorithm Selection Problem
JOHN R. RICE
Parallel Processing of Ordinary Programs
DAVID J. KUCK
The Computational Study of Language Acquisition
LARRY H. REEKER
The Wide World of Computer-Based Education
DONALD BITZER

Volume 16
3-D Computer Animation
CHARLES A. CSURI
Automatic Generation of Computer Programs
NOAH S. PRYWES
Perspectives in Clinical Computing
KEVIN C. O'KANE AND EDWARD A. HALUSKA
The Design and Development of Resource-Sharing Services in Computer Communications Networks: A Survey
SANDRA A. MAMRAK
Privacy Protection in Information Systems
REIN TURN

Volume 17
Semantics and Quantification in Natural Language Question Answering
W. A. WOODS
Natural Language Information Formatting: The Automatic Conversion of Texts to a Structured Data Base
NAOMI SAGER
Distributed Loop Computer Networks
MING T. LIU
Magnetic Bubble Memory and Logic
TIEN CHI CHEN AND HSU CHANG
Computers and the Public's Right of Access to Government Information
ALAN F. WESTIN