A dva nces in COMPUTERS VOLUME 5
Contributors to This Volume
CHARLESL. COULTER ELIZABETH CUTHILL HARRYD. HUSKEY JACKMOSHMAN GORDONPASK ORESTESN. STAVROUDIS TURSKI WLADVSLAW
Advances in
COMPUTERS edited by
FRANZ L. ALT National Bureau of Standards Washington, D.C.
MORRIS RUBINOFF University of Pennsylvania and Pennsylvania Research Associates Philadelphia, Pennsylvania
associate editors A. D. BOOTH R. E. MEAGHER
VOLUME 5
Academic Press New York
London lQ64
COPYRlGHT 8 1964, BY ACADEMIC PRESS, INC. ALL RIGHTS RESERVED NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC
P R E S S , INC. 1 1 1 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. Berkeley Square House, London WIX6BA
LIBRARY OF CONGRESS
CATALOG CARD
NUMBER: 59-15761
Secorid Printing, 1969 PRINTED IN THE UNITED STATES OF AMERICA
Contributors to Volume 5
CHARLESL. COULTER,National Institutes of Health, Bethesda, Maryland ELIZABETH CUTHILL,David Taylor Model Basin, Washington, D.C. HARRYD. HUSKEY,University of California, Berkeley, California and Indian Institute of Technology, Kanpur, India JACKMOSHMAN, C - E - I - R INC., Washington, D.C. GORDON PASK,System Research Limited, Richmond, Surrey, England ORESTES N. STAVROUDIS, National Bureau of Standards, Washington, D.C. WLADYSEAW TURSKI,Computation Center, Polish Academy of Sciences, Warsaw, Poland
V
This Page Intentionally Left Blank
Preface
The survey articles included in this volume have been selected with two aims in mind: to arrive a t a balanced sampling of the computer field, and to emphasize the subjects of most active current interest. Nothing could be timelier, in a volume appearing a few months before a. presidential election in the United States, than the article on the role of computers in the broadcasting coverage of election results. The author, who played a prominent role in the behind-the-scenes preparations for the computer’s appearance on one of the major networks in the election night of 1960, gives an unbiased presentation of the various methods which can be used in predicting complete election results from the partial returns received in the course of the evening, of the difficulties encountered, and of the degree of success attained by the computer predictions. Another topic which has been much debated in the last few years is the state of computer development in the Soviet Union and its neighbors. The article on Automatic Programming in Eastern Europe sheds welcome light on this question. Also on the subject of automatic programming, the typical article on Procedure-Oriented Languages is designed to introduce the reader to the concepts and terminology of computer languages and to survey the state of the art; the article is a companion piece to the one on the Formulation of Data Processing Problems in the previous volume of this series. Artificial intelligence and self-organizing systems form another area. of intense current interest. The article presented here is a systematic unified treatment of these problems (artificial intelligence being considered as a special property of a self-organizing system). The extensive bibliography is likely to prove a valuable adjunct to the paper. To round out the coverage of different aspects of the computer field, there are articles on applications of computers to the design of optical instruments, to nuclear reactor design, and to the determination of the structure of crystals or molecules from X-ray diffraction patterns. These ale problems for which computers have long been used on a large scale, but where significant progress has been made in the recent past. Here again the bibliographies-especially the one on reactor design computation, which includes almost 400 items-are likely to be of essential service to the readers. June, 1964
FRANZ L. ALT MORRISRUBINOFF vii
This Page Intentionally Left Blank
Contents
CONTRIBUTORS TO VOLUME 5 . . . PREFACE . . . . . . . . . CONTENTSOF VOLUMES1. 2. 3. AND 4
. . . . . . . . . . . . . . . . . . . . .
v vii xiii
The Role of Computers in Election Night Broadcasting JACK MOSHMAN
1. 2. 3. 4.
Introduction . . . . . . . . . . . . . . Oddities Plaguing the Model Builder . . . . . . . Sources of Data . . . . . . . . . . . . . Communications on Election Night . . . . . . . 5. The Mathematical Model . . . . . . . . . . 6. Combining the Estimates . . . . . . . . . . 7 . An Example . . . . . . . . . . . . . . 8. National Estimates . . . . . . . . . . . . 9 . Estimated Turnout . . . . . . . . . . . . 10 . Other Elections . . . . . . . . . . . . . 11. Other Applications . . . . . . . . . . . . 12. The Future . . . . . . . . . . . . . . . 13. A Report of T V Monitor of Election Night Coverage for the 1960 Presidential Election, November 8-9. 1960 . .
1 2 3 5
5 9 10 11 12 13 14 14 15
Some Results of Research on Automatic Programming in Eastern Europe WCADYStAW TU RSKl
Introductory Notes . . . . . . . . . . . . . 1 . Soviet Union . . . . . . . . . . . . . . 2. Poland . . . . . . . . . . . . . . . . 3 . Other Countries of Eastern Europe . . . . . . . . 4. Survey of Methods of Programming . . . . . . . . Appendix 1 . Example of a Lyapunovian Program . . . . Appendix 2 . Kindler’s Algorithm for Programming of Arithmetic Formulas . . . . . . . . . . . . References . . . . . . . . . . . . . . . .
23 24 68 88 100 102 103 105 ix
CONTENTS
A Discussion of Artificial Intelligence and Self-organization GORDON PASK
1 . Introductory Comments . . . . . . . . . . . 110 2 . The Characterization and Behavior of a Self-organizing System . . . . . . . . . . . . . . . 116 3 . Artificial Intelligence . . . . . . . . . . . . 165 4 Other Disciplines . . . . . . . . . . . . . 204 5 . The Interaction between Men and Their Intelligent Artifacts 208 Glossary . . . . . . . . . . . . . . . 214 References . . . . . . . . . . . . . . . 218
.
Automatic Optical Design
.
ORESTES N STAVROUDIS
1. 2 3. 4.
Introduction . . . . . . . Tracing . . . . . . . Classical Methods of Lens Design . The Computer Applied to Lens Design References . . . . . . . .
. Ray
. . . . . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
227 231 233 238 252
Computing Problems and Methods in X-Ray Crystallography
.
CHARLES L COULTER
1 . Introduction . . . . . 2 . General Computational Methods 3 . Available Programs . . .
References
. . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . .
257 270 283 284
Digital Computers in Nuclear Reactor Design ELIZABETH CUTHILL
1. 2. 3 4. 5
. .
X
Introduction . . . . . . . . . . . . . Development and Classification of Nuclear Reactor Codes Neutron Transport Equations . . . . . . . . Solution of the Neutron Transport Problem . . . . Other Calculations . . . . . . . . . . . References . . . . . . . . . . . . . .
. . . . .
.
289 291 297 306 326 326
CONTENTS
An Introduction to Procedure-Oriented Languages
.
HARRY D HUSKEY
1. 2. 3. 4. 5.
6
.
7. 8.
. . .
9
10. 11 12. 13 14. 15
.
Introduction . . . . . . . . . . . . . The Evolution of Computer Languages . . . . . . . . . . . . . A Typical Digital Computer A Language for Describing Computers . . . . . A Simple One-Address Computer . . . . . . . A Square-Root Example on the One-Address Computer Relocatability . . . . . . . . . . . . . An Assembly Program . . . . . . . . . . The Square-Root Example in Assembly Language . . An Algebraic Language Translator . . . . . . . Alternative Methods of Translation . . . . . . Algorithmic Languages . . . . . . . . . . Comparison of Features of Algorithmic Languages . . Some Special Languages . . . . . . . . . . Summary . . . . . . . . . . . . . . References . . . . . . . . . . . . . .
Author Index Subject Index
. . . . .
.
.
. . .
. .
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
349 350 353 353 357 358 360 361 362 363 367 368 369 374 375 376 379 391
xi
Contents of Volume 1
General-Purpose Programming for Business Applications CALVIN C. GOTLIEB Numerical Weather Prediction NORMAN A. PHILLIPS Thc Present Status of Automatic Translation of Languages YEHOSHUA BAR-HILLEL Programming Computers to Play Games ARTHURL. SAMUEL Machine Recognition of Spoken Words RICHARDFATEHCHAND Binary Arithmetic GEORGEW. REZTWIESNER
Contents of Volume 2
A Survey of Numerical Methods for Parabolic Differential Equations JIMDOUGLAS, JR. Advances in Orthonormalizing Computation PHILIP J. DAVISA N D PHILIP RABINOWITZ Microelectronics Using Electron-Beam-Activated Machining Techniques R . SHOULDERS KENNETH Recent Developments in Linear Programming SAULI. GASS The Theory of Automata, a Survey ROBERTMCNAUGHTON xii
Contents of Volume 3
The Computation of Satellite Orbit Trajectories SAMUELD. CONTE Multiprogramming E. F. CODD Recent Developments in Nonlinear Programming PHILIPWOLFE Alternating Direction Implicit Methods GARRETT BIRKHOFF, RICHARDS. VARGA, AND DAVIDYOUNG Combined Analog-Digital Techniques in Simulation K. SKRAMSTAD HAROLD Information Technology and the Law REED C. LAWLOR
Contents of Volume 4
The Formulation of Data Processing Problems for Computers WILLIAMC. MCGEE All-Magnetic Circuit Techniques DAVIDR. BENNIONAND HEWITTD. CRANE Computer Education HOWARDE. TOMPKINS Digital Fluid Logic Elements H. H. GLAETTLI Multiple Computer Systems WILLIAMA. CURTIN xiii
This Page Intentionally Left Blank
Advances in COMPUTERS VOLUME 5
This Page Intentionally Left Blank
The Role of Computers in Election Night Broadcasting JACK MOSHMAN C-E-I-R INC. Washington, D.C.
1. 2. 3. 4. 5. 6.
7. 8. 9. 10. 11. 12. 13.
Introduction . . Oddities Plaguing the Model Builder . Sources of Data . . . Communications on Election Night The Mathematical Model . . Combining the Estimates . * An Example . . National Estimates . . Estimated Turnout . Other Elections . . Other Applications. . The Future . . A Report of TV Monitor of Elcction Night Coverage for the 1960 Presidential Election, November 8-9, 1960. .
. .
.
1 2
3 5
5 9 10 11
12 13 14 14
15
1. introduction
The American public became aware of electronic digital computers on election night of 1952 when UNIVACI was used by the Columbia Broadcasting System to help project early presidential returns and estimate the outcome of the first Eisenhower-Stevenson election. Computers have been used in each Congressional and presidential election since, as well as in the analysis of many gubernatorial and local contests. The use of compu'ters has become so accepted that one network was induced to reverse its original decision to eliminate this facet of its election night program; advertisers were reluctant to sponsor the program without the computer. The mathematical models on which projections are founded vary in their.detai1 from group to group and from election to election. This paper describes some of the features to be found in some or all of the 1
JACK MOSHMAN
models. It should not be construed that the following description is necessarily representative of any one program. 2. Oddities Plaguing the Model Builder
It is worthwhile to note several curiosities about election data. The number of precincts a t which polling will take place is not precisely known in advance of the election. It is not unusual for last minute actions to take place which will combine several precincts into one, or to split one precinct into several. Precinct designations and boundaries change from one election to another. This requires that analysis of precinct patterns for several elections to be preceded by an exhaustive study to insure that the precinct designation continually encompassed the same area and that the characteristics of the population residing in the area underwent no sudden or drastic change. The evolution of a neighborhood from lower middle-class, single-family dwellings to luxury apartments would obviously have an important impact on voting behavior. State-wide patterns of reporting are important in the projection process. These have reasonable, but not exact, stability over time, other things being equal. Greatest corrections typically are applied during the early reporting phase. Maine, for example, historically reveals a bias exceeding 15% in favor of the Republicans when 5 to 10% of the precincts have reported. This bias decreases to zero as more precincts report. A complementary pattern is to be found in returns from Illinois ; the bias favors the Democratic Party. In some cases, Florida for example, an early Democratic bias is replaced by a smaller Republican bias which is gradually damped to zero. When voting machines are introduced into an important area, reporting from that area will be accelerated. Figure 1 shows how a typi-
0
0.25
0.50
0.75
I .oo
Froctlon of prscinch raportad
FIQ.1. Typical state reporting patterns,
2
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
cal pattern may be distorted by the introduction of voting machines. Note that the abscissa of the graph in Fig. 1 is the fraction of precincts reported and not the fraction of the vote reported. To within a small error, the former can be computed a t any time on election night; the latter metric is subject to much greater variation. Another factor of importance, with a lesser frequency of occurrence, is a shifting of the boundary of a time zone. Such a shift moved Knoxville, Tennessee, from Central to Eastern time in the mid 1940’s. The entrance of Alaska and Hawaii into the Union as states has spread the polls over a major segment of the earth. Few people realize that Nome, Alaska, is on Bering Standard Time, seven hours behind Eastern Standard Time. I n 1960 the first poll to close was in Waterville Valley, New Hampshire, a t 12:lO A.M. EST and the latest in Nome at 8:OO P.M. BST or 3:OO A.M. EST the following day in New York.
3. Sources of Data Two basic types of data sources are available to model builders. Time-dependent voting statistics are available from previous elections in an aggregated manner. Generally, for any specific contest, wire news services and broadcasting media will report the number of precincts from which returns have been received a t each of many times during the night, and the number of votes cast for each candidate. This information generally is reported for an entire state or some major metropolitan area such as Cook County, Illinois. The other basic data source for many past elections is a complete detailed tabulation of the final vote in each election precinct with subtotals by county and state. Possibly a subtotal by other political subdivision may also be available. Supplementary information also exists about many demographic, ethnic, economic, and social characteristics for each state and, in many cases, smaller political units. Among the most useful published sources used in 1962 were: (1) Bureau of the Census, “Congressional District Atlas of the United States.” Government Printing Office, Washington, D.C., 1960, Supplement No. 1, January, 1962.
Atlas contains state maps showing county and Congressional District boundaries and major cities, plus detailed maps of selected counties and CongressionalDistricts. Supplement contains information (as of January 1, 1962) on redistricting based on 1960 Census of Population.
3
JACK MOSHMAN
(2) Bureau of the Census, “Congressional District Data Book (87th Congress).” Government Printing Office, Washington, D.C., 1961. Data for United States as a whole, by states, and by Congressional Districts: area (1960); population (1960) by race, age and sex, marital status, household and family status; vote cast by major party, for President (1960, 1956, 1952), for Representative (1960, 1958, 1956, 1954, 1952); housing (1960) by tenure and vacancy status, condition, number of rooms, density of occupancy. Data for “whole-county” Congressional Districts (districts comprising one or more entire counties, and at-large districts): vital statistics (1959); bank deposits (1960); agricultural statistics (1959, 1954); statistics on retail trade, wholesale trade, selected services, and manufactures (1958); mineral industries (1958, 1954); local government (1957); taxable property (1956). Data on nonwhite population for selected Congressional Districts (all districts in 14 southern states and all other districts in which nonwhites constitute 10% or more of the total population): population and housing data as in first paragraph above, but for nonwhites only. Apportionment of membership in House of Representatives, by states, census years 1790-1960. Population of smallest and largest Congressional Districts of the 87th Congress, by states, 1950 and 1960.
(3) “Congressional Quarterly Almanac (87th Congress, 1st Session),” Vol. XVII. Congressional Quarterly, Inc., Washington, D.C., 1961. Winning Senators’ and winning Ropresencatives’ percentage of total vote (the latter compared with winning Presidential candidate’s percentage), by states, 1952-1960. Official returns in 1960 election for President, Representatives, Senator and Governor, all candidates (actual count, percentage of total vote, and plurality), by states and Congressional Districts. Official rctums in off-year and special elections, 1959 and 1E60. Campaign expenditures reported for 1960 elections, by House and Senate candidates; list of major contributors with amounts contributed. Also contains detailed information on activities of both Houses, including roll-call votes. (4) “Congressional Quarterly, Weekly Report,” Congressional Quarterly, Inc., Washington, D.C., 1962. Assorted timely information on dates, candidates, demographic characteristics and redistricting. Also provides analyses and interpretive reports of selected contests and regional. (5) Richard M. Scammon (ed.), “America Votes,” Vol. 1, 1956; Vol. 2, 1956-57; Vol. 3, 1958; Vol. 4,1960. Governmental M a i m Institute, Washington, D.C.
4
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
4. Communications on Election Night
On election night, computers have become an important elementbut only one element-in vast news gathering chains designed in a somewhat different manner by each news service, radio, and TV network to bring in, analyze, and present election results to the public in short order. The news beat is to the one who makes the closest projection earliest and not to the one who tallies up the final vote. Precinct stringers relay information to state or district news headquarters over zealously guarded telephone lines as soon as the backs are removed from voting machines or as ballots are counted. Wire news service teletypes, some fed with data totaled by computers, pass the information on to hundreds of newspapers and into TV news rooms. Local stations maintain open or “hot” telephone lines and closed TV circuits with network centrals. Stringers a t bellweather or key precincts may call across the continent directly to central data collection points. The main computers are fed huge volumes of data in the necessary input formats and simultaneously check, cross-verify, and process information received from many different sources. Computer input data in the form of punched cards or paper tape may be prepared a t the computer site, if remote from the television studio. Frequently, in the past, the original input medium was prepared a t the studio, where most conventional news reports were directed, Cards (or tape) were then sent to the remote computer site by wire channels such as card transceivers or Daspan links. The type of information processed by the computer, of course, is determined by the capacity of the computer a n d the mathematical model developed for the current election. 5. The Mathematical Model
A valid mathematical model must recognize the peculiarities of the electoral college system. The number of electors from each state is equal t o the number of Representatives and Senators sent by that state t o the Congress. An exception is the District of Columbia, which was awarded three electors by the 23rd Amendment. Although not legally bound t o do so, electors from each state are presumed t o cast their ballots for the presidential candidate receiving a plurality of votes cast in their state. Fundamentally, an estimate of the projected proportion of the vote for the candidate of Party A in state i when a fraction f of the precincts have reported may be represented as
Pif = W p $ ) +
W,1”,2’
+ W,p13),
(5.1) 5
JACK MOSHMAN
where P, is an over-all estimate of the projected vote in state i when a fraction f of precincts in state i have reported; Pi;) is an estimate based on the returns in state i only; Pi2)is a pre-election estimate based on polls, press, and informed judgment; Pi3)is an estimate for state i based on national trends; and W, are normalized weight functions ( h = 1, 2, 3). The intrastate estimate Pi;) is generally composed of two components. One represents the actual vote percentage and the other is a correction which depends on f. One may write
We let A , be the vote cast for Party A by a fraction f of the total number of precincts in state i, and B, is the vote for Party B’s candidate. It is assumed that we are concerned with a two-party race. The correction term Yif may be evaluated in various ways. One possibility is t o consider a function such as was shown in Fig. 1. The ordinate can be taken to be I”.d =
P*(’) d - P$),
(5.4)
the difference between the experienced vote at fractional precinct reporting f and the final vote. I n this event
Yif = - Pi!
9
(5.5)
and the pif function must, in some way, be retained in the machine memory for each state. A simpler approach is to let where pi is derived by the least Ecquares fit of pif on (1 - f) for state i. Where the relationship is linear, both corrections coincide. The greater the degree of nonlinearity, the greater the disparity between corrections. The use of the /Ii requires storing only one coefficient per state. The estimate Pi‘’ is obtained prior to the election. It may be a composite of published or private polls, the informed opinion of political scientists on the press, or any source. The term Py) may appear in various guises. One approach is to define a term y which measures the extent to which the fractional vote differs from an estimate of the Party A vote obtained by extrapolating a linear fit t o the party vote over previous elections t o the current one. 6
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
For example, Fig. 2 shows a plot of Party A votes in state i by X’s. If we let Yi,be the extrapolation of the solid line fitted to previous elections’ data, then
is an estimate of the national departure from historical trend. The summation is taken over all states i for which f > 0 and coif is a normalized weight.
T-4k“.
T-12
T-8
T-4
T
Year
FIQ.2. Party A trend in State i.
Another related technique lies in the use of key precincts. The key precincts may be chosen by one of three criteria: (1) The precincts should be barometric with respect to the national pattern. Barometricity is the property of a precinct to have its vote division be within a specified tolerance of matching the percentage split of the vote nationally. A properly barometric precinct will possess this property for more than one election. (2) Precincts selected should be “swingometric.” Barometric precincts are always swingometric, a less restrictive property than the former. Swingometric precincts are those whose “swing’) from one election to the next with respect to the same political party mirror the national swing. By “swing,” political scientists refer t o the difference in the percentage of the popular vote allocated to a political party from one election to the next comparable one. (3) Precincts may be selected t o be a heterogenous collection. The intent is to include precincts which include as a dominant, or major, force each identifiable population group which may constitute a voting bloc. Groups may be defined in terms of race, geography, economic status, ethnic background, and other factors. 7
JACK MOSHMAN
No matter which criterion, if any, is followed, it is important that key precincts selected on the basis of historical statistics be investigated t o verify that their status or composition has remained static. Furthermore, to be useful, the selected precincts must report early. Early reporting is a function of the poll closing time and of the existence of voting machines and, to a lesser extent, of precinct size. Let Vht be the vote of the hth precinct in year t. If H precincts were selected, then
is an estimate of swing in the current year t as compared to the previous election year t‘. If the precincts are of the first or second types described above, the normalized weight w h = 1/H. If the precincts are representative of the various components of the voting population, w,, equals the proportion of the total population that is represented by the hth precinct. I n any event one may then take
where V,,, is the actual vote in state i in year t’. Equation (5.9) states that the third estimate is basically one obtained by applying national swing estimates from early reporting precincts to the previous election returns from that state. Where possible, W , (h = 1, 2, 3) are the normalized inverses of the variances of P$), P/’), and Pi3),respectively. Depending on procedures used, variances may be estimated directly from the live data or may be supplied on the basis of historical evidence. For any reasonable procedure, it is necessary that (5.10)
and lim W , = lim W , f+1
=
0.
(5.11)
f+l
One may note also that estimates may exist for each state prior t o the election and that these estimates are subject to modification as early returns filter in even if these early returns are from other states, exclusively. A further correction may be applied by making use of known demographic and economic information. For example, one may define
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
to be the difference between actual running vote and the pre-election estimate. Let wif 2 0 be a weight function with the properties wio = 0,
(5.13) (5.14)
From (5.14) it follows that (5.15)
since the denominator is a monotonic increasing function off. Let tAi( A = 1, 2, . . . ,A ) be a numerical measure of property A. For example, for h = 1, g,, may be the median income in state i. For h = 2, tAimay be the proportion of nonwhite voters in state i. Assume that one may express
APif = Then estimates of the mizing
+
ulfli
uA( A =
+
uzfzi
+
*
*
*
+
aA4Ai.
(5.16)
0 , 1, . . . , A ) may be obtained by mini(5.17)
%
taking the summation over all states and the District of Columbia for which f exceeds some threshold. Denoting by A P , an estimate of APi using (16)) and letting p y = P*(l) if + (5.18) then (5.1) may be modified to read
Pif = WIfP$)+ WZfPj*)+ w3fP:3)+ WPfP$’.
(5.19)
I n this event, (5.11) must be modified to read lim W , = lim W , = lim W , = 0. f+1
f+1
(5.20)
f-tl
6. Combining the Estimates
The values assigned to the W , ( h = 1, 2, 3, 4) were inverses of the variances of their factors, respectively, normalized by the sum of these reciprocals. We will call the variances uh2,where uI2is a function off, uZ2 is generally a constant, u32may be a function off or H depending on 9
JACK MOSHMAN
the definition of Pi3),and uq2is generally dependent on J& '.
Thus
i
Denote the variance of Pifby 2
=
Then
F(
1
l/o*2)
is a conservative estimate. Then if
an estimate of the probability that the candidate of Party A will be victorious in state i is
7. An Example Suppose that 30% of the precincts in State X have been reported. The reported vote is 52% for Party A . Pre-election estimate of the Party A vote was 53%. Historical records show that when 30% of the precincts have been reported, the Party A vote is underreported by 4%. From the foregoing information we have P'$i) = 0.52 corresponding to f = 0.30. The correction for historical voting patterns uz,3= 0.04 so that P$:i = 0.52 0.04 = 0.56. The pre-election estimate gives Pg)= 0.53. The value of y based on K < H key precincts was 0.02 which, added to 0.52, the final result of the previous election, provided a value of P$)= 0.54. Finally, the demographic analysis showed AP, = -0.01, SO that Pd$4J = 0.52 - 0.01 = 0.51. For each estimate, standard errors are available. The estimates, their standard errors, and variances may be conveniently displayed as in the accompanying tabulation.
+
10
h
P:;'
%
OAP
Wh,
1 2 3 4
0.56 0.53 0.54 0.51
0.03 0.05 0.04 0.08
0.0009 0.0025 0.0016 0.0064
0.49 0.17 0.27 0.07
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
The final column is based on (6.1).Simple arithmetic provides
Px.3= C Wh.3P$.\ = 0.546. h
from (5.19)and
u(K",
=
c
1 (1/0*2)
= 0.000437
h
from (6.2),so that
~ ( x= ) d0.000437 = 0.021 is a conservative estimate of the standard error of Px,3. Now, from (6.3), ux.3 =
0.546 - 0.5 0.021
=
2.19;
so that, following (6.4),
exp ( - t 2 / 2 ) d t
= 0.9857
is the probability estimate that state X will give its electoral votes to Party A . 8. National Estimates
To obtain the probability that Party A will obtain a majority of the electoral vote, consider the generating function
dY) = 7 {PiY"
+ ail,
(8.1)
where Eiis the electoral vote in state i ; qi = 1 - p i ; and, for convenience, we call, E = C Ei the total vote of the Electoral College. i
Now, by expanding (8.1)and collecting like powers of y,one may write
Then ri is the probability that Party A will obtain exactly j electoral votes. Let [XI be the smallest integer larger than x. Then if
E*
= [E/2], I2
JACK MOSHMAN
is the probability that Party A will win, assuming always a two-party race. By symmetry, the probability that Party B will win is
if E is odd. If E is even, E*
P,
=
CTj, j=O
Another measure of interest is g A ,the expected electoral vote to be garnered by Party A:
and similarly
8, = E
- 8,
(8.8)
is the vote expected to be obtained by Party B's candidate. Upper and lower confidence limits, L, and L,, for b, with confidence coefficient 6 are obtained by defining L, to be the largest integer for which E
CTj > (1 - 6)/2 j=L,
(8.9)
and L,to be the smallest integer such that (8.10)
Finally, the odds favoring a Party A victory are (P,/P,) to one. 9. Estimated Turnout
Various procedures have been used to estimate the total turnout on Election Day. For national elections, provision must be made to estimate the turnout in some states based on partial returns in other states. I n doing this, one can utilize historical patterns of the fractional vote reported as a function of the fraction of precincts reported. For reported states one may define vJ as the mean fraction of the total vote in state i which has been reported when a fraction f of the precincts have reported. Then an estimate of the total vote in state i is
fit')
=
Est(A.
21
+ B.a1)
=A -,
+--. B, VV
(9.1)
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
An estimate for unreported states might be based on historical correlations between Hil, the total turnout, and registration figures W i in that state, or some national registration norm W,. If the estimated relation is then (9.1) and (9.2) provide independent estimates of the turnout. A combined estimate is H . = alfHl1) aZfHi2)
+
Qlf
+
9
(9.3)
Q2f
where at (i = 1, 2) are appropriate weights. For consistency u10 =
0
(9.4)
and lim aZf= 0.
(9.5)
f+1
A somewhat different estimate, having a similar basic philosophy, is
where the summation is taken over all states i for which fi exceeds some threshold, the prime refers to the previous election, and g(fA7)is an appropriate function of f,, the national fraction of precincts reported. 10. Other Elections
Elections for local or state-wide offices may be regarded as special cases of the presidential race. State estimates are generally applicable to these contests. Gubernatorial elections are generally decided on the basis of, a t most, a composite of an analog of P$)and Pi;)as defined following Eq. (5.1). Congressional and Senatorial contests may include a P$) and Pi;) analog also. The index i may be subjected to obvious changes of interpretation. For the House of Representatives, the estimated composition by political party may be obtained by using the generating function in (8.1) where each Eiis unity and the multiplication takes place over the 435 House districts. More simply, the estimated number of seats to be held by Party A is 435
8,
=
CPi,
(10.1)
i-1
13
JACK MOSHMAN
a.nd Party B will capture
s,
= xqi = 435 i
- s,.
(10.2)
For the Senate, one recognizes that only about one-third of the seats are a t risk in any election. One may formally let i run from 1 to 100, corresponding to all seats, but define p i to be 1 or 0, for seats not a t risk, according to whether Party A or Party B has the affiliation of the incumbent. 11. Other Applications
Incoming votes were checked for gross errors and consistency both before posting and prior to processing in the projection models. Among the checks made were: (1) valid codes and formats, (2) votes for each party and number of reported precincts t o be
greater than comparable number in the previous message, (3) total vote from an area does not exceed total registrations, (4) the percentage vote split between candidates does not change too sharply, (5) average vote per precinct is reasonable. Where a message was rejected, i t was generally referred to a member of the staff for a manual check. In addition to vote projections, computers have successfully been employed to analyze the reported vote. Prior election results for selected precincts or districts were pre-stored. These were matched with the actual vote recorded on election night. Precincts were selected based on previous voting records and their social, ethnic, economic, and geographical characteristics. Comparisons between current and past votes were available on demand by various combinations of characteristics. Computers have also been used to tally the incoming vote and to post totals on the display boards used in the television studios. 12. The Future
It appears t o be a safe prediction t o expect computers t o be used for election programs for quite some time. Theinnate,andpossiblyinane, desire of the general television viewer for early projections of the final vote will continue to highlight this aspect of the computer’s role. 14
ROLE OF COMPUTERS IN ELECTION NIGHT BROADCASTING
Greater and greater emphasis will be placed on the analytical and explanatory functions that can be performed. It is likely that tandem operations will see a high-speed machine providing projections and a lower-speed system with a large capacity random-access memory supplying the interpretive aids. Future developments in the form of direct data inputs from either standard teletype tapes or optical scanning devices are likely. The state of the art permits direct display of totals and summarizations; their utilization is quite possible. Automatic charting of voting trends and other graphical outputs are certainly technically feasible, but unlikely to be used because of a real or fancied impression that this would not be grasped and understood by the “average)’ viewer. Gimmickry in the form of an oral output from the machine may be introduced a t some time. It is unlikely that we will see the day when each voter’s selection is registered directly into the store of a central national computer, bypassing completely the massive counting and communication processes. 13. A Report of TV Monitor of Election Night Coverage for the 1960 Presidential Election,l November 8-9,1960
The following represents a report from monitors of the Election Night telecasts of the Columbia Broadcasting Company (CBS), American Broadcasting Company (ABC), and National Broadcasting Company (NBC). The CBS show was built around an IBM 7090. Comment was provided by Walter Cronkite and Howard K. Smith. A RemingtonRand UNIVACI was the basis of the ABC reporting. Individuals appearing on camera were John Daly, Don Goddard, and Max Woodbury. On the NBC telecast a RCA-501 was used. Comment was provided by Richard Harkness, Chet Huntley, and David Brinkley. What follows is a simultaneous chronology of projections and related comments which are felt to be of some interest t o the reader. I n t’he early hours of the evening, even when a winner was projected, the numerical recapitulation of electoral and popular vote a t times showed an “Undecided” category, the basis for which was generally based upon network policy. Either no reports were received at all from certain states or the reports were based upon very few votes. Later in the evening all networks eliminated the undecided category and alloThis unaudited report was compiled by Winston Riley, 111. The information presented herein is extracted from Riley’s compilation, which appeared a8 an unpubreport. liehed C-E-I-R
15
JACK MOSHMAN
cated the vote for each state whether or not returns had been received. Individual state reports which appeared frequently on the CBS and NBC networks are omitted from the summary. The CBS report also included analytical reports of the vote from identifiable geographic, ethnic, and socio-economic groups. It was CBS policy to show the popular vote split rounded to integral percentages. Considering the cl.osenessof the race, small but significant differences were obscured. I n retrospect, the use of odds was universally deplored. By some individuals, odds were equated to the closeness of the result rather than the confidence with which a projection of over 50 percent of the electoral vote will hold through the evening. The precision and arithmetic used in the RCA-501 resulted in a maximum confidence of 0.997. When translated to an odds ratio, this resulted in 332.3 to 1, a figure often quoted to the chagrin of the writer and the C-E-I-R-RCA team. For the record, the final results gave Mr. Kennedy 303 electoral votes to Mr. Nixon’s 219. Fifteen votes were cast for other candidates, The major party split of the popular vote was 50.1 yo for Kennedy and 49.9% for Nixon.
16
6:58 The IBM 7090 was introduced. Several pan shots were shown. It was emphasized that the machine only considers data given it; projections, not predictions, are to be provided.
7:37 On the basis of 1 % of the precincts reporting, the computer projects a Nixon election, which could be as much as 459 electoral votes. This is 2% better than Eisenhower this same time last election. Odds for Nixon given at 2 t o 1.
7:04 On the basis of 0.3 of 1% of precincts reporting, UNIVAC gave odds of 10 t o 1 for Kxon.
Nixon Kennedy Undecided
6 5 8 The RCA-501 was introduced. “Projections, not predictions” theme was given. Pan shots with technicians milling about were shown.
Electoral Vote 275 103 159
5
C
z rn I-
; -
0 Z
8:16 On the basis of 4% of precincts reporting by 8:12, the 7090 gives the odds a t 1 1 to 5 for a Kennedy victory.
Electoral vote 240 Nixon Kennedy 297
Popular vote States 49% 23 51% 27
-r 8:25 Odds of 6.3 to 1 for Kennedy quoted. Electoral Popular vote vote States Nixon 102 48.9% 16 Kennedy 187 51.1% 15 Undecided 248 19
$
Time
Time
Time
(EST)
CBS
ABC
8:46 On the basis of 2.3% of the
9:04 On the basis of 7% of the precincts reporting a t 9:00, odds are 49 to 1for Kennedy
Electoral vote Nixon 212 Kennedy 325
Popular vote States 48% 52%
precincts reporting, UNIVAC gives Kennedy the election with 7 to 5 odds. Electoral vote Nixon 188 Kennedy 216 Undecided 133
8:37 Kennedy odds now 3 to 1. Electoral Popular vote vote States Nixon 105 49.4% 16 Kennedy 151 50.6% 13 Undecided 281 21
9:05 Odds are 22 to 1 for Kennedy.
Electoral vote
20
30
Nixon Kennedy
188 349
Popular vote 48.6% 51.4%
9:20 Current odds are 15 to 1 for
Kennedy. 9:30 On the basis of 11yo of precincts reporting, 13 to 5 for Kennedy. Kennedy is now 6% better than
Electoral vote Nixon Kennedy
198 339
Popular vote 48.7% 51.3%
Stevenson a t the same time last election. 9:40 Leonard Hall, Chairman of the
Republican National Committee, is interviewed and states that the computer predictions have bounced from 6 to 1 to 20 to 1 to 15 to 1. “I think we ought to throw these computers in the junk pile &s far as elections go.”
9:42 Odds quoted a t 25 to
1 on
Kennedy. Electoral vote Nixon Kennedy
190 347
Popular vote 48.5% 51.5%
10:04 5 to 1 odds for Kennedy, with 276 electoral votes for Kennedy.
10:09 Leonard Hall, interviewed, says that the computers are gyrating back and forth. “I hope I never have to follow them.” (See CBS announcement a t 9:40).
1O:lO On the basis of 13% of the precincts reporting by 1O:OO: Electoral Popular vote vote States 190 49% 21 Nixon Kennedy 340 51% 29 “Hall’s statement that the odds are flip-flopping is true, because the computer is data and programming dependent,” said the commentator. 10:34 On the basis of 22% of the precincts reporting: Electoral Popular vote vote States 226 49% 21 Nixon Kennedy 311 51% 29 2
\o
10:05 Odds are 332.3 to 1 for Kennedy. Electoral Popular vote vote Nixon 129 47.4% Kennedy 408 52.6%
p
0, rn
2
n 0
$ A
g -
z
rn r-
10:20 Odds 250 to 1 for Kennedy. Electoral Popular vote vote Nixon 153 48.4% Kennedy 384 51.6%
10:34 Odds of 100 to 1 for Kennedy, with 307 electoral votes for Kennedy.
10:34 Odds 333 to 1 for Kennedy.
Electoral vote 134 Nixon Kennedy 400
Popular vote 48.1 yo 51.9%
EJ
Time
0
(EST)
Time
CBS
(EST)
ABC
Time (EST)
NBC
11:05 Odds are 90 to 1 for Kennedy.
Electoral voteZ Nixon Kennedy
174 355
11:35
Nixon Kennedy
Popular vote 4s.9y0 51.1%
Electoral votee
Popular vote
216 313
49% 51%
11:47 Odds 6.5 to 1 for Kennedy.
Nixon Kennedy
Electoral vote2
Popular vote
230 299
49.1% 50.9%
12:05 Odds 331 to 1 for Kennedy.
Electoral votez Nixon Kennedy
192 337
Popular vote 48.8~~ 51.2%
12:13 On the basis of 45% of the pre-
cincts reporting, the odds are 100 to 1 for Kennedy. Electoral Popular vote vote States 208 49% 25 Nixon Kennedy 329 51% 25
12:42
Electoral vote2 Nixon Kennedy
200 329
Popular vote 48.9% 51.1%
i; n x
1:04
Electoral vote2 Nixon 199 Kennedy 330
Popular vote 48.9% 51.1%
;a
F rn g
c)
0 1 :36
2:05
3:03
$
Electoral voteZ Nixon 188 Kennedy 341
Popular vote 48.8%
Electoral vote2 Nixon 188 Kennedy 341
Popular vote 48.8% 51.2%
2z
Popular
z0
Electoral vote2 Nixon 192 Kennedy 337
51.2%
vote
z ? I
5
48.8%
51.2%
$ 0 c1
Commencing at 11:05 p.m., the NBC reports showed a total of 529 electoral votes. Eight unpledged electors, elected in Mississippi, $I -I were deleted from the totals. z 0
This Page Intentionally Left Blank
Some Results of Research on Automatic Programming in Eastern Europe WLADYStAW TURSKI Computation Center Polish Academy of Sciences, Worsaw, Poland
Introductory Notes . 1. Soviet Union . 1.1 Fundamentals of Programming 1.2 Development of “Programs Which Program” 1.3 Some Results of Optimization of Translators 1.4 Soviet Work on ALGOL . 1.5 Non-Lyapunovian Approach 1.6 Special Purpose Autocodes . 1.7 Use of Topology in Programming 1.8 “Philosophy” of Automatic Programming 2. Poland 2.1 Historical Remarks and Underlying Philosophy 2.2 Work of the SAKO Group 2.3 KLIPA . 3. Other Countries of Eastern Europe . 3.1 KlouEek-VlEek Symbolic Language for Economic Data Processing 3.2 SedlBk’s Program for Investigation of Solutions of Differential Equations 3.3 Kalmhr’s Detachment Procedure . 4. Survey of Methods of Programming . Appendix 1: Example of a Lyapunovian Program Appendix 2 : Kindler’s Algorithm for Programming of Arithmetic Formulas . References
.
.
.
.
.
.
.
.
.. . . . . . . .
. . . . . . . . . *
23 24 24 33 42 48 52 58 62 66 68 68 71 80 88 88
.
92 95 100 102
. .
103 105
Introductory Notes
The author of the present article considers it impossible to cover in a single paper all interesting aspects of the vast area of research on automatic programming in Eastern Europe. Therefore the reader will kindly accept the author’s apologies for inadequate description, omission of many interesting points, and, frequently, too brief a dis-
23
WtADYStAW TURSKI
cussion of papers mentioned. It is hoped that the reader will consider the present article as a kind of summary rather than a complete survey. As for references that may be made to the relevant literature in English the author suggests that Professor Carr’s report on a visit to the Soviet Union [5] should be consulted for general impressions;l one of the sections of Professor Chorafas’s book [6] provides an excellent though short analysis of the main trends in Soviet research, and Capla’s paper 131 present& many interesting data on hardware. It is however our duty to warn the reader that all three references are based on somewhat obsolete information. The reference list appended t o the present article is necessarily incomplete, though it includes a few more items than are referred to in the text. For the papers having English translations, corresponding references are given; however, no systematic search has been conducted in order to find all translations available. Two Western [a, 71 and one Soviet [68] reference papers are listed and any of them may be consulted for missing papers. Also, a recent paper [19a] should be noted, which is pertinent to this discussion. There was no attempt either to make comparison with relevant Western research or to find out mutual influences, except when a problem was of obviously international character, e.g., ALQOLcompilers. Finally, if in relating somebody else’s work the author has committed blunders or overlooked misprints, he wishes to take the blame for such oversights. All opinions expressed are the author’s only, and do not necessarily agree with those of his colleagues or superiors. 1. Soviet Union 1.1 Fundamentals of Programming
On considering Soviet achievements in automatic programming one can easily notice that a great part of the work done on the problem is connected in one way or another with Professor Lyapunov of Moscow University. The main results of the research carried out under Lyapunov’s guidance and influence up to 1953 were published for the first time in the first volume of Problemy Kibernetiki (Problems of Cybernetics), where papers by Lyapunov, his collaborators, and his pupils predominate, [27, 41, 42, 46, 55, 721. It seems that apart from several very interesting exceptions, discussed in Section 1.5, all major work on automatic programming in ’See also, Contemporary Soviet Computers, Datamation 9, No. 11, 2 P 3 0 (1963).
24
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
the USSR is done in notation, and uses concepts and definitions conceived by Lyapunov’s team. Since, moreover, Lyapunov’s paper [all provides a very convenient departure point for our sight-seeing tour of Soviet developments in our problem, we shall begin our venture with a brief discussion of that remarkable paper, leaving for later pages the analysis of the different approaches to the ever-exciting question of how to cause a soulless machine to perform the ingenious task of converting mathematical thoughts into sequences of binary codes. Computational procedures involved in solving various problems on digital computers may generally be considered as algorithms for information processing. The initial data constitute the information to be processed; the results constitute the processed information. Such information-processing algorithms consist of more or less uniform stages which are executed by the computer one after another in some ordered sequence. Each stage itself may be considered to perform some strictly defined information processing. We shall say that each of these stages is performed with the help of an operator. Consecutive performance of operators will be called product of operators. It often happens that the order of execution of some operators depends on the results of performance of other operators; thus there arises the necessity of having operations that check the fulfillment (or otherwise) of logical conditions which govern the order of execution of operators. These conditions are frequently represented by predicates.’” Finally, a third set of elements is formed by so-called steering parameters which are used to indicate repetitive use of some operators. The number of repetitions depends on the value of these parameters. All algorithms are to be built up of these three kinds of elements: operators, logical conditions, and steering parameters. This is achieved by means of calculational schemes, i.e., products of operators and logical conditions. Each logical condition, which may be simply a predicate, is followed by an arrow indicating transfer of control in case the condition is not fulfilled. If the condition is fulfilled, the next (in the left-to-right sense) operator is being executed. I n Lyapunov’s notation arrows are broken in two parts: one, pointing up, occupies the place immediately after “This expression is in a smse very troublesome; the corresponding Russian term npegukarn has been translated into English by other authors as: logical variable [I21 or logic-algebra function [ 7 l ] . It is hoped that the adopted term, predicate, will cause no ambiguity; in addition, the following remark may be useful: a predicate is a function of some logical or arithmetical relation, defined in such a way that, if the relation is true, the predicate assumes value one, otherwise the value is zero. Predicates are closely related to claueee in ALGOL.
25
WtADYStAW TURSKI
the logical condition to which it belongs, the other part, pointing down, precedes the operator, or another logical condition as the case may be, which is to be obeyed, or checked, if the original condition happens to be not satisfied. Arrows are identified by numbers written above the principal line occupied by the calculational scheme. This may be observed in the example given in Appendix 1. I n the calculational schemes (which are independent of any hardware) operators are represented by capital letters with subscripts indicating dependence on steering parameters. A product of operators may be recorded in the condensed form: n
A , . A, * A ,
*
* *
An = II Ai. i-1
Logical conditions are represented by small letters, and predicates take the form of functions whose arguments are the conditions t o be checked. Predicates may be of one of the following four types:
0) P(1.I
2
pl)>
(ii) P(a Q b ) , (iii) p ( a = b ) , (iv) P(a # b ) ,
where a and b are variables whose values should be known before evaluation of the predicate is initiated. Lyapunov frequently uses a rule which says that, if non-fulfillment of a condition leads to skipping of just one operator, arrows are omitted altogether in the calculational scheme. Arrows without identifying numbers may be used with the understanding that transfer of control is made from an unnumbered up-arrow to the first following downarrow. I n order to illustrate this notation we shall consider two examples.
Example 1. Let the operator A , generate and print out integer k2. The following three schemes are equivalent and represent the calculational scheme for printing (a) squares of even numbers if p 8 is a logical condition which is satisfied when s is even, or (b) squares of odd numbers if pd is a logical condition which is satisfied when s is odd: n
n
i-1
i=l
2)1?A141)2fA2J.. * .Pn?An = n PCTA~J = II p.iA.i.
(1.1.1)
Example 2. Let us consider a calculational scheme for solving simultaneous linear equations: A a j i x i = ujs+l, i-1
26
j = 1 , 2 , . . . ,n.
(1.1.2)
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
Assuming that the diagonal elements of the matrix are not zeros we may solve the set (1.1.2) by the following scheme: n
n
n+l
n
(n I3 B i j p ( i = j ) CkI3= l Aijk)mII= lDm i=l j=1
( 1.1.3)
where operator Bij generates c = ajJaii, operator A,, generates ajk = aik- caik and transfers it into the location previously occupied by ajk, operator Dm generates x,,,= lamm, operator C replaces c by zero, and parentheses have their customary mathematical meaning. Since the digital computers possess finite memories it is essential to make the computational schemes as short as possible. This aim is achieved by executing similar operators, i.e., operators differing by the value of steering parameters only, by the same pieces of code. That is to say, we would like to record in the machine memory only the initial form of the operators forming the scheme, and make the machine not only execute them but also prepare re-execution of some of the operators. For this purpose Lyapunov introduces several types of control operators. Their role consists in preparing the machine memory for execution of consecutive operators and necessary control transfers. The following list is, in Lyapunov’s opinion, complete enough to secure solution of quite complicated problems: ( 1 ) readdressing operators, (2) restoring operators,
(3) transfer operators, (4) forming operators, ( 5 ) parameter change operators, (6) parameter introduce operators, ( 7 ) operators for switching logical conditions.
For some types of programs it will be essential to introduce freely additional logical conditions which will secure the desired sequencing of operators. Lyapunov insists on separation of two similar terms: computational scheme and programming scheme. I n his terminology a computational scheme deals with abstract operators; a programming scheme deals with the programs (or pieces of code) which realize these operators. Thus his definition of programming scheme may be formulated as follows. A programming scheme for a given problem is a product of program realizations of operators and logical conditions (that is, a realization of their product) which possesses the following property: When all these realizations of operators (and logical conditions) are fed into a computer together with the needed initial data, automatic process27
WLADYSLAW TURSKI
ing of recorded information will be initiated and will not be stopped until the desired solution is found. At the beginning of the work of each operator, the machine memory will be in a condition which makes the performance of the operator possible. This somewhat diffused definition may be explained by the rule for obtaining programming schemes from calculational ones. The rule reads as follows: To obtain the programming scheme one should furnish the corresponding calculational scheme with (i)control operators which will provide such conditions of machine memory as are necessary for execution of successive calculational operators and (ii) logical conditions which will secure desired sequencing of calculational operators. Now we shall consider the purposes of control operators in more detail. Readdressing operator is a generic name given to pieces of code which change the address parts of some of the program instructions, viz., in those instructions that are relevant to parameter-dependent operators. The changes thus introduced are prescribed by changing values of parameters. Readdressing operators are denoted by capital F and the following convention is obeyed: F(i) increases the value of the parameter i by 1, while F(ki) or Fk(i) increases the value of the parameter i by k. A notation like F(3i, 5j), equivalent to F(3i) * F(5j), is sometimes used in order to point out that it may be possible to change both parameters by the same instruction. Restoring operators form a subclass of readdressing operators singled out for their ability to restore the initial value of a parameter. Lyapunov does not introduce any special symbols for the restoring operators. Transfer operators transfer numerical data from one location to another; in symbolic notation they assume the form [a+b], where a denotes data t o be transferred and b the location to which this data is t o be transferred. (Using often the same symbol to represent both contents and location, Lyapunov intenticinally smooths over the difference between the two.) Forming operators is the name given to pieces of code which generate the initial form of some operators of the program. These operators transfer previously set instructions, or combinations of them, into prescribed locations in the body of the program, thus forming a new operator from separate pieces. A forming operator may sometimes be used instead of a restoring one, this being especially convenient when the number of readdressings t o be performed on a parameter is not known in advance. Generally, if the forming operator generates the operator B, the 28
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
notation @ ( B )is used to denote this fact; when, however, the forming operator is used as a restoring operator, a symbolism similar to transfer operators is employed; e.g., if the forming operator is to restore the initial value s of the parameter i we shall write {s+i). It may be of some interest to note that, while restoring operators are likely to perpetuate minor mutilations of a program already recorded, forming operators are more likely to avoid this fault. Parameter change operators are generalized readdressing operators, the generalization consisting in application rather than in formal structure, viz., parameter change operators are used when not only readdressing controlled by a parameter is desired but also introduction of the numerical value of altered parameter into arithmetic formulas. Notation is the same in both cases. Sometimes it happens that the value of a parameter becomes known as a result of execution of some operators. Then, a necessity arises for having special parameter introduce operators which would introduce the value of a parameter thus obtained into operators which depend on it. The forming operators quite often may be considered as introducing initial values of parameters. I n the preparation of programs for fairly complicated problems, a situation may occur where logical conditions governing control transfers may depend on results of previously performed calculations; e.g., depending on results obtained, we may need to check either the condition 01 or the condition p. Moreover, it may happen that not only the choice of conditions, but also eventual transfer of control may depend on previously obtained results. I n such cases the operators for switching logical conditions are used. Those operators are most frequently realized by means of transfer of previously stored pieces of code which may be readdressed or changed in any other way if necessary. I n Appendix 1 the interested reader will find an example of a rather complex programming scheme. Let us now consider the so-called standard Lyapunov operators [27]:
A, arithmetical (computational) operator; P, logical operator (predicate or logical condition); F, readdressing operator; 9, forming operator; 0, restoring operator. All other operators will be called nonstandard operators and denoted by H. Standard operators are, in a sense, homogeneous and connected. 29
W tADYSt AW TURS KI
The homogeneity of standard operators consists in the functional similarity of all machine instructions composing the code representation of a given operator; that is to say, either all instructions perform some arithmetical operations, or all of them are readdressing instructions, etc. I n other words, the task of the entire group of instructions realizing a given operator may be formulated by a single “command.” The group of instructions forming the machine representation of an operator is said to be connected because it consists of a string of instructions with one “entrance” only, and all these instructions follow each other tightly, with no “empty” locations; there are no alternative paths inside of one group of instructions. These two properties are extremely helpful not only in programming but, and especially so, in debugging procedures, when a program written in machine code is analyzed, since they make it possible to break the program into pieces corresponding to original operators. Moreover, some new types of standard operators were found on the basis of such an analysis (cf., e.g., checking procedures described in Section 1.2). It is perhaps worthwhile to observe that Lyapunov’s notation, although meant to be machine-independent, is obviously pertinent to one-address machines. This becomes particularly clear when one compares Lyapunov’s notation (or its modification due t o Yu. I. Yanov [72]) with the notation adopted in [3112.For the sake of consistency we shall henceforth abandon Lyapunov’s notation and adopt Yanov’s, thoroughly described in [31, 381. The main difference between them lies in the symbolism employed to denote “jumps.” Yanov uses small a instead of Lyapunov’s small p and replaces arrows by so-called left and right strokes, L and _I. The notation Atla LAt2 * * _J A,, means 2
-
m
Z
m
that, if 01 = 1, control is transferred to A,, , while if u = 0, control is transferred to At3. The notation used in [27], obviously designed for three-address computers, differs from that of Lyapunov and Yanov in the following manner. The ith logical condition (or “predicate”) is followed by an n
open bracket [ with two numbers, denoting the ordinal numbers of m
operators to which the control is to be transferred; the upper number is the ordinal number of the operator to be executed if the preceding logical condition is satisfied, otherwise the control is transferred to the operator designated by the lower number. The mth and nth operators are to be preceded by closed half-brackets, _I and a
BNotation similar to this is adopted by Ershov [IZ].
30
z
1 , respectively. For
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
example, letting S represent either A or P (in the sense of Lyapunov’s 3
2
standard operators), S,1S,S,c _IS, means that, if S3 happens t o be true, 4
3
the next operator t o be executed is S,, otherwise S , . I n order to make this notation more concise, in some cases the lower parts of open brackets and the entire corresponding closed brackets are omitted. This is always so if after the ith logical condition the lower number is i 1; e.g., the 3
+
2
scheme just mentioned may be rewritten S,lS,S,rS,. I n Soviet literature three types of repetitive calculations are distinguished: (1) iterative cycle ]A,P,L, 2
1
(2) cycle with readdressing ]A,F,P,L, 3
1
(3) cycle with readdressing and restoring JA,F,P,LO,. 3
1
I n all these cases A, denotes the computational part of a cycle; F, denotes readdressing; P,, P, are conditions checking whether the cycle is completed; and 0, is a restoring operator. It is of interest to note that most Soviet-made digital computers possess built-in facilities for all three types of cycles. Generally speaking, Lyapunov recommends a multilevel preparation of programs, viz., to begin with a programmer should split the algorithm into a few big parts and conduct a thorough study of “how these pieces work together”. Then, the parts should be split into operators in order to provide a calculational scheme. The last stage of “intelligent” work consists in writing down the programming scheme. Afterwards, the “mechanical” job of coding should be performed in order to produce the machine code for the given problem. This approach to programming, so simple in principle, becomes much more complicated when one tries to optimize (in any sense) the programming scheme. Lyapunov has pointed out that there exist a t least two different methods for simplifying and optimizing programming schemes. One method consists in formulating formal criteria of goodness of programming schemes and inventing formal rules of transforming schemes into equivalent but simplified ones; another method, strongly advocated by Lyapunov himself, consists in material transformations of programming schemes. The second method requires a certain amount of ingenuity on the programmer’s part and rather fair knowledge of the mathematics behind the algorithm, hence this method is in a sense useless as a possible basis for automatic programming. I n the paper under considerc
31
WtADYStAW TU RSKl
ation, Lyapunov gives a number of examples which have become standard in Soviet nonautomatic programming technique. The first method has been thoroughly studied by Yu. I. Yanov in a series of extremely interesting papers [70, 71, 721. This study, closely related to both Lyapunov ideas of programming and A. A. Markov’s theory of algorithms [47],is a splendid example of far-reaching research, conducted with the help of the most modern apparatus of formal logic. Unfortunately Yanov’s work is outside the scope of the present survey, and thus we shall not discuss it in any detail. One remark, however, may be made without formal discussion; viz., judging from frequent quotations of the series in Soviet and foreign papers on the theory of autoprogramming this work represents not only a formal but also a very powerful practical tool for the authors of autocodes. Before closing this section, we present one more fundamental concept due to the Soviet school of automatic programming, namely the so-called “logical scale.”3 Let us suppose that the calculational scheme is divided into n stages each containing a predicate p assuming one of two possible values 0 or 1 depending on the ordinal number of the stage currently under execution. Let the machine memory cells be of L bits, then s consecutive cells of the memory will be called a logical scale for the program if they satisfy three conditions: (i) s > n / L . (ii) There exists a possibility of counting individual bits in the array of s cells according to the principle that bits 0,1, . . . , L - 1 belong to the first cell, bits L, L+ 1, . . . , 2L - 1 t o the second, etc. (iii) All bits corresponding to those stages in which p = 1 are filled with unities, all others being zeros. Thus, once such a scale is recorded in machine memory, it easily may be used to govern all control transfers dependent on p . There are, generally speaking, two possible ways of using logical scales. The first one is to introduce a digital probe, i.e., a number represented by unity on the first bit location of a cell and zeros on all others. The Boolean product of the digital probe and the first cell of the logical scale is not zero if (and only if) p = 1 for the first stage of calculations. Hence, if a computer is provided with a built-in check for zero accumulator, organization of transfer control becomes rather trivial. For the following 3Unfortunately, I was not able to dctcrmine when and where this concept appeared in print for the first time, though I am told by Dr. Grcnicwski that this device was commonly used in Poland and Czechoslovakia as early as 1956. Soviet authors, e.g. [32], [ a l l ,unanimously consider M. R. Shura-Bura as being the inventor of the logical scale.
32
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
stages the unity in the digital probe is automatically shifted by one bit location to the right after accomplishment of the current stage. The second way of using the logical scale is more efficient for computers with no built-in checks for zero accumulator, but with a check for negative (or positive, as the case may be) content of this register, Suppose that the very first bit location of the cell is the sign bit. Then, loading the accumulator with the first cell of the logical scale, we may organize the jumps in such a manner that jumps conditioned by negative accumulator content will agree with what is desired in the case of p = 1. As soon as the first stage is accomplished the entire scale is shifted by one bit “leftward” and so on. The logical scale method has numerous applications in the practice of programming and is by no means limited to the checking of logical conditions. It may be used successfully, e.g., for identification, on a list of n items differing by one characteristic; we form a logical scale sufficiently long to have n bits a t least, and for items having the given property we enter unities on the corresponding bits of the scale. Such an application of logical scales is employed by Ershov in his algorithm for translation of arithmetical expressions (cf. Section 1.3). Bibliographical notes. Lyapunov’s theory, outlined above, may be conveniently studied in a book by Kitov and Krinitskii [32],which is meant as a textbook for university students. Part of an earlier book by Kitov [31] has been translated into English and appeared in Vol. I1 of “Frontier Research on Digital Computers,” edited by J. W. Cam, 111, University of North Carolina. A refined formal work on programming schemes for algorithms is presented in Ershov’s paper [13]. A very fine example of material transformations of programming schemes is due to N. G. Arsentieva [ I ] who considered generally applicable algorithms of linear algebra and simplifications that may be made when, say, dealing with symmetric or diagonal matrices. Iliffe [24] has used the Lyapunov/Yanov notation and results of Yanov’s research on formal transformations of programming schemes. I n an interesting paper by R. I. Podlovchenko [51]some formal methods for transforming programs (not necessarily algorithms) are discussed. 1.2 Development of “Programs Which Program”&
A first attempt to automatize the programming procedures was made in 1954 by two scientific workers of the Computing Laboratory of the 41n this article, in analogy to Eastern European usage, the terms “programs which program” or “programming programs,” or the abbreviation PP, are used for what English-speaking authors have variously called translators, processors, compilers, etc.
33
WtADYStAW TURS KI
Soviet Academy of Sciences, V. M. Kurochkin and L. N. Korolev. The main results of their work were two programs for the BESM computer which performed translation of arithmetical formulas and assembling of programs according to their programming scheme^.^ At the same time in the Mathematical Institute of SAS the PP-1 translator for logical, readdressing, and restoring operators was constructed by Miss G. s. Bagrinovskaya, E. Z. Lyubimskii, and S. S. Kamynin. A revised and enlarged version of this translator, the PP-2, was built by S. S. Kamynin, E. %. Lyubimskii, M. R. ShuraBura, Miss E. S. Lukhovitskaya, V. S. Shtarkman, and I. B. Zadykhajlo. This programming program for the STRELAcomputer is the first Soviet fully automatic translator which produces an object program in machine code from a source program written in the form of a programming scheme (with some additional information added). Arithmetic formulas in the source program are given explicitly. However, there is one thing to remember. Up to the present time, no known Soviet-made computer possesses an alphanumerical input device. All computers can accept numerically coded information only.6 Thus, a source program written in accordance with the rules of a program that programs (e.g., PP-2) should be numerically coded before the actual key punching. Such a procedure has one considerable advantage; viz., Soviet scientists, when composing rules for an external language, are not restricted by the limited number of different symbols available on keyboards. As a matter of fact, introducing a new symbol into the set of allowed ones means just one thing, that one more numerical code is to be added to the already existing coding dictionary. This advantage is really a big one and makes programming in external languages very easy, but the price paid for it is not small either. A large number of coders are employed for performing a very tedious and laborious tasktranslating alphabetic expressions and conventional mathematical symbols into their “numerical representation.” This work is not only dull, it is also apt to cause many hardly detectable errors; to prevent errors, each program is coded by two different persons [27] and results are checked against each other. Now we proceed to the main principles of PP-2 as a t y p i p l programming program. The first stage of programming consists in writing down the program6The historical part of this section is based on Ershov’s lecture [I,?]. ‘The first known exception t o this rule is the Soviet-made URAL-2,installed in the Computation Centre of the Polish Academy of Sciences, with paper tape readers attached to it.
34
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
ming scheme (cf. Section 1.1).Next, each A operator should be specified, by supplying (i) mathematical formulas describing calculational procedures incorporated in it, and (ii) a list of quantities which are used by these formulas. At this stage a distinction is to be made between results and intermediary results. Every number generated in the course of execution of an operator A, which serves as an operand for subsequent pieces of that operator but is not its output, and therefore irrelevant for other operators, is called an intermediary result. Cells allocated for these quantities are called working cells or temporary storages. It is quite irrelevant from a general point of view which cells are used as working ones. For all P operators, corresponding logical functions should be specified together with necessary logical conditions, relations serving as arguments for predicates, and directories for transfer of control. Specifications for F operators include list of operators to be readdressed and number of corresponding parameters. For 4, operators, specified items are quantities to be loaded in standard cells and ordinal numbers of operators in which quantities should be replaced by contents of standard cells. For 0 operators, the ordinal numbers of the operators to be restored should be given. Finally, for nonstandard operators, H, pieces of code should be recorded. The next stage of programming consists in the preparation of the so-called list of conventional numbers which serve, in a sense, as identifiers. Each conventional number consists of 12 bits divided in two groups. The first consisting of three or four bits, represents the type of the quantity associated with the given number: variable, operator, working ceL1, etc. The second identifies the individual quantity within an array of quantities which are similar, i.e., have an identical first group. This method is rather inconvenient, since it does not provide facilities for indexing. In some programming programs, a third group of bits is introduced just for that purpose. Arithmetical formulas are written according to the following rules : (1) A formula may contain arithmetical operations which are built into the computer, or which are executed by standard subroutines permanently recorded in machine memory. This is not a very severe restriction since the set of allowed operations is fairly extensive. (2) There are no priority rules for operations, hence a suitable number of pairs of parentheses should be introduced. (3) All formulas should be linearized (in the typographical sense). (4) The quantity which is the result of the formula is written to the right of the "=" sign. 35
WLADYSLAW TURSKI
Thus, e.g., the formula
+a +b
+ + 1)
In x - 2 / ( c a a b*Inx should be rewritten as z=-
1
*
+
*
In x
+d
e(l+a+b’lnz)
(1.2.1)
(l+a+(b.lnx) -Z/((~+a+1).Inx)+(d.exp(l + a ( b * In 2))))/ (a ( b * In 2)) = z. (1.2.2)
+
+
As we have already said, variable identifiers as well as operation symbols are replaced by conventional numbers manually, The algorithm for producing the object code from arithmetic formulas may be described as follows [42]: (i) A left-to-right scan finds the first operation that can be executed, a corresponding piece of code using conventional numbers is produced in such a way that the result is loaded into the first empty working cell. The coded part of the formula is erased and in its place the number of the working cell (address) is recorded. (ii) The entire formula is scanned to the end, and all similar operations with identical operands are replaced by the number of this working cell, At this stage one restriction is compulsory, viz., not really all similar operations are replaced but only those that follow a lefthand (open) parenthesis. This restriction is somewhat milder in the case of add and multiply operations; namely, operations similar to that already coded are replaced when they follow not only the open parenthesis but also the “+” and “ - ”signs. (iii) Trivial parentheses, i.e., parentheses embracing one quantity only, are erased. (iv) Control is transferred to the first stage, unless the formula is already reduced to one of the two forms r, = x or a = x, where rk denotes the kth working cell. In that case, the final piece of code, the “final load,” is produced. The algorithm described above will produce the following code for formula (1.2.2): l + a =rl In x = r, b rz = r8 r1 r s = r 4 c rl = r6 r 6 - r , = rB
+
+
d< 36
=r’l
r4-r7 =rg exp r4 = re d * re = rl0 r8 rl0 = r l l a r 3 = r,, rll/ r I 2 = z.
+
+
(1.2.3)
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
Shtarkman proposed an interesting economy algorithm [55] for reducing the number of working cells required. This algorithm is discussed in some detail in Section 1.3. Logical conditions in PP-2 are coded ,by means of Lukhovitskaya’s algorithm [as]. This algorithm decodes any logical statement built up of predicates (cf. Section 1.1) and the logical operators v (alternation), (conjunction), and - (negation), and produces a code which secures execution of all prescribed control transfers. Let us assume that P is either 1 or 0 , and control is to be transferred to A if P = 1, and to B otherwise. Lukhovitskaya’s algorithm works as follows. (i) The first pair of corresponding left and right parentheses is found. (ii) Partial outlets A’ and B’ for that pair are determined: (a) The first “ - ” following the right parenthesis of the pair is spotted. If between the parenthesis and the sign any “ V ” occurs following the next closed parenthesis. I n either we search for case, A’ is the first predicate, pi, following the conjunction so determined. If, as may happen, there is no conjunction sign in the remaining part of the formula, the partial outlet A’ is identical with A . (b) To determine B’ we search for the first “V” sign following the right parenthesis. B’ is the first pi following this sign. As in (a),if no v is found, B‘ is identical with B. (iii) If the content of the parenthesis just considered had been negated, we change A‘ into B’, and vice versa. (iv) To determine control transfers inside of a parentheses pair we apply the following procedures: (a) We load some fixed cells with A‘ and B’. (b) I n a right-to-left scan we find the first predicate p i (on the first scan this is simply the last one inside the parentheses pair) and produce the code: go to ifp, = 1 then A’ else B’. (c) For all the following (in the right-to-left sense) predicates we produce code equivalent to if pi = 0 then go to B‘ else go to preceding (in the given sense!) predicate. This procedure is interrupted as soon as the first v is found. (d) Having discovered the V sign, B’ is replaced by the predicate occupying the position next to the V sign (from the right). (e) We go back to (b) unless the open (left) parenthesis is reached. (v) The entire parentheses pair and its content is erased and replaced by the first predicate, then we go back to (i). The programming program PP-2 includes, besides the two algorithms for arithmetical and logical operators just described, many other facilities, like the Shtarkman algorithm for economy of working 37 “a”
“e”
WLADYSLAW TURSKl
cells, debugging procedures which output storage addresses allocated to conventional numbers, etc. One more feature of this program should be mentioned. The PP-2 produces an object program on cards. Thus, in order to perform actual calculations another deck of cards has to be fed into the computer input device. This program serves as a basis for many other programming programs. One of these is a very elaborate programming system developed a t the Lomonosov University, Moscow, under the guidance of M. R. Shura-Bura and N. P. Trifonov, for the STRELA-4computer [G4]. This system uses many basic concepts of PP-2 (such as Shtarkman’s economy algorithm, conventional numbers, and so on, but possesses some new features which deserve special attention. For instance, the system treats a program as an assembly of standard subprograms. Standard subprograms, sometimes called standard subschemes, are those parts of programming schemes which may be written independently of the program in which they are used, and are “connected” with the main program by means of some parameters (one such subprogram is, e.g., a cycle). Each of the standard subprograms is given a name, i.e., a conventional abbreviation. The programming system accepts programs with such abbreviated notation, and before actual translation is initiated, the full programming scheme of the program is constructed from the set of programming schemes of standard subprograms. This method is considered as useful for two reasons: (i) the set of standard operators acceptable for translation may be frozen; (ii) the algorithm for replacing names by programming schemes may be very simple since it does not perform the actual coding. This is especially worthwhile for it allows the introduction of new standard subprograms with extreme ease. Actual coding in the system is done by an algorithm which uses a limited and frozen set of standard operators.’ When producing codes the programming system takes care not only to economize the quantity of working cells but also to minimize the number of instructions involved in representing the given operator in the form of machine code. A very interesting novelty of this system is the checking procedure. Several such procedures are available in the system.8 We shall briefly ?The reader will note the difference between the meaning attached to the word “subprogram”in this section and the conventional understanding of the term. To avoid possible misunderstandings,a subprogram in the conventional sense will be called subroutine. aThe original version of checking procedures is due to T. A. Trosman of the Mathematical Institute of SAS. The working version is due to N. V. Sedova, V. V. Voevodin, Yan Fu-Tsen, and E. A. Zhogolev (of. [ 6 4 ] ) .
38
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
discuss two of them. The control program is the program for imitating the well-known step-by-step checking which used to be performed manually. The main idea of the program is to use the object program as input information for the control program which, according to the programmer’s option, accepts one or more of the object program instructions and, once more optionally, either executes these instructions one by one printing out partial results, or prints relevant control transfers, or j u s t the instructions themselves in octal or decimal form. Moreover, a special facility is provided for introducing so-called test data to be processed by a desired sequence of instructions of the object program with output of the results of that process. This par+ of the checking procedures may be used in two modes: Either the control is returned to the checking procedure program after the execution of the chosen instruction and output of the relevant data, or the entire sequence of instructions is executed without transferring control to the checking program. The checking procedures described up t o now are what one could unmistakably call the automation of step-by-step manual checking. There is, however, an additional set of checking procedures, called program analyzer, which is itself a most interesting example of a refined approach to programming processes. The analyzer takes a program written in machine language and processes it so as to obtain as an output the original source program written in Lyapunov’s operator language. Of course, two conditions must be satisfied to secure the successful use of the analyzer: (i) The program to be analyzed must have been produced in strict accordance with definite rules (as it is if the programming system is used to code the programming scheme in machine language). (ii) There must be but one operator for which a given sequence of operations serves as the machine representation.
If the programming program uses any economizing algorithms then programs reproduced by this procedure will differ from the original ones in the sense that, e.g., all arithmetic formulas will be reproduced in optimal order. Jokingly speaking, original and analyzer-reproduced programs are identical modulo wasteful instructions. This feature of the analyzer constitutes one of its principal advantages, enahling it to be used for checking the quality of source programs. Another possible use is reconstruction of the source program from the object, program when, say, the original source program is lost, or when a straightened version is desired. The analyzer employed in the programming system is, in its author’s own wards, “just one of the first attempts to produce such a program” 39
WtADYStAW TURSKI
and thus undoubtedly will need a great deal of improvement before it gets its final form. Generally speaking, the analyzer determines the type of the operator to which a given instruction belongs, by examining the operation part of the instruction. For some instructions that may be used by two or more operators, some additional information is derived from two sources. First, it is supposed that a point to which control is transferred by a logical operator is always the beginning of some operator, This is used as auxiliary information in dividing the code into parts corresponding to operators. Secondly, it is presumed that the object program consists of three blocks: instructions and relative constants, absolute constants, and data. Thus, from the value of the address part of an instruction, it may easily be inferred what type of operator this instruction belongs to. Unfortunately no more details of the analyzer can be given without going too far into a technical description of the ST RE LA-^ computer and details of other components of the programming system. We now turn to two of Ershov’s programming programs: the program for the BESM computer (the PPBESM system) and that for the STRELA-3 computer (the PPS system). These systems are described in [II, 121. PPBESM is quite similar to PP-2, described earlier, and thus we shall give only those details which are different. There are two features which are, from the user’s point of view, most interesting and important, though rather trivial from the programmer’s side, viz,, automatic storage allocation and priority of arithmetic operations. From a formal point of view, programs are written in somewhat different manner, though still strongly influenced by Lyapunov’s notation. For instance, in PPS the following notation is employed. A variable is denoted by a small letter, which may be supplied with any number of subscripts; however, the use of subscripts is limited to denoting dependence on cycle parameters. Cycles themselves are represented by curled brackets enclosing the cycle. To open a cycle the symbol
[
used, i ,denoting
the initial value of subscript i, and i, its final value. If ia = O it may be omitted, thus
Yo
means that the operations between this symbol
and the corresponding } are to be executed repeatedly for s = 0,1,2, . . . , 10. The only admissible step for the cycle parameter is 1. Instead of giving the explicit value i , , a relation of the form i < m may be employed as an upper limit for the cycle parameter.
40
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
Besides arithmetic operators, represented by formulas, five other types of operators are allowed: logical operator restoring operator nonstandard operator readdressing operator repeat and check operator
4 B, H,
n,
DC. Each operator is represented by its symbol followed by a set of parentheses inside of which information about the operator is given. Operator symbols may be subscripted; this time, however, subscripts denote labels of the operators. There is plenty of freedom in writing arithmetic formulas. As a matter of fact, only two common symbols are not allowed, namely, the fractional bar (horizontal) and the root symbol. Both should be replaced by powers. There are two symbols for multiplication: “ x ” and “*”; the latter is often omitted (implied multiplication). I n the following example arguments of functions are underlined in order to show the difference in the meaning of the symbols. zicos(a
+ b)(a-b) + In zj2 + d tg(f + a ) x
(a - b) 3 r;
(1.2.4)
Logical operators are written in the form (1.2.5)
I
where a b stands for any one of the relations a< b, a
0
Nfl
predicates in previously discussed papers, A (a< b [ ) meaning “go to N,
if a< b then N , else N 1 ” , where N , and N , are subscripts of operator N
symbols. Unconditional jumps are represented by [, and conditional N
N,
jumps depending on the content of the test register are written as [ N,
with the rule: If the content of the test register is 1, then transfer the control to the operator labeled (subscripted) N , , otherwise jump to N,. A few words should be said about the DC operator. Any part of the source program may be included between two DC operators. This will result in the following procedure being executed. After actual calculations according to this part of the program have been performed, the content of the main storage is summed up and the sum thus obtained is recorded. Then the relevant calculations are repeated and a second summation is accomplished. If the two sums agree, the entire content of 41
WCADYSCAW TURS KI
the main storage is recorded on magnetic tape. If the sums disagree, the computation process is interrupted and the operator (human) either calls the repair man or, by repeating the calculations for a third time, tries to choose between the two earlier results. Much attention has been given to economizing the translation time for PPS. A very fine example of an economy algorithm is discussed in Section 1.3. The PPS is some 1200 three-address instructions long, and translates source programs a t the speed of about 35 three-address instructions of the object program per minute [12]. Bibliographical notes. There are a t least three complete descriptions of different programming programs used in the Soviet Union. The PP-2 is described in Volume 1 of Problemy Kibernetiki; the programming system for STRELA-4and the programming program for BESM are published as separate books [11] and [61]. Besides, many more algorithms are scattered throughout different issues of P K and other Soviet journals. A formalized algorithm for translation of arithmetic formulas similar to that of Lyubimskii is given by Evien Kindler [30], see Appendix 2. 1.3 Some Results of Optimization of Translators
Addressing the participants of the Symposium on Mechanization of Thought Processes, A. P. Ershov said [I21 that there are two paths for the future development of programming programs (translators) which are of equal importance. One way leads to more flexible translators allowing for the use of more complicated forms of source programs, thus reducing the time that the programmer must spend in preparation of the programs. The alternate way leads to translators which would produce object programs more quickly and, what is perhaps even more important, would produce object programs that work more quickly and use fewer working cells. It is certainly possible that the technological development which has recently been progressing a t a tremendous pace may still accelerate, thus providing us with more and more powerful computers; but nevertheless ceteris paribus it will always be desirable to possess translators which are more nearly optimal in both senses quoted by Ershov. I n the present section we shall not follow the pattern of the two preceding ones. We shall present neither a historical account, nor a full discussion of pertinent work accomplished in the Soviet Union. Nor shall we attempt to exhaust the subject. We will select just two algorithms (very similar in a sense) and look at them more closely. 42
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
To begin with, we will study the Shtarkman algorithm for reducing the number of working cells required as temporary storage for intermediary results in the evaluation of arithmetic formulas [55]. The number of working cells required by the program is equal to the greatest number of working cells required for any single operator (in Lyapunov’s sense). As far as the Shtarkman algorithm is concerned, it is essential that the computer for which the algorithm was constructed be a threeaddress machine. The general structure of the instruction address part for the STRELA-4 is IA, IIA, IIIA, where IA and IIA denote addresses of operands, and IIIA denotes the address of the cell into which the result is loaded. (For some operations not all of these are used.) Let us consider the address parts of a piece of code corresponding to an operator (see Table I). TABLEI Distribution of conventional numbers in the address parts of the instructions corresponding to an operatop Regions of existence
Ori
denotes the conventional number of the i t h intermediary result.
Of course, the simplest allocation rule would be to assign individual working cells to all conventional numbers r l , r2, . , . , r5. However, a glance at Table I reveals that this would be a poor rule, since, e.g., the same working cell may be used for keeping results rl and, say, r6. As a 43
WtADYStAW TURSKl
matter of fact, we need only two working cells to serve the example of Table I. The set of instructions during whose execution the working cell loaded with rj is tied up, is called the region of existence of the intermediary result rj. This set begins with the instruction generating ri and ends with the instruction that precedes the last one in which rj is used as an operand. (Regions of existence of intermediary results in Table I are shown in the last three columns.) Thus, the following rule may be formulated: Intermediary results whose regions of existence do not overlap may be assigned the same working cell. According to this rule, the conventional numbers in the example should be replaced by a reduced set of conventional numbers, i.e., a set in which different numbers are attached to intermediary results whose regions of existence do not intersect. I n other words, rl, r2,and rs should be replaced by rI1,and r 3and r4by r t 2 .This example explains the main principles of the algorithm. With reference to the example considered in the previous section [cf. formulas (1.2.3)] it may be interesting to compare the old code with one obtained by application of the Shtarkman algorithm : l + a =r4 r2-r3=r3 lnx = r3 expr, = r , bar, =rl d'r: =r, (1.3.1) r4 rl = r2 r3 r2 = rz c+r4 =r4 a+r,=r, r p -r 3 = r 3 r21r, = z.
+
+
d&
=r3
Instead of 12 working cells only four are used. The Shtarkman algorithm is a typical one-pass algorithm which works as follows: All instructions forming an A or an F operator, i.e., the only ones that may possibly use working cells, are scanned starting with the last one. During this procedure a special table T is produced in which all the conventional numbers representing the intermediary results are listed, providing that the operation currently looked a t belongs to the region of existence of the pertinent intermediary result. The table T is produced according to the following rules. If IIIA contains a conventional number of an intermediary result, the relevant row in T is cleared, If a conventional number occurs in either IA or IIA, and in T there is no row containing that number, the first empty row of T is loaded with that number. If there is no empty row in T , a new row is added. Now, the table T possesses the following properties: (i) Each row of T contains conventional numbers of intermediary results whose regions of existence do not overlap.
44
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
(ii) The conventional numbers of any two intermediary results with overlapping regions of existence a,re to be found in different rows of T. (iii) The number of rows of T starts a t zero a t the beginning of the procedure and grows to the minimum number of working cells required a t the end. The unique assignment of working cells to rows of T and replacement of the Ti’s occuring in a row by the corresponding working cell address solves the economy problem. Shtarkman gives the following programming scheme for the algorithm just described. I n the scheme we have preserved Shtarkman’s original notation, hoping that the reader will not be inconvenienced too much:
(1.3.2)
There are three special cells a , ,6, and y which me to keep the following information : u contains the currently examined operation. /3 contains the currently examined fraction of the address part of the operation contained in a (IA, IIA, or IIIA). y contains the number of the first empty row of T. The algorithm works in three stages for each operation (corresponding to IA, IIA, and IIIA). Meaning of operators in Shtarkman’s algorithm :
II, Preparatory operator. 0, Transfers content of a back t o operator code. Ill3 Picks next operation from the operator code, loads a with it, and checks for the end of the operator.
II, Preparatory operator for three stages necessary for each operation.
a5
Clears operator w12and skips next operator. Loads w12. UI, Puts next piece of address part into j?, “tunes” operator ip,, for loading next conventional number into prescribed address, checks for end of the operation. P, Checks whether content of ,6 is a conventional number of any of the intermediary results. II, Prepares search through T, loads y with new address. U1,,Picks up next row of T; if the row is empty, loads y with its address; checks for end of T. 45
3,
WtADYStAW TU RSKl
P,, Checks whether the selected row of T contains the conventional number searched for. o12This operator, if present, causes the next one to be skipped (operator is to be cleared for the I I I A stage, and restored for IA and IIA). W,, Clears pertinent conventional number from T. B,, Loads pertinent conventional number into T. el5Forms the new address (erases the old conventional number in IA, IIA, or IIIA, depending on the stage currently performed, and replaces it by the new conventional number). R,, Outputs the number of working cells required for the examined operator code. The second algorithm we are about to discuss here is a part of a more general formal algorithm due to Ershov [lo]. Since, however, the remaining parts of the general algorithm are rather well known, and since the English speaking reader may consult the translation of the original paper, we shall give here only the principles of the “subalgorithm” used to reduce the number of instructions involved in the object program representation of an arithmetic formula belonging to a source program. Ershov’s algorithm will be presented as an ALQOLprocedure. This form is comparatively easy to follow, and was used by the author of the present article in a course of lectures on ALQOL given in 1962. For the sake of convenience, some features of the original paper are retained, though they prevent a more concise notation.
procedure Economize code for a given operation first address (a) second address: (b) third address: (c) control bit: (sigma) operation number: (theta); comment We suppose that the given operation is one of n operations generated by the given operator. All these n operations are stored as a two-dimensional array L [ 1 :n, 1 :5]. We suspect that there may be some repetitions in the array which should be deleted by the procedure. Array R [ 1:t, 1:5] consists of resultant operations, i.e., those which perform the assignment for the formulas forming the given operator, t is the number of formulas in the operator. Integers assembled in the array C1 [l:m]are called conventional numbers of the first kind and represent variables and constants pertinent to the operator, there being m different variables and constants. Boolean arrays B [ 1 :n]and B1 [ 1 :m] are logical scales. At the first entry to the procedure (for the given operator), the preparatory part, preceding label 1, should be executed, this being controlled by a Boolean 46
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
variable “first entry.” Labels correspond to individual operators forming Ershov’s algorithm. The seventh operator is skipped as irrelevant to the procedure. Structure of Ershov’s function is explained below. On calling the procedure, sigma = 0 (always); value a, b, c, theta, sigma; integer a, b, c, sigma, theta; begin integer i, j,p , s; if first entry then go to 1 ;
-
for i = 1 step 1 until n do begin for j = 1 step 1 until 5 do L [i,j] : = 0; B [i] := false end for; for i = 1 step 1 until m do Bl[i] : = false; p := 0 ; 1: if c # 0 then go to 9; 2: s : = Ershov function (theta, a, b ) ; 3: forj = 1 step 1 until 5 do if L[s,j] # 0 go to 6; 4: if a = L[s, I] A b = L[s,21 A theta = L[s,51 A sigma = L[s, 41 then go to outlet 1; 5: if s
WtADYStAW TURSKl
random numbers with almost constant distribution over the interval [1, n ] and studying statistically the structures of arithmetic operators, it is possible to construct functions that will be easy to evaluate and will considerably reduce the time consumed in the execution of the alg~rithm.~ Before leaving this interesting algorithm we shall point out that logical scales are used here for quite different purposes than their original application described in Section 1.1. Bibliographical notes. Ershov’s paper [ l o ]contains one more algorithm for reducing the number of working cells. Interesting remarks on economy of memory requirements can be found in a paper by S. S. Lavrov [40],which are discussed in Section 1.6. Many useful devices for obtaining optimal codes are contained in [64] and [ I l l . Two recent papers [67] and [48] are devoted to problems of automatic checking and symbolic addresses used in programming programs. 1.4 Soviet Work on ALGOL
I n autumn of 1958 a t the Mathematical Institute of the Soviet Academy of Sciences a research program was undertaken to construct an input language for electronic computers that would satisfy two conditions: (i) It ought to be as easy to learn as possible. (ii) It should be machine-independent. As a basis for such a language the Zurich version of ALGOL(ALGOL58) was originally adopted, but then a number of generalizations
were introduced, resulting in a more convenient notation. The proposed generalizations almost coincided with those changes in ALGOLwhich were introduced a t the Paris conference; thus most of the discrepancies between the proposed input language and ALGOLwere removed. It may be interesting to note that this coincidence in proposals arose in the work of two completely independent groups, and that the formal side of the proposed alterations is quite different. I n a paper by Lyubimskii [43] some remarks are given concerning disadvantages and shortcomings of A L G O L - ~these ~ ; remarks were made a t a Symposium organized by the Mathematical Institute. The remarks have a rather formal character and go along lines proposed by Peter Naur in Algol Bulletin. Since, moreover, some of the features of ALGOL-60,considered by Lyubimskii as liable t o cause misunder*Algorithmsof that type usually take up time proportionally to ns;Erahov’s algorithm needs time proportional to n.
48
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
standings, are considered by other authors as purposely unrestricted, we shall not give any details of Lyubimskii’s view. The most advanced Soviet work on ALGOLis presented in a book by Ershov, Kozhukhin, and Voloshin [16‘]. The book defines what has been called on several occasions the Siberian dialect of ALGOL.It is certainly true that the language proposed in the book differs substantially from ALGOL, though it remains in close relation to it; as a matter of fact, the proposed language incorporates ALGOLin the sense that any correct ALGOLstatement is a correct sentence of the Siberian dialect, although the converse is not true. Compared with ALGOL-60,the Siberian language possesses many additional features; the most important of them are listed below. (1) It is permissible to use complete arrays in arithmetical statements; thus it is possible to write matrix operations just as arithmetic expressions, without involving complex structures of procedures. This generalization, permitting a simple variable to denote a multidimensional array (and not only a scalar element of it), calls for extended possible type declarations for simple variables, for it would become necessary to declare the structure, i.e., dimensions and bounds of subscripts, for single variables. I n addition, arithmetic operators and standard functions should be redefined to allow operations on arrays represented by simple variables. For convenience, the metalinguistic term “array” should be divided into “matrix” and “array”, the latter being reserved for more than two-dimensional structures and for such two-dimensional arrays that do not obey the rules of matrix algebra. Finally, two new types of statements-forming statement and composing statement for building arrays from components having fewer dimensions and smaller components of the same dimensions, respectively-are introduced. [It is curious to observe that in completely different problems similar statements are defined and used by Klounek and Vl6ek (cf. Section 3.1).] (2) The Siberian language allows for use of more complex relations in Boolean expressions, viz., it is possible to use chain relations such as:
if if
411 = - - - = a[n] A >a >b > B
=
0
then S ;
then go to Novosibirsk.
This extension is connected with the abbreviated form of sequences depending on parameters. Thus, in correct Siberian language it is possible to write
--
= T[c,] instead of, say, T[c,] = T [ c l ] = T[cl 11 = T[c, 21 = T [ c ,
+
+
+ 31
(when cg = c1
+ 3). 49
WtADYStAW TURSKI
Generally, T[cl]iu wT[c,] denotes any sequence in which w is a delimiter, and c1 and c2 are the initial and final values of a parameter. (3) In the Siberian language it is possible to enter a block a t a place other than its head. This is achieved by introducing compound labels. A compound label takes the form A .B , a dot separating the components of the label. When used in go to statements, it signifies transfer t o a block labeled A , with inner label B. This preserves the inherently local character of labels and makes it possible to enter the labeled statement of any block from the outside. (4)There is a STOP statement in the Sibcrian language. This statement is equivalent to a dummy statement but has no successors whatsoever. ( 5 ) There are other minor formal differences, all conceived t o simplify the programmer’s work. (6) More fundamental differences are to be found in the concept of upper indices. Upper indices, when used, denote variables which are evaluated in some dynamical order. The current value of an upper index shows the number of the currently used value of a given variable. Consecutive values of upper-indexed variables are often obtained from recursive arithmetic formulas, e.g., %L+l
._ a
%i-Z
+ &l
+
x
xi,
(1.4.1)
There is obvious simplification in translation of formulas written with the help of upper indices, since only those different values of an upperindexed variable need be kept in machine memory which are used by the formulas. For the (usual) lower-indexed variables, i.e. those arranged as an array, all values must be preserved. ( 7 ) It is proposed that in the Siberian language a new rule for formal parameters called by value be adopted. Usually, for a given procedure, formal parameters called by value may be used as arguments of procedures only. In the Siberian language any formal parameter may be called by value, thus becoming local to the procedure body; however, if it is one of the “results” of the procedure, its actually computed value is assigned to that actual parameter which corresponds to the “resulting” formal parameter. ~ ~ be The concept of “cttll by vaIue” in standard A L G O L -can explained informally as follows: For the formal parameters entering the value list the corresponding actual parameters are evaluated once on entry into the procedure body (when the procedure is activated) and the values thus obtained are assigned to the formal parameters by means of implicit assignment statements which precede the first begin of the procedure body. The formal parameters called by value will in subse-
50
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
quent statements be treated as normal local variables of the body. Hence, if in the course of execution of the body, new values are assigned to such parameters, these new values would be inaccessible outside of the body, i.e., for instance, in the block which activated the procedure. This means that no “results” of the procedure could be treated in this way. In the Siberian language for such resultative formal parameters, called by value, implicit assignment statements are introduced, which are executed immediately after the final end of the procedure has been reached. These assignments act in a manner exactly opposite to the ones performed on entry into the body, i.e., the values of formal parameters obtained last (in the dynamic sense) are assigned to the corresponding actual parameters, specified in the procedure statement. This rule, while not enriching the “expressive power” of the language, may however become very convenient by reducing the overall number of registers needed to represent variables of the program. (8) The Siberian language preserves the A L G O L concepts -~~ of using procedures and functions with empty parameter lists as actual parameters, and allows for the use of several identifiers for one procedure. (9) I n order to make it possible to use the customary forms of formulas in which specifications of certain variables are given after the a, where y = sin(r/z)-new decformulas themselves-e.g., x : = y larations are introduced. These declarations are formally similar to assignment statements, as e.g., the above used y = sin(n/x). Like all other declarations they have no operative meaning and serve only to introduce initial values of parameters or variables. Besides, this type of declaration leads in some cases to considerable simplification and time economy in object programs. Ershov [I61 gives a thorough formalized description of the Siberian language. In addition there are many interesting examples of procedures and programs written in that language, together with a very useful Siberian-ALaoL dictionary and glossary of new terms. It is hoped that this brief section will inform the reader about the deep and far-reaching research on ALGOLconducted in the USSR. As t o practical applications of this theoretical research, very little can be said now. At the present (end of 1962) there are three groups working on implementation of ALGOL.Two of them, led by Ershov (Novosibirsk) and Shura-Bura (Moscow), were stated10 t o expect t o achieve their goal by early 1963’O‘. Both are trying to implement dialects
+
‘OPrivate communication. 10aAdded in proof. During the Kiev conference on automatic programming (Kiev, June 1963) Shura-Bum’s ALQOLcompiler TA-2 was demonstrated. A thorough description of this translator and the ALQOLdialect adopted is to be found in a paper:
51
WkADYStAW TURSKl
of ALGOLon the M-20 computer with card input. If and when this is achieved, the &I-20 will become the first Soviet computer to possess alphanumerical input operated in the USSR (cf. Sections 2.1 .and 2.3). The third group is led by V. M. Kurochkin [38]. Bihliographical notes. Besides the quoted book 1161, there is a paper by Ershov [ 241 which gives some “shop” details about the construction of the Siberian language and future translators for it. A number of ALGOL-written algorithms has appeared in Zhurnal Vychislit. Mat, i Matem. Bixiki. 1.5 Non-Lyapunovian Approach
The ideas discussed in preceding sections refer to concepts of programming schemes and programming programs (PP’s) of a certain, though not sharply defined, type. For all the PP’s hitherto considered, it is typical to possess considerable generality of form, and virtually no distinction is made between programming of problems taken from different fields. I n other words, these PP’s ar3 not dependent on the actual content of the source program. I n the pertinent Soviet literature, apart from variations of this approach, one may find two fundamentally different approaches which are discussed briefly in this section. First of all, we discuss the autoprogramming method developed a t the Computation Center of the Ukrainian Academy of Sciences headed by V. M. Glushkov [ 1 9 , 5 8 , 5 9 ] . This method tends to eliminate a considerable amount of highly intellectual work which must be performed by skilled programmers and educated mathematicians when Lyapunov’s multilevel preparation procedure is used (cf. Section 1.1). Glushkov [I91 rightly observes that the method proposed by Lyapunov, and subsequently adopted in many programming programs, leaves t o the programmer the very difficult problem of choosing the right algorithm for the given problem and then preparing calculational schemes, whereas with little additional work this stage of programming could also be automatized. Let us consider a finite set of typical mathematically stated problems, which will be called the class of allowed problems, or CAP for short. The elements of this set are not specific problems but types of problems, each consisting, generally speaking, of an infinite variety of specific A L G O LTranslator, -~~ [Shura-Bura, M. R., and Lyubimskii, E. Z., Zhur. Vych.Mat.6 Mat. F i z . 4, No. 1 , 96-112, (1964)I. During the same meeting it was announced that the Kiev Institute of Cybernetics, headed by Academician V. M. Glushkov, produced another ALGOLcompiler for the KIEVcomputer.
52
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
problems. For each element of CAP there exists a finite number of known algorithms. All these algorithms should be programmed once and for all and kept in machine memory (perhaps in some auxiliary type of storage). In addition there should be one special program for each element of C A P which will do the “thinking” on selecting an algorithm for a specific problem. Of course, there are many principles influencing the choice of an algorithm for a given problem, such as accuracy desired, individual features of the specific problem, memory requirements for each of the possibly applicable algorithms, and so on; but it is quite clear that once these principles are formalized there is a possibility of programming the choice procedure. All partial programming programs provide control facilities of two types. One of these serves for checking the calculation procedure, the other checks whether the chosen algorithm, the machine parameters and the specific problem are compatible, in the sense that “it is possible to solve the given problem, with given accuracy, by means of the chosen algorithm, on the given computer.” Partial programming programs that perform actual programming (coding) of specific problems use standard subprograms for performing tasks that are common to more than one partial programming program, e.g., coding arithmetic statements. Glushkov suggests that some nonnumerical programs might be included in the library of subprograms. These would perform a number of easily formalized analytical transformations of data (e.g., differentiation and integration of elementary functions, substitution and reduction of polynomials, etc.). To this question we shall return in Section 1.6. Glushkov says that the input information, or source program, for a computer endowed with an autocode of the proposed type would consist of statements like: “solve the set of simultaneous differential equations dx,/dt = - 5 z Z 2 ,
dx₂/dt = x₁ + cos 2x₂,
with initial conditions t = 0, x₁ = x₂ = 1 over the range 0 < t < 10, with accuracy better than 0.0001; and produce values of x₁ and x₂ for t = 1, 2, . . . , 10." After reading such input the computer will "itself" choose the right algorithm and perform the calculations. There is no known realization of such an autocode yet; however, first attempts to implement similar methods were made by Stognii, who produced working programs for solving linear differential equations and formulated some criteria for choosing one of two algorithms: that of Runge-Kutta type and that of Adams (Stormer) type. Unfortunately, the criteria were not formalized, and thus the actual decisions have had to be made by programmers.
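To give the modern reader a concrete, if loose, picture of what such a "thinking" selection program amounts to, the following sketch distills the problem statement into a small descriptor and applies a rule of the kind Stognii left informal, preferring a Runge-Kutta routine when high accuracy or a short integration range is requested, and an Adams-type routine otherwise. The descriptor fields and the selection criterion are my own assumptions, not the historical system.

```python
# A loose illustration of an "algorithm-choosing" program for one element of
# the CAP (systems of ordinary differential equations).  The selection rule
# is hypothetical; the historical criteria were never formalized.

from dataclasses import dataclass

@dataclass
class OdeProblem:                # descriptor distilled from the source statement
    accuracy: float              # requested accuracy, e.g. 0.0001
    t_start: float               # integration interval
    t_end: float
    stiffness_suspected: bool    # an "individual feature" of the specific problem

def runge_kutta_program(problem):   # stand-ins for algorithms kept in the library
    return "object program built around a Runge-Kutta routine"

def adams_program(problem):
    return "object program built around an Adams (Stormer) routine"

def choose_algorithm(problem: OdeProblem):
    """The 'thinking' program: a formalized choice among prestored algorithms."""
    span = problem.t_end - problem.t_start
    if problem.accuracy <= 1e-4 or problem.stiffness_suspected or span <= 1.0:
        return runge_kutta_program
    return adams_program

stated = OdeProblem(accuracy=0.0001, t_start=0.0, t_end=10.0, stiffness_suspected=False)
print(choose_algorithm(stated)(stated))
```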
In [58] Stognii gives a program for reading sets of equations (coded manually letter by letter) and producing object programs for either of the two algorithms. In [59] he proposes a scheme for differentiating (analytically) equations of the type z = f(x, y), y = y(x), and uses this scheme for a program that solves differential equations in the vicinity of singular points. Glushkov's ideas may be thought of, in a sense, as generalizations of Lyapunov's principles.

Now we proceed to a quite original approach advocated by Yu. A. Shreider [53, 54] and his colleagues V. A. Kozmidiadi and V. S. Cherniavskii [37]. The ideas pertinent to this approach are as yet neither embodied in any practical representation, nor even worked out in enough detail to permit such an embodiment. However, this approach opens such exciting perspectives and leads to so profound an analysis of programming that any review of relevant Soviet work not referring to these ideas would be incomplete.

The basic idea behind the proposed method of programming is to consider a computer as a language that includes, as a sublanguage, the programming language. In this connection, a language consists of (i) a set of material objects (symbols, electronic components) constituting an alphabet of the language, (ii) rules for composing correctly formed finite subsets (words, expressions, formulas, registers, accumulators) constituting a formalized syntax of the language, together with (iii) some interpretation of the texts, i.e., of finite sets of correctly formed subsets. Interpretation is defined by rules constituting the semantics of the language and, by and large, may be understood as a mapping of the linguistic system (the set of all correctly formed subsets) onto a set of objects called meanings or values. Thus, an electronic computer and an abacus may both be considered as languages. This somewhat shocking statement becomes clear when one recalls that the abacus (used for calculations and not as, say, a kind of trolley) is useless unless some interpretation rules are given; in other words, to use the abacus, one has to interpret correctly formed combinations of beads on wires as numbers. Similarly, an electronic computer without proper semantics is just a meaningless heap of circuits, elements, and racks.

For a computer treated as a language, the alphabet consists of primitive elements like flip-flops or ferrite cores (sometimes called variables) which may assume one of two possible states (called values of variables). Such variables are interpreted as material variables, and the states of material variables are interpreted as the values zero or one. Aggregations of these variables, e.g., registers, are correctly formed expressions interpreted as "variable words," and the states of these aggregations are interpreted as numbers or instructions. Which specific
interpretation is to be attached to a given state of a given aggregation depends on, e.g., the state of another aggregation. Sequences of expressions, corresponding to processes executed by the machine, may well be considered as correctly formed expressions and interpreted as "searching," "addition," "comparison," etc. Such an approach allows dynamic programs to be replaced by lexicographic expressions (static descriptions). Now, if the input devices of the computer are singled out and some interpretation of their states is adopted, we get a sublanguage of the computer, namely the input language. It is very important to realize that the language, as defined above, depends on the interpretation rules; thus, the same linguistic system (embodied in hardware) may give rise to many different languages.

Let a function φ be given by a set of expressions of a language L. This set is called a program of the function φ in the language L. Languages designed for defining (in any sense) functions are called programming languages. All languages currently used for actual programming belong to a special type of programming languages, viz., algorithmic languages, or process languages. The distinguishing feature of these is that a function is given by its algorithm, i.e., a set of prescriptions which, if obeyed consecutively, lead from an argument to the corresponding value of the function. The algorithmic languages do not, however, represent the only possible structure of programming languages. Another example of programming languages is the language of primitive-recursive functions, defined as follows:

Alphabet
(1) The symbol 0, called zero, is interpreted as the number zero.
(2) The symbol ′, called dash, has no independent meaning; if, however, A is interpreted as the number A, then A′ is interpreted as the number A + 1. Thus 0′ is interpreted as the number 1.
(3) The symbols x₁, x₂, x₃, . . . , called variables, are interpreted as variables assuming positive integer values.
(4) The symbols φ₁, φ₂, φ₃, . . . , called functional symbols, are interpreted, depending on context, as names of functions.
(5) The symbols ( , ), and =, called left parenthesis, right parenthesis, and equality sign, have no independent interpretation, but influence the interpretation of expressions in which they occur.

Syntax
(1) 0 is a number.
(2) If A is a number, then A′ is a number.
(3) Every number is a term.
(4) Every variable is a term.
(5) If Φ is a functional symbol and T₁, . . . , Tₙ are terms, then Φ(T₁, . . . , Tₙ) is a term.
(6) If T₁ and T₂ are terms, then T₁ = T₂ is an equation.
(7) The equation Φ(X) = X′, where Φ is a functional symbol and X is a variable, is a primitive-recursive description. Its interpretation is the description of the function φ that puts into correspondence to a number A the number A + 1. Φ is interpreted as the name of the function φ.
(8) The equation Φ(X₁, . . . , Xₙ) = A is the primitive-recursive description interpreted as the description of the function φ which to any n-tuple of its arguments puts into correspondence the number A. The symbol Φ is interpreted as the name of the function φ.
(9) The equation Φ(X₁, . . . , Xₙ) = Xᵢ is a primitive-recursive description that is interpreted as the description of the function φ which to any n-tuple of its arguments puts into correspondence the value of its i-th argument (1 ≤ i ≤ n). Φ is interpreted as the name of the function φ.
(10) Let (i) P₁P₂ . . . Pₘ be a primitive-recursive description, (ii) Φ₀, Φ₁, . . . , Φₖ be functional symbols appearing in it, (iii) the left-hand sides of the equations in which these functional symbols appear for the first time be Φ₀(X₁, . . . , Xₖ) and Φᵢ(X₁, . . . , Xₙ), 1 ≤ i ≤ k, and (iv) Φ be a functional symbol not appearing in the primitive-recursive description under consideration. Then P₁ . . . PₘPₘ₊₁, where Pₘ₊₁ is of the form
Φ(X₁, . . . , Xₙ) = Φ₀(Φ₁(X₁, . . . , Xₙ), . . . , Φₖ(X₁, . . . , Xₙ)),
is a primitive-recursive description that is interpreted as the description of all functions given by the initial description and of the function φ obtained by substitution of the functions φ₁, . . . , φₖ into the function φ₀, where φᵢ (0 ≤ i ≤ k) are the functions whose names are Φᵢ. The symbol Φ is interpreted as the name of this function φ.
(11) Let (i) P₁P₂ . . . Pₘ be a primitive-recursive description, (ii) Φ₁ and Φ₂ be functional symbols appearing in (i), the left-hand sides of the equations in which they appear for the first time being of the form Φ₁(X₁, . . . , Xₙ₋₁) and Φ₂(X₁, . . . , Xₙ₊₁), and (iii) Φ be a functional symbol not appearing in (i). Then P₁P₂ . . . PₘPₘ₊₁Pₘ₊₂, where Pₘ₊₁ is Φ(0, X₂, . . . , Xₙ) =
Φ₁(X₂, . . . , Xₙ) and Pₘ₊₂ is Φ(X₁′, X₂, . . . , Xₙ) = Φ₂(X₁, Φ(X₁, X₂, . . . , Xₙ), X₂, . . . , Xₙ), is interpreted as the description of all functions described by (i) and of the function φ which is expressed by the functions φ₁ and φ₂:

φ(0, X₂, . . . , Xₙ) = φ₁(X₂, . . . , Xₙ),
φ(X₁ + 1, X₂, . . . , Xₙ) = φ₂(X₁, φ(X₁, X₂, . . . , Xₙ), X₂, . . . , Xₙ).
The symbol Φ is interpreted as the name of the function φ.
(12) Let P₁ . . . Pₘ be a primitive-recursive description and Φₖ the functional symbol of the left-hand side of the equation Pₘ. Then the primitive-recursive description P₁ . . . Pₘ is called a program for the function φₖ, whose name is Φₖ.
Example. The primitive-recursive description P₁P₂P₃P₄P₅P₆, where

P₁:  Φ₁(X₁) = X₁,
P₂:  Φ₂(X₁) = X₁′,
P₃:  Φ₃(X₁, X₂, X₃) = X₂,
P₄:  Φ₄(X₁, X₂, X₃) = Φ₂(Φ₃(X₁, X₂, X₃)),
P₅:  Φ₅(0, X₂) = Φ₁(X₂),
P₆:  Φ₅(X₁′, X₂) = Φ₄(X₁, Φ₅(X₁, X₂), X₂),

is a program for the "adding function" φ₅(x₁, x₂) = x₁ + x₂.
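As an aside for the modern reader, the relational character of such a program can be made concrete by evaluating it mechanically. The sketch below is my own illustration, not anything proposed in the papers under review; it builds the same adding function from zero, successor, projection, composition (rule (10)), and the primitive-recursion scheme (rule (11)).

```python
# A minimal evaluator for the primitive-recursive "program" above
# (an illustrative sketch, not part of the historical proposal).

def successor(x1):                 # P2:  F2(X1) = X1'
    return x1 + 1

def project2_of3(x1, x2, x3):      # P3:  F3(X1, X2, X3) = X2
    return x2

def compose(outer, *inner):        # rule (10): substitution of functions
    return lambda *args: outer(*(g(*args) for g in inner))

def primitive_recursion(base, step):   # rule (11): the recursion scheme
    def phi(x1, *rest):
        acc = base(*rest)
        for i in range(x1):
            acc = step(i, acc, *rest)
        return acc
    return phi

# P4: F4(X1, X2, X3) = F2(F3(X1, X2, X3)) -- successor of the second argument
phi4 = compose(successor, project2_of3)
# P5, P6: F5(0, X2) = F1(X2) (identity);  F5(X1', X2) = F4(X1, F5(X1, X2), X2)
add = primitive_recursion(lambda x2: x2, phi4)

assert add(3, 4) == 7   # the "adding function" phi5(x1, x2) = x1 + x2
```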
Of course the primitive-recursive function method of programming is not a very convenient one, and serves as an example rather than as an object to be achieved. Nevertheless, the possibility of programming not only by means of algorithms but also by means of relations is shown by this example. Thus, the general conclusion may be stated that a generalized programming language should include not only algorithms but also relations as standard means of describing procedures. Such a language is proposed in [21] and is based on the theory of Markov algorithms and on the method of primitive-recursive functions just described.

There are one or two further points that deserve to be mentioned. The approach to programming described here implies what sort of micro-operations should be included in the code list of a computer in order to facilitate autoprogramming. In fact, Shreider [53] examined the Soviet three-address computers from this point of view, and suggested that, although they are general-purpose computers, many changes in their code lists are necessary, or at least desirable, in order to simplify automatic programming for a given class of problems. This should not surprise the reader, since only the statistical analysis of a large number of problems programmed in languages which describe
WLADYSLAW TURSKI
not only macro-operations but also micro-operations, i.e., the computer itself, makes it possible to decide which correctly formed expressions pertinent to the programs should be incorporated into hardware. Finally, it is worth mentioning that regarding the computer as a language may lead to an entirely new concept of reliability.

1.6 Special Purpose Autocodes
All that has been said so far about automatic programming in the
USSR is restricted to general autocodes, i.e., programs that accept a source program written in terms of symbolics and produce an object program written in machine code, regardless of the content of the source program; provided, of course, that the latter was written according to certain rules. This is true even in the case of Glushkov-type autocodes (cf. Section 1.5). However, it is not difficult to see that in practice a somewhat different situation arises quite often, viz., it becomes profitable to possess a specialized autocode which accepts source programs belonging to a more or less restricted class only. Such an autocode operates very quickly and occupies comparatively little machine memory. As a matter of fact, the attentive reader has undoubtedly noticed that an autocode of Glushkov type consists of a number of specialized autocodes of just this kind.

In the first part of this section we give in more detail an interesting example of a specialized autocode due to O. K. Daugavet and Ye. F. Ozerova [9]. This autocode is designed to perform operations on matrices, vectors, and scalars and thus is applicable in all problems of linear algebra. This autocode produces object programs from source programs in which actual values of data need not be given explicitly. Hence, in a sense, the object program is data-independent and may be used many times with different sets of data, provided the structure of the data is the same in each case. Information about data structure is presented to the computer in the following form. Each variable (matrix, vector, or scalar number) is given a name, denoted by the programmer as a capital letter (or combination of letters) but afterwards coded as a number, since the computer does not accept alphabetic information. A matrix M which is to be stored as a two-dimensional array in machine memory, starting with location a₁, is represented as

M  a₁  m  h₁  n  h₂,
where m and n are the dimensions of the matrix, and h₁ and h₂ denote the "step length" in the array in the directions corresponding to the
dimensions m and n. A p-dimensional vector V, located from a₂, is represented by

V  a₂  p  h,

where h is the step length, i.e., the number of cells occupied by one component of the vector. A number located in a₃ is represented by
N  a₃  0  0.

The symbolical inscription N A k₁k₂ 0 0 denotes an element of matrix A, the address of which is defined (inside the array representing matrix A) by the relation aₑ = a₁ + k̄₁h₁ + k̄₂h₂, where the bars indicate that the dynamically last (i.e., most recent) values of the parameters k₁ and k₂ are taken into consideration. The parameters may be either constant or variable; in the latter case they may be mutually dependent. Formulas of the type k = αk̄ᵢ + βk̄ⱼ + γ, where α, β, and γ are numbers, define actual values of variable parameters. The program itself is very simple and consists of sequences of "commands" written in the form nᵢ A B C, where nᵢ is a pseudo-code number denoting the operation to be performed on the variables A, B, and C. If, e.g., nᵢ assumes the value of the pseudo-code denoting matrix multiplication, the command is equivalent to multiplication of matrices A and B and storing of the result in the location assigned to matrix C. An alternative form of command is
nᵢ  A  F  R  X.

This form is used when the result of the matrix operation nᵢ is a vector, as happens, e.g., if nᵢ is the pseudo-code for solving simultaneous linear equations, A being the associated matrix of the system, F the column vector of right-hand expressions, X the result vector, and R a column vector of working cells. There is one feature of the autocode which deserves special attention. Some of the variables may be marked by a special sign (following the appropriate capital letter) which means that the marked variable will never be used in subsequent operations. It is clear that the translator (or rather compiler) interprets this information in such a manner that the storage occupied by the marked variable may be used for other variables (or intermediary results) if they are generated after (in the dynamic sense) the execution of the command containing the marked variable. Additionally, part of the information may be incomplete, i.e., need not
include initial locations. Such variables will be located either in storage previously occupied by "marked variables" or, if these locations are still occupied, in any other locations available. The Daugavet-Ozerova autocode is, in a sense, an abbreviated form of a partial programming program of Glushkov type.
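As a rough modern paraphrase of how such data descriptors and pseudo-coded commands might be interpreted, consider the following sketch. The descriptor fields mirror those just described (base address, dimensions, step lengths); the pseudo-code numbers and function names are invented for illustration and are not the original coding.

```python
# Illustrative paraphrase of the Daugavet-Ozerova style of data description;
# the descriptor fields follow the text, the pseudo-code numbers are invented.

from dataclasses import dataclass

@dataclass
class MatrixDescriptor:            # "M  a1  m  h1  n  h2"
    base: int                      # starting location a1
    m: int                         # first dimension
    h1: int                        # step length along the first dimension
    n: int                         # second dimension
    h2: int                        # step length along the second dimension

    def address(self, k1: int, k2: int) -> int:
        return self.base + k1 * self.h1 + k2 * self.h2   # a_e = a1 + k1*h1 + k2*h2

def multiply(a, b, c):             # pseudo-code 1:  C := A x B
    print(f"emit code: matrix at {c.base} := product of {a.base} and {b.base}")

def solve_system(a, f, r, x):      # pseudo-code 2:  X := solution of A X = F
    print(f"emit code: solve system stored at {a.base}")

OPERATIONS = {1: multiply, 2: solve_system}

def execute(command, symbols):
    """One command of the form (n_i, A, B, C) or (n_i, A, F, R, X)."""
    op, *names = command
    OPERATIONS[op](*(symbols[name] for name in names))

# A 3 x 3 matrix stored row by row from location 100; element (2, 1) sits at 107.
symbols = {"A": MatrixDescriptor(100, 3, 3, 3, 1),
           "B": MatrixDescriptor(109, 3, 3, 3, 1),
           "C": MatrixDescriptor(118, 3, 3, 3, 1)}
execute((1, "A", "B", "C"), symbols)
assert symbols["A"].address(2, 1) == 107
```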
Next we describe an interesting nonnumerical autocode due to Shurygin and Yanenko [56]. The autocode is devised to perform some simple, yet very useful, operations on algebraic expressions written in alphanumeric form. The expressions accepted by the autocode are of the type

P = Σₖ aₖ x₁^{α_{k1}} x₂^{α_{k2}} ⋯ xₙ^{α_{kn}},          (1.6.1)
where the α_{ki} and aₖ are real numbers that may be recorded in machine memory and the xₖ are letters. If xⱼ is a variable it may be a function of other variables. Expressions like (1.6.1) are called polynomials. Observe that this use of the word is more general than the customary one, in which the exponents must be non-negative integers. A large class of expressions used in mathematics can be represented as polynomials in this sense; e.g., the expression
R = (ux + √(cy² + 1)) / (dx² + ey²)          (1.6.2)

may be replaced by the polynomial

R = uxv⁻¹ + w^{1/2}v⁻¹,          (1.6.3)

where v = dx² + ey² and w = cy² + 1.
The source program consists of the data description and the program itself. The data description, in turn, consists of two parts. In the first part all pertinent polynomials are coded as pseudo-instructions. (This would be unnecessary if alphabetic symbols were accepted by the computer.) The second part gives code numbers for polynomials treated as entities; i.e., each polynomial is given a code number serving henceforth as the identifier of this polynomial. The sequence of operations to be performed on the polynomials is laid down in the program, formally similar to a conventional program written in machine language; the pseudo-instructions are built up from polynomial identifiers and pseudo-codes of operations performable on polynomials by the autocode. The list of performable operations includes:
(1) Replacement of a letter in a polynomial by another polynomial. Let the polynomial P contain nonnegative integer powers of the letter
q, while the polynomial Q does not contain the letter q at all. Then it is possible to substitute Q into P in place of q. For convenience an additional convention is introduced, viz., the code number of the polynomial Q is identical with the code number of the letter q. Then, if one wishes to replace q₁, q₂, . . . , qₙ in P by polynomials Q₁, Q₂, . . . , Qₙ, and the code numbers of the letters qᵢ₊₁ and qᵢ differ by one, it is sufficient to specify in the instruction P, Q₁, and Qₙ only. Such a complex substitution is possible even in the case when P does not contain all of the qᵢ (1 ≤ i ≤ n), or when not all of the Qᵢ (1 ≤ i ≤ n) have been given in the data description and thus not all of them "exist." The autocode will choose in this case those pairs (qᵢ, Qᵢ) for which substitution is possible. Addition, subtraction, and multiplication of polynomials are performed by means of the replace instruction described above. For this purpose special standard polynomials, A = x + y, S = x − y, and M = xy, are introduced. Replacement may be performed either with or without reducing similar terms, since the reduction takes up a long time and is not necessary in intermediary polynomials.
(2) Differentiation of polynomials. Let the polynomial R be a function of x₁, . . . , xₙ, t₁, . . . , tₘ, and let x₁, . . . , xₙ be differentiable functions of t₁, . . . , tₘ. With a given table of derivatives
∂xᵢ/∂tₖ = yᵢₖ,     i = 1, 2, . . . , n,   k = 1, 2, . . . , m,          (1.6.4)
the autocode finds any of the polynomials

S = ∂*R/∂tₖ + Σ_{i=1}^{n} (∂R/∂xᵢ) yᵢₖ.          (1.6.5)

Of course, S is recorded in the form (1.6.1). In (1.6.5), ∂*R/∂tₖ denotes the partial derivative with respect to tₖ entering R explicitly. Besides these two fundamental operations, there are some others, of an auxiliary character.
(3) Rename the polynomials P, P + 1, . . . , P + n as Q, Q + 1, . . . , Q + n.
(4) Duplicate the polynomials P, P + 1, . . . , P + n as polynomials
Q, Q + 1, . . . , Q + n.
(5) Read polynomials P, P + 1, . . . , P + n.
(6) Print polynomials P, P + 1, . . . , P + n.
(7) Erase polynomials P, P + 1, . . . , P + n.

In operations (3) through (7), n may assume the value zero, in which case the operation is performed on the single polynomial P.
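A rough, modern counterpart of the two fundamental operations may help fix ideas: the sketch below stores a generalized polynomial of the form (1.6.1) as a mapping from exponent tuples to coefficients and implements letter replacement and formal differentiation. The data layout and function names are mine, not the original pseudo-instruction format.

```python
# Generalized polynomials in the sense of (1.6.1): a mapping from exponent
# tuples (one exponent per letter, possibly negative or fractional) to real
# coefficients.  Illustrative only; not the original pseudo-instruction coding.

def multiply(p, q):
    """Product of two polynomials: add exponents, multiply coefficients."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            result[e] = result.get(e, 0.0) + c1 * c2
    return {e: c for e, c in result.items() if c != 0.0}   # reduce similar terms

def replace_letter(p, index, q):
    """Operation (1): substitute polynomial q for the letter at position index
    (only nonnegative integer powers of that letter may occur in p)."""
    result = {}
    for exps, coeff in p.items():
        power = exps[index]
        term = {tuple(0 if i == index else e for i, e in enumerate(exps)): coeff}
        for _ in range(power):
            term = multiply(term, q)
        for e, c in term.items():
            result[e] = result.get(e, 0.0) + c
    return result

def differentiate(p, index):
    """Operation (2): formal partial derivative with respect to one letter."""
    result = {}
    for exps, coeff in p.items():
        k = exps[index]
        if k != 0:
            e = tuple(e - 1 if i == index else e for i, e in enumerate(exps))
            result[e] = result.get(e, 0.0) + coeff * k
    return result

# Letters (x, y); the polynomial w = 3y^2 + 1 and its derivative with respect to y.
w = {(0, 2): 3.0, (0, 0): 1.0}
print(differentiate(w, 1))          # {(0, 1): 6.0}, i.e. 6y
```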
This polynomial-handling autocode was applied by the original authors to various problems related to the partial differential equations of hydrodynamics and to the extremely tedious work of evaluating determinants consisting of alphanumerical elements. Yanenko [69] gives general rules for reducing systems of quasi-linear differential equations
[quasi-linear system (1.6.6), i = 1, 2, 3, 4],
where aᵢ, . . . , fᵢ are functions of u, v, and w, to the form of a single quasi-linear differential equation for the function w = w(u, v). These rules were programmed for the STRELA computer and many checks were conducted. In [56] an example is given of a set of three differential equations of the form (1.6.6) reduced to one differential equation in 12 minutes by the computer, using this automatic coding system. Shurygin and Yanenko [56] give a detailed description of the technical features of this specialized autocode, which are of certain interest to students of similar problems, but will not be discussed in the present article.

1.7 Use of Topology in Programming
In this section we discuss a paper which provides an example of the application of very advanced mathematical apparatus to practical problems of automatic programming. Lavrov [40] considers the question of economy of machine memory in closed operator schemes. Following his argument we shall see how this problem is reduced to a distribution problem, closely related to the famous problem of graph coloring.

Let us consider two sets of entities: quantities xⱼ and operators Sⱼ. There are two possible relations between an operator and a quantity, if they are related at all. The operator may either use the quantity or generate it. In the first case the quantity is called an argument of the operator, in the second case the result of the operator. Needless to say, a quantity which is an argument of one operator may be the result of another. Between two different operators the relation called transition may occur, in which case one operator is called the predecessor, the other the successor.
An operator scheme is defined by two finite sets

X = {x₁, . . . , xₙ}
and Σ = {S₁, . . . , Sₘ}, and three matrices A = [aᵢⱼ], B = [bᵢⱼ], and C = [cⱼₖ], where i = 1, 2, . . . , n; j = 1, 2, . . . , m; k = 1, 2, . . . , m. The matrices are Boolean, with the following rules for nonzero elements:

aᵢⱼ = 1 if, and only if, Sⱼ uses xᵢ.
bᵢⱼ = 1 if, and only if, Sⱼ generates xᵢ.
cⱼₖ = 1 if, and only if, there is a transition from Sⱼ to Sₖ.
In addition to this we shall consider a few other concepts. A link in the scheme is any pair of integers (j, k) such that (i) 1 ≤ j ≤ m, 1 ≤ k ≤ m and (ii) cⱼₖ = 1; a path in the scheme is any ordered sequence of positive integers (j₀, j₁, j₂, . . . , jₛ) such that any pair (jₖ₋₁, jₖ) with 1 ≤ k ≤ s is a link; thus a link is a special case of a path. An itinerary of the quantity xᵢ is a path (j₀, . . . , jₛ) in which the operator Sⱼₛ uses xᵢ and none of the operators Sⱼ₁, . . . , Sⱼₛ₋₁ generates it. The itinerary is closed if Sⱼ₀ generates xᵢ; otherwise the itinerary is called open. The operator numbered j₀ is called the initial operator of the itinerary, and Sⱼₛ its terminal. We shall say that the operator scheme is closed if each open itinerary of the quantity xᵢ belongs to at least one closed itinerary of the same quantity. The ordered pair (xᵢ, Sⱼ) denotes the value of the quantity xᵢ before execution of the operator Sⱼ; the pair (Sⱼ, xᵢ) denotes the value of the quantity xᵢ after execution of the operator Sⱼ. The pairs (Sⱼ₀, xᵢ), (xᵢ, Sⱼ₁), (Sⱼ₁, xᵢ), . . . , (xᵢ, Sⱼₛ) are said to belong to the itinerary (j₀, j₁, . . . , jₛ). From the set of all 2mn possible pairs associated with the operator scheme we single out a subset, called the encumbrance set, which consists of (i) pairs (xᵢ, Sⱼ) such that Sⱼ is either internal or terminal in a closed itinerary of xᵢ, and (ii) pairs (Sⱼ, xᵢ) such that Sⱼ is either internal or initial in a closed itinerary of xᵢ.

At this point the reader deserves a short break in the continuous flow of definitions. This shall be achieved by a few almost trivial remarks. The concepts introduced above bear a close resemblance to the usual notions of programming. Quantity is a generalization of register content; operator corresponds to instruction; transition is similar to transfer of control; and operator scheme is very close to program. A closed itinerary is a sequence of instructions such that the first one generates the content of a register xᵢ and all but the last of the other instructions of the sequence use the value recorded in the register; in other words, the register xᵢ is occupied by the same value during the execution of the sequence (cf. region of existence, described in Section 1.3).
If the pair (xᵢ, Sⱼ) belongs to the encumbrance set, then the register xᵢ is occupied before the generalized instruction Sⱼ is obeyed. If the pair (Sⱼ, xᵢ) belongs to the encumbrance set, then the register xᵢ is occupied after Sⱼ is obeyed.

Now let us consider generalized readdressing. For this purpose we take the scheme Σ, X, and the set Y = {y₁, . . . , yₚ}. Let M denote a subset singled out of the set of all 2mn pairs of the scheme. We denote by F the function which satisfies the conditions: (i) F is defined for all pairs belonging to M, and (ii) the values of F belong to Y. The function F is called the readdressing function. We shall say that Sⱼ uses yᵢ′ if, and only if, Sⱼ uses xᵢ and F(xᵢ, Sⱼ) = yᵢ′. Similarly, if and only if Sⱼ generates xᵢ and F(Sⱼ, xᵢ) = yᵢ′, we say that Sⱼ generates yᵢ′. Of particular interest are functions F which satisfy the following additional condition: if, for a given j,
{(x_{i1}, Sⱼ), . . . , (x_{ik}, Sⱼ)} ⊆ M, then F(x_{ia}, Sⱼ) ≠ F(x_{ib}, Sⱼ) for 1 ≤ a < b ≤ k.
If F satisfies this condition we say that the new notation system Y is consistent on input of Sⱼ. Similar conditions must be satisfied if Y is to be consistent on output of Sⱼ. We call Y consistent if it is consistent on both input and output of all Sⱼ ∈ Σ. If M includes all pairs that belong to the encumbrance set, the new notation is called complete. If each pair of M belongs to the encumbrance set, we call the new notation minimal. In other words, for a complete and minimal set, M is identical with the encumbrance set. Finally, if, for a complete, minimal, and consistent notation system and for any closed itinerary (j₀, j₁, . . . , jₛ) of the quantity xᵢ,

F(Sⱼ₀, xᵢ) = F(xᵢ, Sⱼ₁) = F(Sⱼ₁, xᵢ) = · · · = F(xᵢ, Sⱼₛ),          (1.7.1)
the new notation is said to be equivalent to the old one. With the help of the definitions introduced so far we may formulate the following problem: For a given closed operator scheme Σ = {S₁, S₂, . . . , Sₘ} and X = {x₁, x₂, . . . , xₙ}, find a notation system Y = {y₁, . . . , y_{n₀}}, equivalent to X, such that for any other notation system Z = {z₁, z₂, . . . , z_{n₁}} which is equivalent to X, the inequality
n₁ ≥ n₀          (1.7.2)

is satisfied. It is easy to see that this problem is of the utmost importance in the theory of programming. The most difficult part in the solution of the stated problem is the equivalence criterion. The given definition of equivalence demands that some conditions should be satisfied for all
closed itineraries. Since the number of itineraries is infinite and they are not easily ordered, the equivalence criterion should be reformulated in such a manner that it does not involve infinite sets. We define the bearer of the quantity xᵢ as the subset of the encumbrance set such that each element of the bearer contains xᵢ and all elements of M that contain xᵢ are included in this subset. The bearer of xᵢ will be denoted by Mᵢ. Let us consider two pairs m and m′. We shall say that m, m′ ∈ Mᵢ are connected if there exists either a closed itinerary of xᵢ to which both m and m′ belong, or a third pair m″ ∈ Mᵢ that is connected with both m and m′. The bearer Mᵢ is connected if all of its elements are connected. Mᵢ falls into subsets (with empty common parts) such that (i) any two pairs of the same subset are connected and (ii) no two pairs belonging to different subsets are connected. These subsets are said to be the regions of activity of xᵢ. If Mᵢ is connected it consists of exactly one region of activity; in the general case the bearer Mᵢ consists of a finite number of regions of activity. Lavrov has proved the following:
Theorem. The new consistent system {y₁, y₂, . . . , y_{n₀}} is equivalent to the old one, {x₁, x₂, . . . , xₙ}, if and only if the readdressing function F is defined on the encumbrance set and is constant inside each of the regions of activity of an arbitrarily chosen xᵢ.

Proof. Since the readdressing function F is defined on the encumbrance set M, the new system Y is complete and minimal. It remains to be shown that if, in each region of activity, F = constant, then along any closed itinerary the equality (1.7.1) holds true, and vice versa.

Proof of necessity. Let the pairs m ∈ M and m′ ∈ M belong to the same region of activity; then, by definition, there is a sequence m₀ = m, m₁, . . . , mₖ = m′ such that, for j = 1, 2, . . . , k, the pairs mⱼ₋₁ and mⱼ belong to the same closed itinerary; but since the systems are assumed to be equivalent, F(mⱼ₋₁) = F(mⱼ), j = 1, 2, . . . , k, which shows that for arbitrarily chosen pairs m and m′ belonging to the same region of activity the value of F is the same. This proves the necessity part of the theorem.

Proof of sufficiency. Let (j₀, j₁, . . . , jₛ) be an arbitrarily chosen closed itinerary of xᵢ. All pairs belonging to it belong to the same region of activity of xᵢ; thus Eq. (1.7.1) is fulfilled. This completes the proof of the theorem.

In this way Lavrov reduced the problem of equivalence to the problem of determining regions of activity.
But this problem may be solved easily for any operator scheme by means of an algorithmized process. An ALGOL procedure for this algorithm is enclosed as an appendix to Lavrov's paper. [Footnote: The procedure is written in the Russian version of ALGOL, but an enclosed dictionary makes it possible to read this procedure, except comments, without any knowledge of the Russian language.]

1.8 "Philosophy" of Automatic Programming
Soviet academician Sobolev, leader of the Siberian Division of the Mathematical Institute of the Academy of Sciences, has stated [57] that the time is coming when automatically produced programs will in every aspect outdo manual programming. This view, contestable as it is, faithfully represents the "philosophy" underlying Soviet research in this branch of science. We leave the discussion of Sobolev's opinion to Sections 2.1 and 2.3.1 and restrict ourselves in the present section to an examination of the motivation and perspectives of automatic programming in the USSR.

An obvious stimulus for the automation of programming for electronic computers is the ever-growing ratio of the number of computers produced to the number of qualified staff trained. This problem becomes very acute even in the USSR (cf., e.g., [36]), in spite of the fact that a considerable yearly output of numerical analysts acquainted with computers impaired the demand for programmers during the early years of electronic computing. Now, however, the situation is changing. Computers are becoming more and more popular and are entering into so many branches of life that trained mathematicians, no matter how large their number, cannot cope with the programming of computers for various, and quite often small, problems. Besides, it would be a waste of highly skilled labour to use it for programming on a large scale. Automatic programming makes the programming process easier and quicker for numerical analysts, and facilitates the programming of computers by specialists not necessarily having any numerical background, such as biologists, chemists, and others. But this is not the only reason for the intensive search for better and better autocodes. A second, and seemingly more important, one is given in Sobolev's statement. This second reason is the persistent belief that automatic programming may be not only faster but also better than manual programming. Fedoseev [18] regards manually produced programs as possessing, inherently and unavoidably, mistakes and errors that need not affect
the results of calculation but make the computation process unnecessarily long and complicated. The reason for this is the lack of lucidity of programs thus produced, and the extremely difficult and time-consuming procedures which have to be applied in order to check the program against the original algorithm; for the generally accepted method of checking programs is either to have two programs doing the same job and to compare the results of the calculations, or to have one programmer thoroughly check somebody else's programs. This should make it clear why almost all Soviet theoreticians consider built-in checking procedures a conditio sine qua non of a good automatic programming system (cf. Section 1.2).

In Section 1.5 we considered the automatic programming system proposed by Shreider. He claims that the way leading to better and fuller automation of programming goes through the revision of the traditional lists of operations microprogrammed for digital computers. In the same section we gave some ideas about Glushkov's proposals. Generally, however, it is assumed that the best path to the automation of programming is that laid down by Lyapunov and his collaborators: the method of programs which program.

It is perhaps interesting to analyze the criteria of the "goodness" of PP's which are given in [18]. The qualification of a PP manifests itself in its ability to accept as many common mathematical symbols and conventions as possible. A qualified PP should "understand," if needed, omitted parentheses, the priority of operations, a minus sign preceding an open parenthesis, etc. On the other hand, the more qualified the PP, the more it should take advantage of specific features of the computer, using built-in cycles, readdressing facilities, and so on. An important job for a qualified PP is to economize the object codes. This may be done either by scanning through a rough version of the object program, or by picking out similar expressions in the source program. The second way gives better results, though it takes more time. Automatic storage allocation is another fundamental feature of a qualified PP. Speed of a PP is not considered an equally important factor, since in most cases programming by PP takes only a tiny fraction of the computing time. Certainly, acceleration of the PP at the cost of reduced qualification is not worth considering.

A very important factor influencing the goodness of the PP is data structure and presentation. Compactness of acceptable data sets is one of the desirable qualities; but a more fundamental demand is that the data should be local, i.e., should not depend on the structure of machine memory, levels of storage, and so forth. A good PP allows for rapid changing and replacing of one data set by another. This is essential both
in production runs of the computational program and in its trial runs. Further, a good PP should not impose any restrictions on the length of the resulting object program, except, of course, "natural" limitations, such as machine memory capacity. This implies, among other things, that the PP should provide for transferring previously produced sections of the object program to the backing store. There are two secondary criteria of goodness, viz., the length of the programming program and the simplicity of the methods used and notation adopted. Several authors emphasize that a good PP should of necessity include self-checking and automatic checking facilities, both for translation and for actual calculations. (We may mention here that devices like internal parity checking are not common in Soviet computers.)

Finally, Fedoseev makes interesting remarks on the effects which the employment of a PP has on the staff of a computing laboratory. He finds that the use of programming programs leads to a natural division of the programmers employed by the laboratory into two groups: programmers concerned with problems to be solved by the computer and programmers concerned with the PP. The first group gradually loses interest in the technical details of programming and may subsequently become indifferent to questions like the possibility of saving computing time by, say, foregoing the use of some standard subroutines available through the PP. The second group may, conversely, lose any interest in the mathematical meaning of the programmed problems and devote themselves completely to the upkeep of the PP. It is perhaps worth mentioning that in Fedoseev's opinion a PP becomes obsolete much sooner than the computer itself, and thus the necessity arises of continuous improvement of the existing programming program.

Bibliographical notes. To the best of my knowledge, there is only one paper [18] in the relevant Soviet literature devoted to a general discussion of methods of automatic programming. Many interesting remarks are, however, to be found in some papers referred to in preceding sections, especially in [21], [19], [53, 54], [27], [38]. Also, I would like to bring to the reader's attention a textbook on programming [37a, in Russian], which covers much of the pertinent work done in the USSR and contains a rather ample bibliography. Additionally, the following papers may be recommended: [15], [16].

2. Poland

2.1 Historical Remarks and Underlying Philosophy
In Poland there are two organizations concerned with automatic programming: the Institute of Mathematical Machines and the
Computation Center, both being divisions of the Polish Academy of Sciences. These two organizations differ in respect to their roles and to the scope of their work, as well as in their histories and their general approach to the problem of automatic programming. IMM, which is an offshoot of the Mathematical Institute of PAS, is primarily concerned with designing and producing electronic digital computers. As a matter of fact, three computers were designed there: SKRZAT, XYZ, and ZAM-2. Several copies of the latter have already been produced by the IMM and are being operated in various computing laboratories; the two former types of computers were rather experimental. The CC was set up in September 1961; a large part of its staff previously formed the Applied Mathematics Department of the Nuclear Research Institute. It is primarily concerned with practical applications of computers in various branches of science and in economics, and operates a number of different computers, including the first URAL-2 installed in Poland. The formal division of work does not encompass all the research done by these organizations, since, e.g., one of the first Polish computers, EMAL-2, was constructed under Dr. R. Marczynski's guidance at CC, and IMM conducts extensive research in applications of computers to banking and administration problems. As far as the questions of automatic coding are concerned, both institutions are doing equally pioneering jobs: IMM has constructed three systems of external programming for their family of computers, and CC is working on alphanumeric codes for the Soviet-made URAL-2 computer.

On comparison of the work already accomplished by these two institutions and of the attitudes taken by the authors of published papers representing the points of view of the automatic coding groups in IMM and CC, a very curious conclusion can be reached. The IMM people consider the possibilities of automatic programming to be limited by the structure of existing computers and, as stated in [44], are of the opinion that ". . . a practical realization of a fully universal language, independent of all individual machine features, will always lead to compromises at the expense of the full economical use of the machine." In other words, though realizing the necessity of the external language to take full advantage of all machine potentialities, they still find that, in practice, automatic coding does not exploit these potentialities to the maximum, thus leaving it to manual coding to produce the most economical programs. At the same time, the automatic programming group in IMM does not seem to conduct any research on the problem of how to change the traditionally accepted lists of operations
microprogrammed in the computer in such a way that a more economical use of automatic programming becomes possible. The automatic programming group at the CC, on the contrary, considers the problem of economical automatic programming to be closely related to the list of operations of the computer, and suggests which of the usually microprogrammed operations are useless from this point of view, and which ones are lacking in existing computers. This opinion is to be found, e.g., in [22, 23] and developed in [21]. Since, however, the CC does not, at present, build its own computers, and existing ones are not quite fit for optimal translation of source programs into object programs, the Center has taken a somewhat unusual approach, viz., a quasi-manual coding technique has been adopted (cf. Sections 2.3.1 and 2.3.2).¹²

At an International Conference held in Warsaw in 1961, the two groups took different positions and expressed different attitudes towards ALGOL and similar formal external languages. The IMM people, starting from the principle of an ever-persistent tendency to improve any universal language, rejected the proposed freezing of ALGOL, and did not recognize any need for a uniform external language that may be "understood" by all computers above certain limits of speed and memory. On the other hand the CC, though disagreeing with many features of ALGOL, accepted it as a basis for a future external programming system for all computers installed in the Center. Furthermore, the version of ALGOL worked out at CC as a hardware representation for the URAL-2 is intended to become the common programming language for all computers of this type installed in the USSR and other countries of Eastern Europe.

In seeking the reasons for these differences, one easily finds that the IMM, operating only self-made computers and incorporating in any new coding system all systems previously used, has already achieved, in a sense, the desirable state of affairs where any program may be run on each of the computers without any additional recoding. On the other hand the Computation Center, operating many different computers, is in the position familiar to all people trying to run their programs on different computers without a uniform programming language "understood" by all the machines.

[Footnote 12: It may be worth mentioning that some of the general ideas concerning the design of computers suitable for automatic coding are incorporated in the logical design of the ODRA 1003 computer, produced by the ELWRO factory in Wroclaw, Poland.]
In order to complete the general information about automatic programming groups in Poland, we may add that the Institute of Mathematical Machines is at present working on a COBOL-like language for the ZAM computers.

2.2 Work of the SAKO Group
In order to distinguish between the two pertinent centers of research on automatic coding in Poland, we shall use the abbreviations SAKO group and KLIPA group as convenient nicknames for the autocoding groups working at IMM and CC, respectively (cf. Section 2.1).

2.2.1 The SAS Programming System
The SAS programming system was devised by the SAKO group, led by Professor L. Lukaszewicz, in 1959-1960 as an external programming language for the XYZ computer [61], and afterwards, somewhat enlarged and modernized, adapted for use with the ZAM-2 computer [17]. We shall consider this final version only, for the minor changes introduced in it are not important, and any correct "old SAS" program is certainly correct in the new version.

SAS is a typical semi-autocode. It frees programmers from manual allocation of storage space, allowing for symbolic address parts of instructions; the operational part of an instruction is preserved in machine form, i.e., consists of two letters being the abbreviation of the Polish names of instructions. In addition, SAS allows for automatic inclusion of subroutines, recorded in auxiliary store, into programs and for division of programs into chapters, and it simplifies the presentation of numerical values to the computer, a feature which is particularly rewarding since the ZAM computer is essentially a fixed-point one, without floating-point facilities other than simplified input of nonstandardized numbers. SAS greatly reduced the number of blunders commonly made by programmers when forced to insert real addresses into the address parts of machine instructions, and consequently reduced the time needed for the checking of programs. Furthermore, should any change in a program written in SAS occur, it would not cause readdressing of all instructions (as is necessary when absolute addressing is preserved). In such cases it suffices to change a small number of symbolic addresses only.

SAS uses all but one of the symbols available on standard Creed teleprinter equipment, the omitted one being the £ sign.¹³ A SAS program consists of blocks, i.e., pieces of program recorded in consecutive

[Footnote 13: Among the characters available in "figure shift" there are two symbols x and n which are to be distinguished from the letters X and N.]
registers. Blocks may be divided into numerical blocks, containing numbers, and instructive blocks, containing commands. The address of the beginning of a block is the first register occupied by the block. Symbols used to identify blocks and particular registers may consist of at most four characters, and fall into two categories: variable symbols, used to identify blocks of numbers, particular numbers, and working cells, must begin with a letter; symbolic numbers, used as identifiers of instructive blocks (or labels), begin with digits or the symbol n. A SAS chapter may be divided into paragraphs, identified by symbolic names consisting of not more than 3 letters preceded by the * sign. Paragraphs are similar to ALGOL blocks in that symbols used within a paragraph may have quite different meanings in another one. Symbols preceded by the "x" sign are understood as global symbolic numbers, which do not lose their meaning on passing from one paragraph to another.

An instruction takes the form: (operational part)(. or empty)(address part). The (operational part) is one of the machine-acceptable symbols for operations. If the address part is preceded by a dot, it means that the address is that of a double-length register, there being two word-lengths in the ZAM-2 computer: 18 and 36 bits. The (address part) may be either absolute, i.e., an unsigned decimal integer corresponding to the register number; or relative, preceded by a "+" or "−" sign, and understood as the address of a word following (+) or preceding (−) the instruction in which it occurs; or symbolic, in which case it consists of a block identifier supplemented by an integer enclosed in parentheses to denote the relative position of the word in the block (if needed). Symbolic addresses that are paragraph identifiers, global symbolic numbers, or symbolic numbers must be followed by a right-hand parenthesis. Numerical values may be written in SAS programs in one of the following forms:
(i) Short integers, i.e., integers occupying half-word registers, are written in the form (sign)(integer), and must not exceed 2¹⁷ − 1.
(ii) Long integers, occupying 36-bit words (one bit being taken up by the sign of the integer), are written in the form (sign)(integer)D, e.g., 75D, 276543489D, and must not exceed 2³⁵ − 1.
(iii) Short numbers, written in the form (sign)(integer).(decimal fraction)K, occupy half-word registers. If the number is to be recorded in machine memory with the decimal point in another position than that which follows from decimal notation (binary scaling), a corresponding scaling factor is written after the decimal fraction.
(iv) Long numbers are written in the form (sign)(integer).(decimal fraction); this may be followed by a comma and a scaling factor, if binary rescaling is needed. The long number occupies a 36-bit register.¹⁴

[Footnote 14: The ZAM-2 computer is an essentially fixed-point computer; thus, in machine memory all numbers are recorded as integers. The position of the decimal point and the value of the scaling factor carry information that is used by the input (read) routine, and afterwards lost. This information determines how many zeros are to precede the first significant digit of the number recorded in the machine memory.]

In (iii) and (iv), in the case of too many digits, i.e., when there are more significant digits and "scaling zeros" than 17 or 35, the least significant bits are "rounded off." Binary scaling, mentioned above, is an essential feature both of SAS and of the ZAM-2 computer, since the latter has no floating-point arithmetic except interpretative routines.
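The following sketch illustrates the idea of binary scaling in a present-day notation; it is only an illustration of the principle, and the conversion convention below is an assumption of mine, not a reproduction of the actual SAS input routine or its rounding rules. A decimal value and a programmer-chosen scaling factor are turned into the integer that a fixed-point machine would actually store.

```python
# Illustration of binary scaling for a fixed-point machine word.
# The magnitude widths (17 or 35 bits) follow the text; the conversion
# convention is a simplified stand-in for the actual SAS input routine.

def scale_to_fixed_point(value: float, scale_bits: int, magnitude_bits: int = 17):
    """Store `value` as the integer round(value * 2**scale_bits); the scaling
    factor decides where the binary point is assumed to lie."""
    stored = round(value * (1 << scale_bits))
    limit = (1 << magnitude_bits) - 1
    if abs(stored) > limit:
        raise OverflowError("value does not fit in the chosen word with this scaling")
    return stored

def fixed_point_to_value(stored: int, scale_bits: int) -> float:
    return stored / (1 << scale_bits)

# 45.678 kept with 10 binary places in a short (18-bit, sign plus 17) register:
word = scale_to_fixed_point(45.678, scale_bits=10)      # 46774
assert abs(fixed_point_to_value(word, 10) - 45.678) < 2**-10
```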
Octal numbers preceded by the s sign are interpreted as logical scales (cf. Section 1.1) and thus may be either 6 or 12 digits long, depending on the word-length chosen. Absolute numbers are decimal numbers preceding instructions and separated from them by a solidus. These are interpreted as internal memory locations in which an "absolutely numbered" instruction is to be recorded. Instructions without absolute numbers are recorded in consecutive locations, following the last numbered instruction. Symbolic numbers are symbolic names given to blocks and separated from the first instruction of a block by a right-hand parenthesis. If the parenthesis is followed by another one, as, e.g., in 291)) OD.CH2, the instruction is located in a half-register with an even address; the odd half-register is filled with a "do nothing" instruction. It is possible to give several symbolic numbers to the same instruction, as in the following example: 3AR)3KOS)UA 512. The combination 342/3KOL)OD 1022 is understood as the first instruction of a block which has the symbolic label 3KOL and which is to be recorded in registers with absolute addresses starting from 342. The labeling system adopted in SAS is beyond any doubt very convenient, especially when full advantage is taken of paragraphs and chapters, but it imposes extremely heavy requirements on the memory space occupied by directories and dictionaries during the translation and, alas, the execution of programs.

The last thing to be mentioned about the system is the way in which the standard subroutines are called in. This is done by a directive FUN N, where N stands for the number of the desired subroutine. This directive causes the subroutine to be recorded in the body of the chapter in which the directive has been used, starting from the location that would be occupied by an instruction having the same lexicographic position as the directive FUN N. There are some other directives in SAS, i.e., instructions leading to red-tape operations during the translation process. Their meaning is usually self-evident, e.g., START n/N, CHAPTER N, and so on. We have not given a more detailed description of SAS for two reasons: (i) Section 2.3, describing the KLIPA system, covers the same topic in many respects; (ii) a better example of the work of the SAKO group is the SAKO system itself.

2.2.2 The Automatic Coding System SAKO
SAKO was constructed at IMM almost simultaneously with SAS and is used for the programming of the XYZ and ZAM-2 computers, providing for full automation of the coding. It is outside the scope of the present article to give a detailed description of the system, for the manual produced by IMM [45] contains some hundred pages, and the reader is perhaps more interested in general features than in details that may be found in technical papers. Thus, hoping that in case of misunderstandings the original literature (see the bibliographical note at the end of the current section) will be consulted, we shall begin with short examples showing the general structure of the language SAKO.

(1) Arithmetic formulas:
X3 = ALFA + LOG(SIN(X))
I = I + 2
S = S + A(I,J) × B(J,K)
D = 45,678
Numbers in parentheses denote running indices of array elements. If, however, the left-hand parenthesis is preceded by a functional identifier (like SIN or LOG), the number in parentheses is interpreted as an argument of the function.

(2) Boolean formulas:

A ≡ B + C
SDF ≡ 0000.4567.3241 × A
(3) Control transfers:

JUMP TO 5F
JUMP ACCORDING TO ALFA TO: 3AB, 4, 2, 5C
This instruction means that control is transferred to 3AB, 4, 2, or 5C according to the current value of ALFA. Further examples of control statements are:

REPEAT FROM 3: ALFA = 0 (1) 29
GO TO CHAPTER: GAUSS-SEIDEL
RETURN IF OVERFLOW: 4 ELSE 3
These examples¹⁵ show the flexibility and potentialities of SAKO. Now we shall consider in some detail the most interesting features of programming in this language.

Handling multidimensional arrays. Multidimensional arrays handled by SAKO are called blocks. Each block has to be declared at the beginning of the chapter in which it appears. Block declaration is performed by the statement: BLOCK (n, m, . . . , k, r, t): (list). The list consists of the names of all the blocks of the chapter having the same structure, defined by the contents of the parentheses. The structure of the block is defined as follows: let there be s numbers inside the parentheses following the word BLOCK; then the block declared is an s-dimensional array. Thus, referring to an element of any of the blocks entering the list, we must specify s subscripts (running indices). The numerical values of the s numbers appearing inside the parentheses in the block declaration represent the upper bounds of the subscripts. Hence, the declaration BLOCK (3, 3): MATRIX A, MATRIX B says that in the chapter where the declaration has occurred there are to be reserved registers for two square matrices of 9 elements each. Elements of the matrices are called for in arithmetic statements by expressions like the following: MATRIX A (j, 2), MATRIX B (1, 2). Letters acting as values of subscripts must be declared as integers. The declaration BLOCK reserves space for arrays that must be filled either by reading in or by generating in the course of executing the program. If the values of the elements of the array are to be given in the program itself, another declaration is used, viz., TABLE (n, . . . , k): (name). This is written

[Footnote 15: Polish words appearing in SAKO statements are replaced throughout the present section by corresponding English ones.]
immediately before the actual values of the elements are explicitly given, e.g.:

TABLE (2, 3): A
3.7195    45.16    −.876
9.8867    .1867    3456.9
If the structure of the array is to be changed in the course of the calculations, a declaration STRUCTURE (i, j, . . . , k, a₁, a₂, . . . , aₙ): (list) must be used. In this declaration, i, j, . . . , k are identifiers of variables (declared previously as integers) and a₁, a₂, . . . , aₙ are integers; the list has the same meaning as in a BLOCK declaration. The new structure is defined by the items in parentheses in a manner exactly similar to the old structure in the corresponding BLOCK declaration. The total number of elements in the restructured array must not exceed the total number of elements reserved for the array. A special declaration TABLE OCTAL (n, m, . . . , k): (name) serves to introduce Boolean arrays.

Boolean expressions. A distinguishing feature of the SAKO system is its ability to handle Boolean words and perform Boolean operations on them. A Boolean word is a sequence of 18 or 36 binary digits: zeros and ones. In the first case we speak of short Boolean words, in the second of long ones. The words are represented in the octal system, each octal digit representing a group of three consecutive binary digits. For clarity, dots separating groups of octal digits may be used, e.g., a short Boolean word: 707.512, and a long Boolean word: 76.754.324.765. Boolean words are identified either by individual identifiers or by subscripted identifiers, when the corresponding word is an element of an octal table. The following operations on Boolean words are available in SAKO:

(1) Renaming of a Boolean word: A ≡ B, where A is any identifier, and B is either an identifier or a correct Boolean expression.
(2) Boolean addition: A ≡ B + C, where B and C are Boolean words; this operation is bitwise addition.
(3) Boolean product: A ≡ B × C, where B and C are Boolean words; this operation is equivalent to bitwise multiplication.
(4) Boolean negation: A ≡ −B, where B is a Boolean word; this operation is defined as the bitwise negation of the word B.
(5) Cyclic shift: A ≡ B ∗ (N), where B is a Boolean word and N is an integer or integer identifier; A is the word resulting from N cyclic shifts by one position to the right, if N > 0, or to the left, if N < 0, of the word B.
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
A correct Boolean expression is formed by application of any number of operations (2) through ( 5 ) . Of two operations in a Boolean expression, that one which is enclosed in a greater number of parentheses is executed first. I n case of an equal number of parentheses, the operations are executed with the following priority: (1) shifts, (2) products, (3) sums and negations, Operations enclosed in the same number of parentheses and of the same hierarchy are executed in sequence from left to right. To illustrate these rules we take the following example [a#]. We require the formation of the Boolean product of positions 0 through 9 of the word n and positions 10 through 19 of the word C , add to this the negation of positions 20 through 29 of the word D ; the result is to be stored in positions 26-35 of the word A , the remaining positions of the word A being filled with zeros. This may be achieved by executing formula: A (B * (20) x c * (10) ( - D ) ) * (6) x 000.000.001.777. Additionally, it should be noted that any machine word, hence any number, may be treated as a Boolean word and vice versa. This allows for many simplifications customary to programmers working in machine code and inaccessible in a great majority of autocodes. In SAKO, Boolean words are not used for control transfer operations, as they are in ALGOL. Input-output operations in the system are conceived to facilitate the programmer’s work to the utmost. There are special instructions for reading single numbers, blocks of numbers, captions, octally represented Boolean words and arrays, and so on. The same applies for printing routines, which allow for preparing tables, octal output, and captioning of numerical material. A program written in SAKO is divided into chapters which facilitate the execution of long programs, exceeding the capacity of the operational memory. Subroutines and library routines are treated in SAKO in the usual way; many concepts of ALGOLprocedures are incorporated, including the formal and actual parameters correspondence rules. There is, however, one important programming device used in the system that bears no resemblance whatsoever to ALGOL,viz., the SUBSTITUTE instruction. Subroutines are called in by instructions of the form:
+
(actual identifiers of results) = (subroutine identifier) (identifiers or numbers serving as arguments). 77
WlADYStAW TURSKI
An instruction of the form: SUBSTITUTE : {subroutine identifier) (partial list of numerical values or identifiers of actual parameters serving as arguments) causes some of the formal parameters to be replaced by the values of some of the actual parameters. Those positions on the list which are occupied by the formal parameters that are not to be replaced by actual parameters on execution of this instruction are left empty; technically this is achieved by dots being placed instead of identifiers. Now, when the subroutine call is to be made, the relevant list should have empty (dotted) positions in place of the parameters already inserted by the SUBSTITUTE instructions. The practical value of this trick becomes clear when one takes into account the fact that a subroutine may need to be called by another subroutine, while actual parameters are generated by the main program, and formal parameters (to be replaced on calling by actual ones) are essentially local to the subroutine body. An example will perhaps add clarity to the above. Consider a subroutine declared as follows:
SUBROUTINE: (u, v) = TRANS c = cos (ALFA)
(x, Y , ALFA)
s = SIN (ALFA) u = x x c + Y x s v = - x x s + Y x c RETURN and a program in which irrelevant instructions are replaced by bars:
SUBSTITUTE: TRANS (., H,
--
.)
I I_____
(B, C) = TRANS
I-
RETURN
78
I
( 3. 456, . , D) I
SUBROUTINE: ( u ,
-1
v)
$ I
-- -- - -
i.1
= TRANS
!
J
(x, Y , ALFA)
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
If the (B, c ) = TRANS (3.456, . , D) is used inside of a subroutine, the value of H, computed outside of this subroutine, is inaccessible, thus the only way it as an actual parameter is to use the SUBSTITUTE instruction. Before closing the section devoted to the work of the SAKO group we will say a little about the translation of programs written in the language of SAKO. The following information is given by Swianiewicz and Sawicki [62]. The translation of both SAKO and SAS programs takes place essentially in two stages. The first stage consists in translating the program into a special simplified version of SAS, the socalled SAS-W language. The second stage is the translation of the intermediary code thus produced into a machine code. During translation the SAKO program is read into the machine “one sentence after another.” After the sentence has been read in, it is identified and the pertinent translator subroutine is activated in order to translate the sentence into SAS-W (leaving the address part in symbolic form). If the SAKO sentence is labeled, the label is changed into a symbolic number (cf. Section 2.2.1) and recorded in a dictionary, Numbers appearing in arithmetic formulas are translated into binaries. At this stage of translation standard functions, subroutines and library routines are selected (or read in) and a list of them is attached to the SAS-W program chapter. Thereafter, a complete SAS-W chapter is sent to the drum memory. Then, the entire chapter is considered for a second time, symbolic addresses are erased and replaced by real ones, subroutines and functions subjoined, and the resulting program chapter transferred to the drum. In the translation of arithmetic formulas great care has been taken to optimize the resulting code, and it is stated [Z] that though the optimization method used has some shortcomings resulting from the need to simplify the translator, a very significant shortening of the resulting program has been achieved, This high degree of optimization achieved by the SAKO translator is, unfortunately, paid for by low speed of translation. In another respect, viz., the length of the translator itself, the approximately 5,000 half-words occupied by the translator (i.e., just about 1/6 of all the external storage available) are not a real limitation imposed on programs to be run on the computer, since i t very seldom happens that a long program, or programs using voluminous data, are written in SAKO and executed in translated version a t once. Most commonly such ((long” programs (if written in SAKO) are first translated and printed by the computer in machine code or SAS-W, modified or adjusted “manually)’ (if necessary), and only then fed into the computer 79
WCADYSCAW TURSKI
a second time for production runs; and now the full SAKO-translator is not necessary. Bibliographical notes. A short but thorough description of the SAKO system is to be found in [44] and [as],both papers written in English. At the Warsaw Conference on Automatic Programming, held in 1961, several other papers [ 2 , 62, 631 concerning SAKO were presented, and a limited number of copies of the English version of these is still available on request from IMM. I n addition, the IMM has published two Polish reports [I71 and [61] about the implementation of external programming languages on the XYZ and ZAM-2 computers. Arithmetic formula translation is described in [ 2 ] ,and the use of subroutines is explained (briefly) in [49]and in full length in [63].Assembly routines for SAKO translation are described in [62]. Finally, for the fullest description of the SAKO system, [45] should be consulted. 2.3 KLIPA 2.3. I The Pragmatic Approach
The Computation Center of the Polish Academy of Sciences originated from the acute demand for large volume computation, posed by the Institute of Nuclear Research of PAS and other divisions of the Academy. The CC was set up on the basis of a URAL-2computer, which a t the time of its installation was the biggest and the fastest one in Poland. Like most of the Soviet-made computers, the URAL-2does not possess any facilities for alphabetic input, the only standard input being the octally punched film tape, operated on the closed-loop principle reminiscent of the external memory used for some of the early I B M computers. 16 When reading this and the following section, the reader should bear in mind two facts: The programming staff of the CC was trained for coding in the internal (octal) language of the URAL-2 computer; and from the first days of its existence, the CC was overloaded with a continuous flow of orders for computations that had to be carried out. These facts are familiar to many Computing Laboratories, both East and West, and thus the programming policy followed by the CC is, perhaps, a typical one in such a situation. It starts from two principal premises: that programming in internal language is slow and a p t to cause many hard-to-discover errors, and that any automatic programming system requires many long preparatory steps and that the 1% order to permit input of alpha-numeric characters, a paper tape reader was attached to the computer at CC.
80
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
translation from programs written in an autocode is likely to upset tightly tailored production schedules. Therefore, a pragmatic decision was made: to construct a coding system which would reduce the most mechanical portions of the programmer’s work and would not increase substantially the input time for programs. The coding system worked out at the CC, the KLIPA,is the practical outcome of this decision [22, 231. As far as the single instruction is concerned, KLIPAdoes not change the appearance of the operational part of an instruction. It is preserved in machine language, i.e., consists of two octal digits. There seemed to be no important reason for changing this form of the operational part, especially since all the programmers knew these forms by heart. The address part, on the other hand assumes the fully symbolic form of subscripted variables. The only limitation imposed is that the subscript, enclosed in parentheses, may be a linear function of one variable only. Hence, r, kappa (5x + 73), and april (23) are correct addresses, while pi (sigma + rho) is not. A KLIPAprogram is divided into sections of arbitrary length. The interchange of sections, i.e. the replacement of a section just executed by another, currently located on the drum, is done automatically, with the substantial additional feature that in this section interchange, as well as in other transfers between different levels of storage, special “programmed checks” are provided t o eliminate possible errors of the type usually discovered by the parity checking (which is absent in the URAL-2computer). Another function of KLIPAis the assembling of a section from separate pieces; this is done during the input of a program. Standard functions may be either added t o the section body permanently, or called in, through a buffer area of core memory, any time the function is called for. This alternative way of using standard functions enables the programmer to f0bW successfully an optimization policy with respect t o economy of locations and speed of calculations. Calling for the standard functions is accomplished by a simple jump instruction (the operational part of this instruction is 22) as in the following examples : 42 x 42 x 22 sin 22 rq 56 y 22 sin 56 Y
which are equivalent to the ALGOLstatements: y := sin
(2);
y := sin (sqrt (2)).
81
WtADYStAW TURSKI
Special provision is made for the use of library subroutines, which may be either included with one of the sections of the program or stored on separate sections of tape. Labeling in KLIPAis done by preceding the instructions to be labeled by a symbolic label, e.g.: :kappa) 02 x.
+
Labels of the form: (label) (octal integer)) are acceptable and have obvious meaning. All identifiers, i.e. variable names and labels, should be declared; this means that numerical (octal) values of the identifiers should be given in the form: x : = 20, kappa : = 7032. This information is used by the translatbr to replace symbolic names by their numerical values during the translation process. Hence, with the declarations given above, the instruction 42 kappa (x 2) will be translated as 42 7054. There is a possibility of chain declarations, like:
+
x := 5 kappa := 4x 3 june := kappa (12x
+
+ 5).
Labels should be declared in a similar manner, with two exceptions: (i) Labels for indicating points to which control is to be transferred may take the form of primed letters: a’, b’, . . . ,z’, and in such cases no declaration is needed. (ii) There is a special label “pocz” (short for a Polish word meaning “beginning”) which is automatically declared as the smallest possible label in the sections of the given program; thus all labels may be made relative to this particular one by means of a sequence of chain declarations, and hence any absolute addressing may be avoided altogether. This system of labeling and declaration may look a little peculiar, and a few words are needed to explain why there is no fully automatic storage allocation or labeling in KLIPA. The main reason is that KLIPAwas devised to facilitate the work of skilled programmers who use in their everyday routine work methods of absolute addressing and know how to take full advantage of these methods. Thus, any automatic storage allocation would appear to them artificial, and perhaps wasteful. Now the KLIPAlabeling and declaring system gives t o a programmer all the possibilities which the absolute addressing could have given, and in addition it simplifies to a large extent the actual work to be done. By preserving the freedom of applying all the tricks programmers knew, KLIPAwon their confidence, while i t is well known that persons used to programming in machine language usually mistrust all forms 82
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
of automatic programming. On the other hand, the special label “pocz” makes it possible to introduce to a newcomer a completely relativeaddressed programming system, thus avoiding much of the laborious task of checking and changing absolute addresses. While single instructions in KLIPAare simple and do not differ very much from machine language instructions, a sophisticated set of redtape subroutines for KLIPAwas devised to facilitate the programmer’s work as much as possible. Moreover these unspectacular routines were largely written with future ALGOLimplementation in mind. It is hoped that most of the existing red-tape operations will be included into an ALGOL compiler without many changes. Translation of the KLIPAprograms is performed on input, and with a speed limited only by the paper tape reader capability t o read 400 characters per second. There is no necessity of reading in an entire line, followed by decoding, identifying, and storing of the recoded information. All these stages of translation are performed by the KLIPAtranslator as soon as a single character is read in and, as a rule, the translator processes (in the sense that a translator is supposed to) the information carried by a character before the next one can be sensed. On the average it is felt that only if tape readers were faster than 1000 characters/second, the translation would be limited by the programming. This relatively high speed of translation is due to two basic principles incorporated in the KLIPA:(i) use of the characteristic function method for identification of symbolic names, and (ii) the machine instruction 30 a, which permits quick realization of multiway switches and which has been widely used. Since the former reason is discussed in the following section, we shall now devote our attention t o the latter. The instruction 30 a,where a denotes any octal integer not exceeding 7777, is obeyed in the following manner: the sequence of two instructions 30 a
I, results in executing instruction I , which is composed of I , and of the content of the register whose address is a. The composition of I , is described by the formula I , = I , (a),where (a)denotes content of register a, and both (a)and I , are treated as octal numbers (including the two most significant digits forming the operational parts). Hence, e.g., if a = 7000 and (7000) = 00 0002, the sequence 30 7000 42 0030 is equivalent to 42 0032. 83
+
WtADYStAW TURSKI
A multiway switch in the translator may then be arranged aq follows. Let the register a always be loaded with the character read in most recently. Then the sequence 30 22
u
k,
where 22 is the unconditional jump instruction, results in a jump to either of 32 registers following the kth (32 is the number of different binary representations of characters available on the 5-channel paper tape). It is easy to see how this may be used for checking the formal correctness of the programs and for interpretation of various symbols appearing in the program. 2.3.2 The Characteristic Function M e t h o d
In many problems of non-numerical data processing, such as construction and use of compilers, translators and other types of autocodes, business data processing, mechanical translation, etc., there arises the problem of identification of alphabetic inscriptions coded by means of paper tape (or card) punching devices. Quite often the only identification needed consists in assigning to each inscription a particular integer, which may be thought of as the location address where the relevant numerical information is stored. For example, in autocode practice the information stored may be the address of the memory cell assigned to a given alphabetic variable, or, in some cases, just the current value of the variable itself. I n natural language translation, the numerical information stored in the location whose address is associated (by the identification procedure) with a given inscription (i.e., with a particular word of the text to be translated) would be, perhaps, composite; it would contain coded information about the grammatical structure and properties of the inscription (e.g., tense, mode, and aspect of verb), a “signal’) bit indicating whether the word is unambiguous (in the sense of not having homonyms) and the address of the location of the corresponding word of the language into which the translation is carried. These examples explain what is meant by “identification” in the following part of the section. It is perhaps worthwhile to mention that the identification procedure suggested is especially powerful when combined with a method analogous to the one described at the end of Section 2.3.1. For the sake of simplicity, we confine ourselves to inscriptions coded by means of standard teleprinter equipment operating on five channel tape. Thus any inscription is a sequence of rows of holes, the number of 84
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
holes in a row varying from zero to five. Each row of holes may be thought of as a binary representation of an integer in the range 0 through 31. On the other hand, these rows may be considered as digits in 32-basis-notation system. Hence, each complete inscription may be interpreted as a number recorded in this system. Converting these numbers into the familiar decimal (or binary) notation system we would obtain a natural identification rule: Each inscription is assoeiated with a store location whose address is given by the decimal (or binary) interpretation of the teleprinter-coded representation of the inscription. For example, the inscription “us,” coded by a teleprinter working in the Second International Teleprinter Code as 00 0.
0 . 0.
. .
(where punched holes are denoted by circles, and nonpunched holes 20 = are represented by dots) would be associated with the 28 x 32 916th store location, and the inscription “sin,” coded as
+
0. 0..
.o
..
0..
00.
+
+
would be associated with store location 20 x 3ZZ 12 x 32 6= 20,870. However, this natural interpretation rule would give rise t o excessive memory requirements, since for identification of inscriptions belonging to a vocabulary consisting of a t most j-letter words a storage of 32j locations would be needed. To avoid this, two simple methods may be used. The first reduces the number of letters in an inscription used for identification purposes, which decreases j . The second consists in a kind of contracting procedure applied to the results of the natural rule. I n natural language translation the first method is unacceptable since cutting off part of a word may lead to considerable misunderstanding, I n automatic programming both methods are useful, and in fact both were applied in KLIPA. Before considering in some detail a variation of the second method, called characteristic function method, let us imagine that we have applied the natural rule to a vocabulary of the following structure: (i) There are no homonyms in the vocabulary. (This is not a substantial restriction and may easily be overcome.) (ii) The words in the vocabulary have a t most j letters. (iii) Words are divided into groups in accordance with their length. 85
WtADYStAW TURSKI
Thus, there would be j groups of words, Gi, i = 1, 2 , . , . ,j of n, elements (inscriptions) each, the total being the number of inscriptions in the vocabulary. The application of the natural rule would result in plotting the inscriptions along a ray. Inscriptions of the Ci group would belong to the segment cri - p;,where a,'= 32i. Now, let us define for all integers from 0 to 32i a function
Cni
p ( f ) = 1 if there is an inscription associated
with this integer 0 otherwise.
=
5'
(2.3.2.1)
Introducing a continuous variable x, we define p
(x)= ~(5')
+ 4.
(2.3.2.2)
f o r i = 1 , 2 , . . . , j.
(2.3.2.3)
f -4 Q x < f
for
It is easy to see that pi= -
s* P
(2)
dx
o u c 1i - ~ -
oi-1
The pi may be considered as a measure of filling the memory with vocabulary items, and the difference 1 - pi measures the relative number of wasted registers. Our aim is t o find a transformation of the variable x, y = @ (x), which satisfies the following conditions: (i) If x is an integer, y should be in integer too (ii) If X l f 2 2 , y1 = @(XI) # @(G)= 92 j
(iii)
C (1 - iji(y)) = min, where
i-1
.;;ikqx,, d @ ( x ) (iv) The function
Q,
I0 3 0 A\
should be easily performable by computers.
It has been found experimentally that a linear transformation of the form y = aix + ci, with coefficients a,, ci suitably chosen for each
Gi,may be used to advantage. On considering the choice of the coefficients two important cases are to be distinguished. Let all the integers 6, a i - l < f < oi,such that p (6) = 1, be subscripted in ascending order fI, t2,. . . , En,. Let p denote the greatest common divisor of all positive integers S,, defined by
8, 86
= &+I
- 57,
r = l , 2 , . . . , ni - 1.
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
The cases to be distinguished are (i) p > 1, and (ii) p = 1, i.e., the case of mutually prime numbers 6,. I n the case (i) we define a, = l/p and, by the “$xed point division” transformation y = xlp, we reduce the case to (ii) and considerably increase pi as defined by (2.3.2.4.) I n the case of mutually prime numbers 6, we do not apply any multiplication, i.e. a, = 1, but we try to pick a value ciwhich shifts the segment ui- oileft so that occupied positions of the segment coincide with empty positions of preceding segments. It may happen that we are unable to find enough “free” spaces to reallocate all the occupied locations of the segment ui- ui;nevertheless, in practice it is nearly always possible to shift the segment left by a number of locations. Furthermore sometimes it is possible to find a quasi-common divisor, i.e., a number p* such that for all tiE ( u ~ -ui) ~ ,we have &/p* = p i E, where pi is an integer and E is the machine-representable unrounded remainder. Then, a transformation y = x/p* - E will be very suitable for reducing memory requirements. The transformations described here may easily and rapidly be performed by a computer on any inscription belonging to the vocabulary for which the coefficients have been computed, i.e., the method is applicable t o $xed dictionaries only, since addition of a new word (inscription) brings about a re-evaluation of all coefficients. Another restriction is imposed by the necessity of the inscription being read in completely before the identification can start. Another variation of the method, viz., that of characteristic functions, allows for the identification procedure to be carried out simultaneously with reading in of the inscriptions. This variation has been employed for KLIPAtranslations. Consider the first n letters of a given inscription. Letters n 1, n + 2, . . . are disregarded. If an inscription contains fewer than n letters, the following procedure is interrupted as soon as the last letter is sensed. As soon as the j t h letter of the inscription (treated as j t h digit of the “base 32” number corresponding to the inscription) is being read in, the computer evaluates:
+
+
fj* if
:= a&,
+ xj + cj;
fj* > M then fj :=A* - K else fj :=&*;
For a set of inscriptions used by the KLIPA language the following experimentally chosen values of constants have been found to satisfy the minimization demand imposed on (2.3.2.4): n = 4, M = 255, K = 179, a, = 2 for i = 2, 3, 4, c1 = 0 , c2 = 20, c 3 = - 58, c4 = - 25, fo f 0. 87
WtADYStAW TURSKI
Using this transformation it became possible to reduce the number of wasted registers to 20 per cent. For the remaining waste we are amply rewarded by the impressive speed of identification of the inscriptions that are used in the KLIPAprograms. BibEiographicaZ notes. A thorough formalized description of the KLIPAlanguage is given in [ Z Z ] . Some, perhaps the most interesting, features of the KLIPA translator are to be found in [23]. Paper [20] explains the symbolic addressing method for the ENAL-2 computer, and in a sense may be considered as the source of the methods used in the KLIPA language. General considerations on relations between programming and computer structure are contained in [23] and developed in [21]. Papers [65, 661 include, among others, an exposition of KLIPAgroup views on the value of automatic coding for business and scientific applications of computers. 3. Other Countries of Eastern Europe
I n the present section we shall give the reader the opportunity to get acquainted with a handful of results obtained in various countries of Eastern Europe. The variety of subjects to be mentioned in this section makes it impossible to give a consistent introduction t o the section; thus we present the following three subsections as separate expositions of fine pieces of work. 3.1 Kloutek-Vltek Symbolic Language for Economic Data Processing
It is a well-known fact that economic1’ data processing is greatly handicapped by lack of a suitable external language of programming. Attempts undertaken in Western Europe and USA have resulted in languages like COBOL or NEBULA,which (thanks to their inherent speech-like characteristics) are of considerable merit, especially for nonmathematically minded people. They do not, however, provide the rigorous notation needed for mathematical development in this field. Two Czechoslovak authors, J. KlouEek and J. VlEek, have proposed [33] a synibolic notation with which mathematicians working in the field of data processing could readily familiarize themselves, and which at the same time may be comparatively easy to learn for economists, accountants etc., though not as easily as, say, COBOL.The K-V notation is, to a certain extent, similar to that of formal logic and operator ”For brevity we use the words “Pconomir data” to reprrscnt data relating t o the national economy, business operation, and others having similar structure.
88
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
calculus. From the automatic programming point of view, the K-V system is closely related to the symbolic operators method of programming (cf. Section 1.1) and construction of the corresponding PP’s should be fairly easy; however, no hardware representation of the K-V system
is known to exist. The K-V notation system, or K-V symbolic language, KVL, is particularly applicable to all problems of selecting, combining, and rearranging of economic information items and sets of such items. I n other words, KVL is a language for description of data preparation, i.e., procedures that precede the actual calculations. First of all, let us define the various forms of economic information: Elementary economic information is an item of information that cannot be further subdivided without losing significance. Examples of elementary economic information are number of employees, number of hours worked, etc. The elementary economic information has no inherent meaning and may be related t o many other items of elementary information, e.g., number of hours of work a particular employee, or number of hours of work per month for a given factory. Two or more items of elementary economic information related t o each other and reflecting a concrete economic phenomenon are said to form a compound economic information item; e.g., the compound economic information consisting of the elementary economic information items: number of employees, hours of work, and total production output, corresponds to an economic phenomenon: number of hours spent by an average employee on one unit of production. Finally, an information assembly is a set of compound information items. The concepts of elementary information item, compound information item, and information assembly correspond t o what are sometimes called item, record, and file, respectively. We shall use small letters to denote elementary economic information items, bold face small letters to denote compound information items, and bold face capitals t o denote assemblies. We define the following obvious relations between two pieces of elementary economic information a and b: a > b, a < b, a = b. Symbols denoting compound economic information may be subscripted in order to show a particular arrangement (and number) of compound information items into an information assembly; e.g., qi for i = 1 , 2 , . . . , 1000 signifies that the economic phenomenon described by the compound information qi occurs 1000 times, and the information assembly thus defined comprises 1000 cases. Operations performable on economic information resemble operations 89
WtADYStAW TURSKI
with logical classes and may be denoted by the same symbols, with necessary modification of meaning of the symbols, which follows from the dynamic nature of the relations: they do not reflect established facts, but become true by virtue of the operations performed in accordance with the relation symbols used. The fundamental operations performable on economic information are : (i) Comparison, performable on two items of economic information of the same form, is denoted by 1, e.g., a 1b. (ii) Forming a higher form of economic information from lower ones is associated with the symbol E, e.g. q, E A means that the assembly A is formed from the compound information items qi. (iii) Extracting a lower form of economic information from a higher one is denoted by n, e.g., qi n A means: the compound information qI is extracted from the assembly A. The derived operations performable on economic information are either sequences of the operations defined above, or belong to one of the following: (iv) Marshalling of an assembly according to the elementary information contained in compound information that form the assembly is denoted by j(A), where j is a particular elementary information. The compound information is marshalled within the assembly in accordance t o specified, usually numerical, rules. (v) Ordering is a special kind of marshalling and is used to arrange the compound information forming an assembly in ascending or descending order with regard t o the numerical values of certain elementary information contained in the compound one. This is written symbolically in KVL as p+j(A) or t . - j (A), depending on the ascending or descending order desired. The plus sign as subscript may be omitted. (vi) Joining two or more items of information of the same form into a new item of information of the same form as the original ones; the associated symbol i s u . This operation is not performable on elementary information. (vii) Inclusion of compound information into an assembly is denoted by C, e.g., qi C A. There is an opposite operation, called (viii) Exclusion: A 2 9%. In addition to the above, the KVL uses the following symbols:
[A] Denotes an ordered set of elementary economic information items. A Denotes the complement of the assembly A. 90
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
Is used as “short-hand sign” for the words: “generates,” “results, ’’ etc, { } Curled brackets are used in order t o explain the preceding operations in more detail. The symbols =, , - , ., : are used in their general sense. =>
+
Now let us consider a simple example. We shall write a KVL program for processing the assembly A formed by n items of compound information qi (i = 1, 2, . . . , n). We would like to obtain as output the following data-: (1) the compound information that contains the smallest elementary information b, (2) the greatest elementary information c from each of the subassemblies created by marshalling A according to a, and (3) the sum of all items of elementary information c contained in these subassemblies. The sought-after quantities may be denoted in KVL as: q { b = min}, max c { a }, c { a }; qi = (a, b, c ) . The problem is solved in the following manner: (1) qi € A ; (2) a (A) => [A]; (3) [A] 3 Mjj= min a, . . . , max a ;
Marshalling according t o a. The ordered assembly A is divided into subassemblies
Mj .
(4) P+
b(Mj) a [Mjl ;
{qkE
qk.j
LMjl}
Subassemblies are ordered with respect to b. The first (smallest) item of information b is extracted from all the subassemblies. From the extracted items the smallest b is determined.
( 7 ) q,, n [A17 ( 4 . j
=
min);
The compound information q with the smallest b is extracted from [A]. Item c is extracted from ali the compound information q which belong t o all M. This information is ordered into independent subassemblies (in which “compound information’’ consists of single elementary items!). 91
WCADYSCAW TURSKI
( 1 1 ) c 1 n [Cj]9 max cj;
(12)
e c k , j
k-1
The first (greatest) element is extracted from each of the ordered subassemblies C, and the greatest of the items thus extracted is found. The sum of the extracted items c in each subassembly is evaluated.
=fj;
KVL, as may be seen from the above, does not conform to popular demands for a business data processing language generally understandable on the executive or managerial level. Moreover, even from a purely mathematical point of view some improvements, giving more flexibility, are urgently needed. Nevertheless, KVL should be regarded as a first successful step towards a formalized external programming language for economic data processing. It may be hoped that discussion and cooperative effort of interested persons will help to build a better language, which will remove many of the difficulties commonly encountered in this field. 3.2 Sedlbk’s Program for Investigation of Solutions of Differential Equations
There are numerous methods for the numerical treatment of ordinary differential equations, and perhaps no universal rule for an a priori decision as to which of them is to be used for a particular set of equations. On the other hand the decision-making process in this case is rat,her simple when only it0 logic is considered, and becomes lengthy and tedious only when all calculations necessarily involved are taken into account. Thus, naturally enough, it is desirable to program this procedure for a digital computer in order to enable the mathematician operating the computer t o arrive a t correct decisions quickly and safely. Such a program, supplied with some auxiliary subroutines not directly connected with the problem of finding the best numerical method, has been developed by J. Sedlik [52] and used by the staff of the Institute for Mathematical Machines in Prague on the SAP0 digital computer. It will become clear from the remainder of this section that Sedl&k’s program is practically machine-independent and thus may be used on other types of computers. The program is used in order to investigate the solution of the set of differential equations yr’ (2)==fr[2,y1 (4,Yz (XI,* with initial conditions 92
. - , y8 (41,
= 1, 2,
*
*
,
(3.2.1)
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
yr
( ~ 0 = )
Yro,
r
= 1, 2,
-
* * 9
(3.2.2)
8.
The solution of (3.2.1) is desired in the form of a matrix yri, r = 1, 2, . . . ,s, i = 1, 2, . . . , n ; in which the ith row corresponds t o values of the unknown functions yTassumed for the value xi of the independent variable x; xi E I,, where I , is known as the range of integration. We shall divide the methods of numerical integration into two classes: (i) Direct methods (step-by-step methods), i.e., those methods in which, in order to obtain yTi (r = 1 , 2 , . . . , s), information about only yr.i- is used. The best known method of this class is that of RungeKuttal*:
k81/2),
(3.2.3)
are used in order to obtain values of yri. There are two subclasses of the indirect methods. Let us consider, e.g., a method due t o Stormer and Adams :
Formula (3.2.5) defines the S-A extrapolation method, and (3.2.6) the S-A interpolation method. SedlBk’s original program made provisions for methods (3.2.3), (3.2.5), (3.2.6) and some additional ones. Explicit knowledge of methods that may be applied t o a given set of equations (3.2.1) is not necessary, since i t is supposed that the computer is supplied with all the subroutines necessary t o perform calculations according to the pertinent scheme. lBFormula(3.2.3) gives the method in its classical version; in contemporary computation practice Gill’s modification of the process is generally preferred.
93
WLADYSLAW TURSKI
The programmer has t o prepare input data consisting of one code word 01 representing in a symbolic manner the desired order of integration methods, and of several code words g i specifying in symbolic manner the desired output quantities. The code word 01 assumes the form
1- I I k l S!Vl’
(3.2.7)
consistent with the computer word structure in SAPO. All letters in (3.2.7) are to be replaced by binary numbers with the following meaning : Number of steps to be integrated by the indirect method Symbolic number denoting the type of the indirect method Number of steps t o be integrated by the direct method k Number of equations in (3.2.1) s Special variable: v = 1 when the integration step length h is v predetermined by the programmer, v = 0 if h is to be chosen by the computer itself. The code words gj have the form m z
In order to explain the meaning of (3.2.8)we consider the integration procedure as divided into j stages, each stage differing from the others with respect t o the desired output data; e.g., for the first five steps we would like to have all yviprinted, for the next ten steps only yBy,i,and so on. Now, let tj denote the number of the last step of integration belonging to the j t h stage. A finite sequence of integers ti - tj - consisting of as many numbers as there are different stages may serve a dual purpose: first, it determines the length of each stage; second, when the words gj are located consecutively in machine memory, these integers determine which stage is being executed a t any given time. The binary integer d, in (3.2.8) is a symbol indicating which particular quantities are to be printed out during the j t h stage. I n Sedlbk’s original program, dj = 2 calls for output of yTi ; dj = 4 calls for nyri, dj = 6 causes printing of yri and y:i, and so forth. The integers p j in (3.2.8) denote the over-all number of quantities t o be printed during the j t h stage. In addition to the code words g, and 01, the programmer has to prepare a subroutine for evaluating the right-hand expressions of (3.2.1) and specify either a fixed h or a tentative initial h and the desired accuracy E . The initial values (3.2.2) form the last part of the input data. 94
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
All the input is to be recorded in fixed store locations, except, of course, the subroutine for the evaluation of the right-hand expressions, which has fixed beginning only. Sedlik’s program includes a special subroutine for changing and choosing the step length h in such a manner that it will satisfy the desired accuracy requirement given by E . Since the subroutine is based on a “double-the-h-repeat-calculations-and-compare-results”process, i t is essentially independent of the integration method used and thus may be called in from any arbitrary point of the program, as produced by the compiler. For a student of automatic programming the interesting part of Sedlik’s program is an assembly routine, which on the basis of the sequence of gj and 01 words prepares the program. For brevity’s sake, we cannot give many details of this routine. The interested reader is advised to consult the original paper [52], where complete flowcharts are included. We shall only point out some remarkable features of the Sedlik program which make it very convenient for the purpose stated at the beginning of the present section. First of all, the program is self-restoring. This means that, once the equations (3.2.1) are specified by input of right-hand-subroutines, any number of different 01 and gj may be executed without additional changes. In other words, the data tape may consist of many blocks, say, OIl{gj}, 01, {gi}, . . . ,thus permitting comparison of accuracy obtained and time consumed by various methods and combinations of methods. Another interesting and important feature of the program is that it is ‘(equation independent.” This means that, once the program is fed into the computer, it may be used to investigate many different sets of (3.2.1). This is achieved by changing not only 01 and {gj}, but also the subroutines for the right-hand expressions. Finally, let us observe that Sedlik’s program is equally handy for investigation and for actual calculations, hence it is possible to proceed to integration over long range immediately after the method is experimentally chosen. Concluding this section we point out that Sedlik’s program may be compared with the work of Stognii [58], mentioned briefly in Section 1.5, and seems to originate from the same theoretical trend, viz. ideas supported by Glushkov’s (Kiev) school of automatic programming. 3.3 Kalmir’s Detachment Procedure
In this section we discuss an exceptionally interesting procedure due to the Nestor of Hungarian computer people, Lisz16 Kalmir of Szeged. 95
WtADYSlAW TURSKI
The procedure was devised in order to simplify translation of arithmetic formulas into M-3 computer code. M-3 is a two-address computer, with a peculiar operation list which for a given arithmetical operation 0 presents four different machine instructions : 0
0 a b
1 2
0 a b 0 a b 0 a b
3
bOa+r,b; bOa-tr; r@a+r,b; r@a+r;
(3.3.1)
I n (3.3.1) a and b are addresses of operands, r denotes a special register of the arithmetical unit, and + means loading the registers indicated on the right-hand side with the result of the operation stated on the left. Thus, e.g., b 0 a + r , b means to load both r and b with the result of the operation b 0 a. Not all arithmetic operations are available in all four varieties (3.3.1), but by using the unconditional jump instruction 2 J a b with meaning: “r -t r, b; jump to a”, we may form pairs like 1 2
0
J
a *+1
b b,
(3.3.2)
where * denotes the addressof the instruction in whichit occurs. Since the operation 1 0 is available for all 0, the pairs (3.3.2) may be used as a substitute for 0 0 if this should be lacking for a given 0 ; and similarly for the other forms of single instructions appearing in (3.3.1). Hence, for the sake of brevity, we shall consider all four forms (3.3.1) as available for all arithmetical operations @. The feature of the M-3 two-address computer described by (3.3.1) may be advantageously used for automatic programming purposes, namely, for optimizing subroutines of the translator: the result of any operation need not be stored, if it can be used as first operand for the next instruction. (It is worth noticing that here we have an implicit answer to the question whether one-, two-, or three-address computers are best suited for automatic programming). The foregoing implies that the chain operator (.
. . ((ao0 , a,) 0, a,) . . .) Ovav
--f
b
(3.3.3)
rather than an operator of the form a 0 b + r should be considered as the simplest form of nontrivial arithmetical operator (assignment statement). Indeed, the chain operator (3.3.3) may be readily translated into M-3 machine language: 96
AUTOMATIC PROGRAMMING I N EASTERN EUROPE
10, a , 3 0, a, 3 0, a3
a,
...
3 @ “ - I a”-1 2 0, a, b,
where the omitted addresses are immaterial. Thus, the main objective for research on automatic programming for M-3 and allied computers is t o find an algorithm for decomposition of any arithmetic operator into a finite number of chain operators. The problem has obviously a trivial solution, obtainable with the help of algorithms constructed for a three-address computer ; but then, by virtue of all chain operators being reduced to simplest three-address operators, a 0 b + c, there will be many unnecessary transfers to and from machine memory. Hence, more correctly stated, the objective is to find a decomposition algorithm which minimizes the number of chain operators needed to represent the given arithmetical operator. Besides, the algorithm should be “sufficiently simple,” for too complicated an algorithm may easily consume a large part of the time gained by the optimization procedures. Let us, following Kalm&r’s paper, consider arithmetic programs, i.e., finite sequences of arithmetical operators, each assuming the form e + v, where e denotes an arithmetic expression and u denotes a variable. Arithmetic operators forming a program are separated by semicolons, and each of the arithmetical expressions is enclosed by at least one pair of parenthesis, except when it consists of a single variable, e.g.,
(a + b ) +z;
( ~ ( -a c ) ) -+a;
a -+z
(3.3.4)
is an arithmetic program. Expressions of the form (81
0 2121,
(3.3.5)
where u1 and v 2 are variables, and 0 is an operation symbol, are called Jirst-order chain expressions. Chain expressions of higher order are defined recursively, viz., (c 0 v) (3.3.6)
is a chain expression of ith order if c is a chain expression of (i - 1)st order. Chain expressions of any order will be called chains. An arithmetical operator of the form c + u,where c is a chain, is called a chain operator, and an arithmetic program consisting of chain operators and operators of the form v1 -+ v 2 only is called a chain program. 97
WlADYStAW TURSKI
Consider now the set V of all the variables pertinent to a given arithmetic program, and the set V‘ of the essential variables of the same program. We do not define essential variables in any specific manner; as a matter of fact in the following we only suppose that
V‘ 5 v.
(3.3.7)
Two arithmetic programs P, and P , are said to be equivalent relative to the essential variables V‘, which is denoted P,-P, (V’), if the numerical values attached to the essential variables by both programs, P I and P,, are the same. Two facts deserve to be mentioned at this stage: (i) Kalmbr’s definition of the equivalence is much more precise than the intuitive explanation given here. (ii) The reader is advised to bear in mind the fact that the set of all variables pertinent to a program may include some variables not given explicitly in the algorithm which is realized by the program-e.g., working cells should be considered as such variables. The same holds true for V’. At the beginning of the present section we have arrived a t a conclusion concerning the objective of research on automatic programming for “M-3-like” computers. This conclusion may now be formulated as follows: To find a decomposition algorithm which, for a given arithmetic program P I and set of essential variables V’, produces a chain program P , which is equivalent to P I relative t o V’. Kalmbr has proved a very important theorem, which describes formal conditions permitting application of procedures that lead to partially decomposed arithmetic programs and meet the demands of equivalence. Unfortunately, both the rigorous statement of the theorem and its proof are too long to be reproduced here, thus we restrict ourselves to an informal formulation of this theorem. Consider a program P and set of essential variables V’. Let our objective be to “detach” an expression d occurring in P. This means that we are introducing into the program the operator d -+w , where w is a variable, and then replace (all, some, or none) occurrences of d by w . Variable w is called the working variable of the detachment. We divide our program into three parts P I , P,, and P,, called head, trunk, and tail, respectively. This division depends on the choice of both d and w. Namely, the three following conditions must be obeyed: (i) The expression to be detached must not change within the trunk. (ii) The working variable of the detachment must not occur in the 98
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
trunk, except for its occurences within the expressions to be detached. (iii) The first (if any) occurence of the working variable in the tail has to be placed on the right-hand side of the symbol + ,and the working variable is allowed to be an essential variable if it occurs in the tail only. Then the theorem says that if we detach d from the trunk, add the operator d -+w as the last operator in the head, do not change the tail, and denote the new (decomposed) trunk by P',,we have the relation
-
P,P,P, P,P,'P, (V').
(3.3.8)
The theorem is, of course, entirely machine-independent and thus valid for any kind of addressing system used. The detachment procedure proposed by Kalmir in the remaining part of his paper [25] is M-3 machine oriented, and hence we shall not make any detailed analysis of it. A few general remarks concerning the procedure may, however, clarify the underlying ideas. The expression d is chosen as a chain, and the primary goal of the procedure is to reduce the number of primitive symbols occurring in it, i.e. t o shorten the program by detachment as much as possible, and a t the same time produce as good conditions for the next detachment as possible; for the detachment procedure is, in a sense, recursive. A general rule for the best guess concerning the expression to be detached is t o pick the first maximal chain (i.e., one which does not occur as a subexpression in the program), starting from the right. This simple rule is subject to many additions, since its straightforward application leads to clumsy compositions, as illustrated by the following example. Consider the program (3.3.9)
The right most maximal chain is (b able we get
((b
+ c ) d ) -+
t;
+
(a t )
--f
u;
+ c)d. Choosing t as a working vari((b - c)/(b
+ c))
(a- t ) + w ;
-+
v;
(3.3.10)
+
Here, ( b c ) can still be detached, giving, with the help of a new working variable q, the following chain program: ((b
+ c)d) + t ;
+
(a t ) + u ;
(a - t ) -+w ;
(b
+ c) +q;
( ( b - c)/q)-tv; (3.3.11) 99
WLADYSLAW TURSKI
whereas, detaching from (3.3.9) first ( b
+ c), we would get
from which (qd) should be detached, yielding (b
+ c)
+
q;
(a (qd) t ; ( a - t ) --f w; -+
+ t)
-+
u;
((b - c)/q)
-+
v; (3.3.13)
Obviously (3.3.13) is a shorter, and thus “better,” chain program than (3.3.11). Note that no detachment procedure can give (3.3.13) from (3.3.10) since the first two terms of (3.3.10) form a chain, and thus are not subject to the detachment procedure. From a purely pragmatic point of view KalmBr’s algorithm has some minor disadvantages, pointed out by Kalmir himself. One of them is that the detachment algorithm tends to decompose even programs which are already machine acceptable. Another will be seen from the next example. I n the case of the program ((a b)c) -+ d ; ($) + g; (f -((a b ) c ) ) -+ h; the algorithm will produce ( ( a b)c) --f w ; w -+ d ; (ef) --f g; (f - w)-+ h ; introducing the unnecessary working variable w, and one unnecessary memory transfer, since the form ((a b)c) -+ d ; (ef) -+ g; (f - d ) -+ h would do equally well. In spite of these and other minor drawbacks, the algorithm is a very convenient one and may well be adopted (with some modifications) for practical work.
+
+
+
+
4. Survey of Methods of Programming
I n the preceding sections a more or less detailed analysis of the most important developments in the field of automatic programming has been presented. Now, a few words remain to be said about practical methods of programming employed in everyday work of computing establishments. As far as is known from both published material and personal contacts with Soviet scientists, the prevailing programming system in the USSR is based on the two-step method, described in Section l.l.l9This method and its many variations imply a sharp division of routine work on programming between programmers who formulate problems in terms of one of the existing programming programs, and specify the ‘*For a discussion of programming methods for business applications, see Yu.1. Chernyak, Electronic Simulation of Planning Systems in USSR, Data Processing (London) 6, No.1, 18-24 (1964).
100
AUTOMATIC PROGRAMMING IN EASTERN EUROPE
functional meaning of separate blocks, and coders who carefully translate the symbols in which the program is written into those acceptable to the computer. Thus, with some oversimplification, we might have called the dominating Soviet programming system “the binary-coded automatic programming.” I n contrast to this are two important schools that of Shura-Bura and Ershov and that of Glushkov, both leading to a much higher degree of automation of programming; these should be considered as determining factors for future development. There is no doubt that when alphanumerical input devices become more widely available for Soviet-made computers, the tremendous theoretical work performed in that country will ripen into many interesting automatized programming systems. I n Poland, there is a positive tendency to introduce automatic programming for all computers that are operated in computational centers which perform calculations for various customers. I n specialized computation centers, semiautomatic or even internal language programming is recognized as more efficient. Two languages are recommended for general use: a limited version of ALGOLand SAKO. Much research is devoted to the problem of enlarging SAKO so as to provide for business data processing; this is to be done by including many features and concepts of COBOL(the language so created is provisionally called SAKOBOL). I n other countries of Eastern Europe no distinct trend can be observed. I n the German Democratic Republic [26] and in Czechoslovakia [39] programming in machine code, with some symbolic addressing, is a t present the most commonly used programming technique. On the other hand, in Czechoslovakia a considerable amount of research is devoted to the preparation of an automatic programming system for the first Czechoslovak large-scale computer EPOS.Some traces of this work can be seen in Kindler’s paper [29] (see Appendix 2). Simultaneously with that, ALGOLis becoming a widely accepted publication language: Many algorithms are published in ALGOLand subsequently translated into machine codes for testing purposes (e.g., [30]). Thought-provoking remarks on automatic programming languages which are to be found in an extremely interesting paper by Culik [&I on languages generated by algorithms, and stimulating work by Svoboda on applications of Korobov’s sequencing theory to addressing systems [60], show that, parallel with practical applications, some theoretical research on this subject is well advanced in Czechoslovakia. I n Hungary, most of the programming is done in machine language, though Kalmir’s approach described in Section 3.3 above indicates that some successful attempts to automatize the programming process are being made there too. 101
WCADYStAW TURSKI
I n general, one may conclude that, in Eastern Europe, automatic programming is considered as a vitally important part of computer science, and a great deal of both labour and funds is devoted t o theoretical and practical research in this field. Appendix 1. Example of a Lyapunovian Program [ 4 / ]
Let us consider a finite difference equation approximating Dirichlet’s problem over a rectangular domain. Let the dimensions of the rectangle be nh x mh, where n and m are positive integers, and h is an arbitrary positive number. The rectangle may be covered by a square net, each elementary square being of dimensions h x h. Let the subscript j correspond to rows of the net, and the subscript i to columns, i.e., i = 0 , 1,2, . . . , n ;j = 0 , 1 , 2 , . . . , m. We shall denote by (i,j)a node of the net. We are seeking the function fij, defined on the set of nodes (i,j), having prescribed values for all nodes ( O , j ) , ( n , j )(i, O), (i,m),and satisfying at -internal nodes the following equation :
f_ij = (1/4)(f_{i-1,j} + f_{i+1,j} + f_{i,j-1} + f_{i,j+1}).

The solution will be found by an iterative process. For this purpose let f^0_ij denote a first approximation. We define the operator A_ij as follows:

(i) The operator A_ij generates the quantity

f^{s+1}_ij = (1/4)(f^{s+1}_{i-1,j} + f^{s+1}_{i,j-1} + f^s_{i+1,j} + f^s_{i,j+1}),

(ii) transfers the result to the location previously occupied by f^s_ij, (iii) generates p_ij = |f^{s+1}_ij - f^s_ij|, (iv) selects the greater of the two numbers η, p_ij, and (v) transfers it to the location previously occupied by η.
In other words, the operator A_ij may be represented by the scheme

[f*_ij] [p_ij → p] P(p > η) [p → η] [f* → f_ij],

where f* is the operator executing (i) and transferring the result to some fixed auxiliary location. When the calculation has been performed for all internal nodes of the net, the final value of η is compared with a stored positive number ε. If η > ε the computation is repeated, otherwise the computation is stopped (which is denoted by the operator OCT). The computational scheme for this procedure is of the form:
[0 → η] ∏_{j=1}^{m-1} ∏_{i=1}^{n-1} A_ij (ε > η) ? OCT.
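The scheme can be read as an ordinary nested iteration over the internal nodes. The following is a minimal sketch in Python of that iteration; the grid representation, function name, and the tolerance argument eps are illustrative assumptions and not part of Lyapunov's notation:

def solve_dirichlet(f, n, m, eps):
    """Sweep all internal nodes with the averaging operator A_ij,
    track the largest change eta, and repeat until eps > eta
    (the stop denoted OCT).  f is an (n+1) x (m+1) grid whose
    boundary entries hold the prescribed values."""
    while True:
        eta = 0.0                       # [0 -> eta]
        for j in range(1, m):           # product over internal rows
            for i in range(1, n):       # product over internal columns
                new = 0.25 * (f[i - 1][j] + f[i + 1][j]
                              + f[i][j - 1] + f[i][j + 1])
                p = abs(new - f[i][j])  # p_ij = |f^(s+1)_ij - f^s_ij|
                f[i][j] = new           # overwrite the old value in place
                if p > eta:             # keep the larger of eta, p_ij
                    eta = p
        if eps > eta:                   # (eps > eta) ? stop : repeat
            return f

The body of the inner loop corresponds to the operator A_ij; because the old value is overwritten at once, updated neighbours are used within the same sweep, exactly as steps (i) and (ii) prescribe.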
The corresponding programming scheme is represented by
where the initial values of i and j are equal to one. There are possible simplifications of the programming scheme, but the form given above may well be considered as typical of Lyapunov's method.

Appendix 2. Kindler's Algorithm for Programming of Arithmetic Formulas
A comparatively simple algorithm for programming arithmetic expressions for a three-address computer has been given by Kindler [29]. This algorithm can translate arithmetic expressions with variables, left and right parentheses, four basic binary operators, and the sign #, which is used to denote the terminal point of the expression. In this Appendix we give the algorithm in its simplest version. Some rather interesting features and possible extensions of this algorithm and complete proofs of its validity may be found in the original paper. Let us adopt the following definitions:

(1) variables are multipliers,
(2) multipliers are factors,
(3) factors are terms,
(4) term + factor is a term,
(5) term - factor is a term,
(6) factor × multiplier is a factor,
(7) factor : multiplier is a factor,
(8) (term + factor) is a multiplier,
(9) (term - factor) is a multiplier,
(10) (factor × multiplier) is a multiplier,
(11) (factor : multiplier) is a multiplier,
(12) term # is an arithmetic expression.
If a string of characters (variables, parentheses, operators, and #) a_1 a_2 ... a_d is an arithmetic expression A, then the integer d is called the length of A. The function p (priority) is defined on the set of all characters:

p(×) = p(:) = 2,
p(+) = p(-) = 1,
otherwise p = 0.
Variables are divided into two kinds: (i) actual variables A, B, ..., which can be used in every arithmetic expression, and (ii) auxiliary variables T1, T2, ..., which are used during the generation of a program.
The algorithm compiles the code for execution of expressions of length d ≥ 4, and is written by means of an ALGOL-like language, with some obvious extensions included in the class of Boolean expressions. The procedure compile(k, V, o, W), where k is a nonnegative integer, V and W are actual variables, and o is an operator, is meant to denote that: (i) the generated code will have at least k instructions, (ii) the kth instruction is V o W → Tk.
A statement of the type a[i] := Tk means that the ith character of the expression is overwritten by the character Tk; q is the smallest index i of those a_i which are pertinent to the resulting program at the stage of compilation corresponding to a given value of k. In other words, characters a_1, a_2, ..., a_{q-1} have been (at a given stage of compilation) already used, and thus are irrelevant for subsequent stages.

begin q := 1; k := 1; i := 1;
A1: if i ≥ d - 1 then go to A5;
    if a[i] is a variable then go to A2;
    i := 1 + i; go to A1;
A2: if a[i + 2] is a variable then go to A3;
    i := 3 + i; go to A1;
A3: if p(a[i + 3]) < p(a[i + 1]) then go to A4;
    i := 2 + i; go to A1;
A4: compile(k, a[i], a[i + 1], a[i + 2]);
    if a[i - 1] is a left-hand parenthesis ∧ a[i + 3] is a right-hand parenthesis then
        begin a[i + 3] := Tk; t := 4; i := 1 + i; j := i - 3 end
    else
        begin a[i + 2] := Tk; t := 2; j := i - 1 end;
    for j := j step -1 until q do a[j + t] := a[j];
    q := q + t; k := 1 + k;
    if i < q then i := q;
    go to A1;
A5: end
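For readers who wish to trace the scan, the following Python sketch mirrors the ALGOL-like version above. The token representation, the function and variable names, and the use of a list of generated (k, V, o, W) tuples are illustrative assumptions, not part of Kindler's paper:

PRIORITY = {'*': 2, ':': 2, '+': 1, '-': 1}   # the priority function p; all else 0

def kindler_compile(expr):
    """Sketch of Kindler's algorithm for a three-address computer.
    expr is a list of tokens ending with '#', e.g. ['A', '+', 'B', '*', 'C', '#'].
    Returns instructions as (k, V, o, W) tuples, meaning V o W -> Tk."""
    def is_var(s):
        return s not in PRIORITY and s not in ('(', ')', '#')
    def p(s):
        return PRIORITY.get(s, 0)

    a = [None] + list(expr)                 # 1-based indexing, as in the original
    d = len(expr)
    code, q, k, i = [], 1, 1, 1
    while i < d - 1:                        # label A1; exit corresponds to A5
        if not is_var(a[i]):                # A1
            i += 1; continue
        if not is_var(a[i + 2]):            # A2
            i += 3; continue
        if not p(a[i + 3]) < p(a[i + 1]):   # A3
            i += 2; continue
        code.append((k, a[i], a[i + 1], a[i + 2]))   # A4: compile(k, ...)
        if a[i - 1] == '(' and a[i + 3] == ')':
            a[i + 3] = 'T%d' % k; t = 4; i += 1; j = i - 3
        else:
            a[i + 2] = 'T%d' % k; t = 2; j = i - 1
        while j >= q:                       # shift a[q..j] right by t places
            a[j + t] = a[j]; j -= 1
        q += t; k += 1
        if i < q:
            i = q
    return code

For example, kindler_compile(['A', '*', '(', 'B', '+', 'C', ')', '#']) yields [(1, 'B', '+', 'C'), (2, 'A', '*', 'T1')], i.e., B + C → T1 and A × T1 → T2, as the hand trace of the algorithm also gives.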
REFERENCES

In the reference list, the following abbreviations are used:
PK: Problemy Kibernetiki, irregularly appearing Soviet journal; English translation of some volumes published by Pergamon Press, New York.
SZI: Stroje na Zpracování Informací, irregularly appearing Czechoslovak journal.
PZAM: Prace ZAM, series of reports of the Institute for Mathematical Machines, Warsaw, Poland.
CAP: Conference on Automatic Programming Methods, held at Warsaw, Poland, September, 1961.
FRDC: Frontier Research on Digital Computers, John W. Carr III and Mary Dale Spearman, eds., Univ. of California, Berkeley, California, 1959.
R following a reference means that the paper has been published in Russian.
P following a reference means that the paper has been published in Polish.

1. Arsentieva, N. G., On some transformations of programming schemes, PK 6, 59-68 (1960) R.
2. Borowiec, J., Translation of arithmetic formulae, CAP, in Algorytmy 1, 37-56 (1962).
3. Capla, V. P., A computer survey of the Soviet Union, Datamation 8, No. 8, 57-59 (1962).
4. Carr, J. W., III, Bibliography on Soviet computer literature, FRDC 2.
5. Carr, J. W., Report on a return visit to the Soviet Union, FRDC 2.
6. Chorafas, D. N., Programming Systems for Electronic Computers, p. 94, Butterworths, London (1962).
7. Computing Reviews Bibliography: 3, Comp. Rev. 2, 212-214 (1961).
8. Culik, K., On languages generated by some types of algorithms, CAP; published in Proceedings of Munich Congress of IFIP, 1962.
9. Daugavet, O. K., and Ozerova, E. F., Programming programme of compiling type, Zhur. Vych. Mat. i Mat. Fiz. 1, 747-748 (1961) R.
10. Ershov, A. P., Programming of arithmetical operators, Doklady Akad. Nauk S.S.S.R. 118, 427-430 (1958); transl. in Comm. ACM 1, No. 8 (1958).
11. Ershov, A. P., Programming programme for B.E.S.M., Izdatel'stvo Akademii Nauk, Moscow (1958) R; transl. by M. Nadler (1959), Pergamon Press, New York.
12. Ershov, A. P., The work of the Computing Centre of the Academy of Sciences of USSR, Proceedings of the International Symposium on Mechanization of Thought Processes, p. 269, H.M.S.O., London (1959).
13. Ershov, A. P., Operator algorithms, PK 3, 5-48 (1960); 8, 211-235 (1962) R.
14. Ershov, A. P., Main principles of construction of the Programming Programme in the Mathematical Institute of the Siberian Division of the Academy of Sciences USSR, Siberian Math. Zhur. 2, 835-852 (1961) R.
15. Ershov, A. P., Kamynin, S. S., and Lyubimskii, E. Z., Automation of programming, Trudy 3-go Vses. Matematicheskogo s'ezda 2, 74-76 (1956) R.
16. Ershov, A. P., Kozhukhin, G. I., and Voloshin, Yu. M., Input language for automatic programming system, Computation Centre of the Siberian Division of the Academy of Sciences USSR (1961) R; translation: Yershov, A. P., Kozhukhin, G. I., and Voloshin, U. M. (1963), Input Language System of Automatic Programming, Academic Press, New York.
17. Fialkowski, K., and Swianiewicz, J., ZAM-2 Computer description and programming in the language SAS, PZAM C3, Warsaw (1962) P.
18. Fedoseev, V. A., Methods of automatic programming for computers, PK 4, 69-94 (1960) R.
19. Glushkov, V. M., On a certain method of automation of programming, PK 2, 181-184 (1959) R.
19a. Gosden, J. A. (ed.), Report of a visit to discuss common programming languages in Czechoslovakia and Poland, Comm. ACM 6, No. 11, 660-662 (1963).
20. Greniewski, M., Symbolic modificators code for the EMAL-2 computer, CAP.
21. Greniewski, M., Algorithmic language, compiler and logical design of computers, Proc. IBAC Symp., Moscow, 1962.
22. Greniewski, M., and Turski, W., Beschreibung der Sprache KLIPA, Wiss. Z. Tech. Univ. Dresden 12, Heft 1, 64-68 (1963).
23. Greniewski, M., and Turski, W., External language KLIPA for URAL-2 computer, Comm. ACM 6, No. 6, 321-324 (1963).
24. Iliffe, J. K., The use of the Genie system in numerical calculations, Ann. Rev. Autom. Programming 2, 1-28 (1961).
25. Kalmár, L., A contribution to the translation of arithmetical operators (assignment statements) into machine language of the computer M-3, CAP.
26. Kämmerer, W., Ziffernrechenanlagen, Akademie Verlag, Berlin, 1960.
27. Kamynin, S. S., Lyubimskii, E. Z., and Shura-Bura, M. R., On automation of programming with the help of a programme that programmes, PK 1, 135-171 (1958) R.
28. Keldysh, M. V., Lyapunov, A. A., and Shura-Bura, M. R., Mathematical problems of the theory of computers, Vestnik Akad. Nauk S.S.S.R. No. 11, 16-37 (1956) R.
29. Kindler, E., Simple algorithm for the programming of arithmetic expressions, SZI 8, 143-154 (1962).
30. Kindler, E., Matrix inversion on computers with fixed point operations, SZI 8, 136-142 (1962).
31. Kitov, A. I., Electronic digital computers, Sovetskoje Radio, Moscow (1956) R (partially translated in FRDC 2).
32. Kitov, A. I., and Krinitskii, N. A., Electronic Digital Computers and Programming, Fizmatgiz, Moscow (1961) R; translation: Electronic Computers, Pergamon Press, New York, in press.
33. Klouček, J., and Vlček, J., Ein Entwurf des symbolischen Systems für die Formulierung der Aufgaben auf dem Gebiete der Verarbeitung von ökonomischen Daten, SZI 8, 181-188 (1962) (in Czech).
34. Korobov, N. M., On some problems of equal density distributions, Izv. Akad. Nauk S.S.S.R. Ser. Mat. 14, No. 3, 215-238 (1950) R.
35. Korolyuk, V. S., On the address algorithm concept, PK 4, 85-110 (1960) R.
36. Kovalev, N., quoted in Computers and Automation 11, No. 9, 7 (1962).
37. Kozmidiadi, V. A., and Chernyavskii, V. C., On some concepts of the theory of mathematical machines, Voprosy teorii mat. mashin 2, 128-143 (1962) R.
37a. Krinitskii, N. A., Mironov, G. A., and Frolov, G. D., Programming, Fizmatgiz, Moscow, 1963, R.
38. Kurochkin, V. M., A lecture to the Warsaw Conference, CAP.
39. Laboratoř matematických strojů, SZI 1 (1953). This issue is devoted to principles of programming and computers (in Czech).
40. Lavrov, S. S., On memory economy in closed operator scheme, Zhur. Vych. Mat. i Mat. Fiz. 1, 678-701 (1961) R.
41. Lyapunov, A. A., On logical schemes of programmes, PK 1, 46-74 (1958) R.
42. Lyubimskii, E. Z., Arithmetical block in PP-2, PK 1, 178-182 (1958) R.
43. Lyubimskii, E. Z., Proposed alterations in ALGOL-60, Zhur. Vych. Mat. i Mat. Fiz. 1, 361-364 (1961) R.
44. Lukaszewicz, L., SAKO, an automatic coding system, CAP; published in Ann. Rev. Autom. Programming 2 (1961).
45. Lukaszewicz, L., and Mazurkiewicz, A., Automatic coding system SAKO; Part I: Description of the language, PZAM C2 (1961) P.
46. Lukhovitskaya, E. S., Logical conditions block in PP-2, PK 1, 172-177 (1958) R.
47. Markov, A. A., Theory of algorithms, Trudy Mat. Inst. Akad. Nauk S.S.S.R. 42 (1954) R.
48. Martynyuk, V. V., On the symbolic address method, PK 6, 45-58 (1961) R.
49. Mazurkiewicz, A., Arithmetic subroutines and formulae in SAKO, CAP; published in Ann. Rev. Autom. Programming 2 (1961).
50. Nováková, M., and Vlček, J., Method of programming the simplex-method procedure on a digital computer, SZI 8, 171-179 (1962).
51. Podlovchenko, R. I., On transformation of programming schemes and its application to programming, PK 7, 161-168 (1962) R.
52. Sedlák, J., A programme for investigation of solutions of ordinary differential equations, SZI 7, 99-117 (1959) R.
53. Shreider, Yu. A., Programming and recursive functions, Voprosy teorii mat. mashin 1, 110-126 (1958) R.
54. Shreider, Yu. A., On concepts of generalized programming, Voprosy teorii mat. mashin 2, 122-127 (1962) R.
55. Shtarkman, V. S., Block of economy of working cells in PP-2, PK 1, 185-189 (1958) R.
56. Shurygin, V. A., and Yanenko, N. N., Realization of algebraic-differential algorithms on an electronic (digital) computer, PK 6, 33-43 (1961) R.
57. Sobolev, S. L., A lecture to the Warsaw Conference, CAP.
58. Stognii, A. A., Principles of a specialized programming programme, PK 2, 185-189 (1959) R.
59. Stognii, A. A., Solution of a problem connected with differentiation of a function with the help of a digital computer, PK 7, 189-199 (1962) R.
60. Svoboda, A., Application of Korobov's sequence in mathematical machines, SZI 3, p. 61 (1955) (in Czech).
61. Swianiewicz, J., XYZ Computer, Programming in Machine Language, SAB, SAS, and SO systems, PZAM C1, Warsaw (1961) P.
62. Swianiewicz, J., and Sawicki, S., SAKO-translation, CAP.
63. Szorc, P., Subroutines in SAKO, CAP.
64. Trifonov, N. P., and Shura-Bura, M. R. (eds.), Automatic Programming System, Fizmatgiz, Moscow (1961) R.
65. Turski, W., Man-Computer Communication, lecture delivered to Top Management Conference, Warsaw (1962).
66. Turski, W., Possible astronomical use of digital computers, Postepy Astronomii 11, 147-161 (1963) P.
67. Vyazalov, L. H., and Morozov, Yu. I., An auxiliary standard programme for checking programmes, PK 6, 59-67 (1961) R.
68. Voloshin, Yu. M., Automatic Programming Bibliography, Siberian Division of the Academy of Sciences USSR, Novosibirsk (1961).
69. Yanenko, N. N., Reduction of a system of quasi-linear equations to a single quasi-linear equation, Uspekhi Mat. Nauk 10, Vyp. 3 (1955) R.
70. Yanov, Yu. I., On matrix schemes, Doklady Akad. Nauk S.S.S.R. 113, 283-286 (1957) R (translated in FRDC 2).
71. Yanov, Yu. I., On the equivalence and transformations of programming schemes, Doklady Akad. Nauk S.S.S.R. 113, 39-42 (1957) R (translated in FRDC 2).
72. Yanov, Yu. I., On logical schemes of algorithms, PK 1, 75-127 (1958) R.
A Discussion of Artificial Intelligence and Self-Organization

GORDON PASK
System Research Limited, Richmond, Surrey, England
1. Introductory Comments
   1.1 Level of the Discussion
   1.2 The Self-organizing System
   1.3 Specific and Distributed Processes
   1.4 Systems with Artificial Intelligence
   1.5 The Relevance of Brains
   1.6 Heuristics and Symbiotic Interaction
   1.7 Descriptive Languages
   1.8 Localized and Unlocalized Automata
2. The Characterization and Behavior of a Self-organizing System
   2.1 Various Definitions
   2.2 Special Case
   2.3 A Model
   2.4 Formal Representation
   2.5 The Model
   2.6 Physical Mechanisms
   2.7 Control Hierarchies and the Stable Self-organizing System
   2.8 Unlocalized Automata
3. Artificial Intelligence
   3.1 Basic Definitions
   3.2 Specific Processes
   3.3 Intelligent Artifacts
   3.4 Some Difficulties
4. Other Disciplines
   4.1 Psychological Models
   4.2 Physiological Models
5. The Interaction between Men and Their Intelligent Artifacts
   5.1 Present Position
   5.2 Educable Machines
   5.3 Dividing Labor
   5.4 Adaptive Machines
   5.5 Concluding Remarks
Acknowledgments
Glossary
References
1. Introductory Comments
1.1 Level of the Discussion
The names "artificial intelligence" and "self-organization" are often criticized on the grounds that they are prone to the contradictions of self-reference and other forms of paradox [1], [2]. But most people agree that "artificial intelligence" and "self-organization" admirably describe classes of phenomena and kinds of artifact that are nowadays very often encountered. Consequently, these names are used loosely to tag the phenomena or artifacts concerned. Any system that simulates mentation is deemed "artificially intelligent" and any system with a behavior that becomes more ordered (according to some vague criterion or another) is called a "self-organizing system." Perhaps we cannot be more precise. On the face of it, however, a cybernetic demarcation and analysis of these systems are possible and potentially valuable. The loose usage is to be deprecated for avoiding paradoxes (of self-reference and control) which must be tackled to gain a proper appreciation of what goes on. But the classes of "artificially intelligent" and "self-organizing" systems are not properly represented within the theory of informationally closed systems and the required extensions of this theory are, in the first place, logical and ontological rather than mathematical [3]. A tentative cybernetic analysis is presented as part of our discussion of this field.

1.2 The Self-organizing System
To begin with, in Section 2, we outline the characteristics of a self-organizing system [4] and develop the special case of self-organization as it appears in connection with automata or computing mechanisms which may be either fabricated artifacts or living organisms. Hence our discussion is centered upon the property of "learning" and upon mechanisms that give rise to a "learning" behavior. It is argued that nontrivial learning behavior is generated by populations of evolving automata (or, using a distinction due to Loefgren, by unlocalized automata) rather than single automata with well-defined inputs and outputs (localized automata). On the other hand the system we observe is usually a sequence of localized automata. The underlying evolutionary process is described by a sequence of relatively static images. It is necessary to take the word "learning" quite seriously. "Learning" involves more than adaptation. True, some kind of an engram must be laid, or some plastic change must occur, as a prerequisite for learning.
But, in view of the frantic activity that goes on as the concomitant of perceiving an event, it is difficult to imagine a brain that is not modified in some fashion. Minimally, we infer learning from evidence of a goal-directed change in a pattern of behavior. Since any consistent behavior pattern can be ascribed to a computation carried out by the object under scrutiny, learning is inferred from a goal-directed change of computation, and the object concerned "learns to compute" in the sense that its computational repertoire is enlarged as a result of "learning". This point of view is consonant with our own intuitions in the matter and experimental results of the kind obtained by Bartlett [5]. We do not learn a poem or a story as a tape recorder might, by registering its image in some malleable substance. On the contrary, we learn the computations required in order to recite the poem or tell the story. Further, the learned computation is not so much retained as reproduced. Memory is continual relearning. The distinction has nothing to do with dynamic versus static information storage. Either or both can be involved in the realization of an automaton. At the moment, however, we are viewing an automaton as a collection of algorithms or, at a less detailed level, as a mapping from input to output that satisfies this algorithmic specification. Insofar as this abstraction is embodied in a brain or a network, the brain or network may be assigned a couple of extreme roles. At one extreme it is a telephone exchange, perhaps with variable connections, wherein definite parts have definite functions to perform. Altogether the functions that are performed describe the automaton, and an image of the brain is isomorphic with an image of the automaton. At the other extreme the brain acts as an internal environment or medium in which patterns of activity and constraint are able to develop; for example, patterns of interaction between impulse sequences and distributions of synaptic impedances. Insofar as memory involves relearning and reproduction, we are invited to adopt the latter view of a brain. If the developing organizations are identified with automata, these are reproduced in the medium of the brain. Their variations in response to internal or external change are a statement of their evolution (which, behaviorally, is manifest as learning). The form of variation is an evolutionary rule (which accounts for the goal-directed property of learning behavior).

1.3 Specific and Distributed Processes
Neither view of a brain or a network is necessarily more accurate than the other. Each is an image of the same physical object. Certainly there are regions in most brains and networks that are so profitably
described by the analogy of a telephone exchange that any other approach is out of court; for example, it would be absurd to describe a reflex arc or an input filter in any other way. On the other hand there are many regions that can be described in either fashion according to our convenience and the states of the physical object that we deem relevant to our enquiries. The important point is that learning is a property of such a description and its behavior, not of the physical object "brain" or "network". It is sensible to talk about learning if we have decided to view a brain or network as a medium in which a population of automata is evolving. Conversely, if our inquiries refer to learning, this image of the brain or network is the most convenient, and its states are most readily identified with states of the physical object. One way to assert that this decision has been made and that an object capable of acting as a medium for evolution is being observed is to say that we are considering a self-organizing system.

1.4 Systems with Artificial Intelligence
In Section 3 we deal with artificial intelligence. The contention is that artificial intelligence is a special property of a self-organizing system. In particular, a system has artificial intelligence if it learns in much the same way that we learn and about much the same universe of discourse. This definition automatically excludes cleverly designed calculators and appears compatible with the spirit of present-day research in this field, though it implies rather more stringent criteria for intelligence than are commonly adopted and lays the emphasis upon dynamic characteristics (which are sampled in a test for intelligence) rather than capabilities that might be inferred from knowledge about the structure of a system. Like any self-organizing system, an artificial intelligence is a controller and (as we argue in the main discussion) it has an hierarchical structure wherein there are levels of control that aim to achieve different levels of goal. However, if we call the system intelligent, its control activities are necessarily termed "problem solving" and the stable states achieved as a result of these control activities are termed "problem solutions." A little more is involved than the idiom of the field. We are at liberty to identify the states of a learning system with signs and to call any sign and its denotation a symbol. But if the system learns as we learn, then we are forced to regard it as operating upon symbols that denote the perturbations we choose to call problems. Further, we must
countenance symbolic operations, at a higher level in the control hierarchy, that act upon and transform symbols alone. So far, we have renamed an hierarchy of control, calling it an hierarchy of symbolic descriptions (which appears in the main discussion as an hierarchy of metalanguages). If it learns as we learn, it must perform modifications that we find familiar. (Unless we test for this capability we cannot discern an intelligent machine.) Let us define a concept as the process whereby a symbol is assigned to a state of a description of the environment within its denotation (or to a set of symbols, in a description of a system's internal state). The acquisition of a concept is a process whereby the concept (itself a process) is learned. Although any system that learns can be said to use, and possibly to acquire, "concepts" in a broad and rather dubious sense, an artificial intelligence must use concepts like our own and it must acquire concepts in much the same way that we acquire them. Hence a study of artificial intelligence is chiefly concerned with the dynamics of a system of symbols whereas a study of self-organization also involves the underlying organization, states of which are identified with these symbols. Further, in considering artificial intelligence, microscopic semantic processes are important (whereas self-organization is a macroscopic property of physical assemblies). It is necessary to distinguish between perceptual and motor regions, for example, or between different kinds of problem-solving algorithms, and a great deal of the discussion involves a more detailed review of systems that have been previously considered as self-organizing systems.

1.5 The Relevance of Brains
Since artificial intelligence resembles our own mentation, the workings of a human brain have an obvious relevance to the design of an intelligent machine. This aspect of the matter is examined in Section 4, which deals with various physiological and psychological models of human learning and concept acquisition.

1.6 Heuristics and Symbiotic Interaction
One outstanding feature in the design of intelligent machines is the role of "heuristics" or broad "patterns" of action (methods of proving hypotheses, for example, or criteria of similarity) that are part of the specification but which stem from human experience in problem solving. There is a very real issue of the extent to which an artificial intelligence can be independent of a human intelligence. At the moment, it cannot be. Coupling between the two may, as suggested above, depend upon
heuristics that are vehicles for injecting some wisdom into the artifact. Alternatively the wisdom can be gained through interaction with and experience of a man or a group of men. In Section 5 we consider this kind of man-machine interaction, both from the viewpoint of machine education and from the diametrically opposite viewpoint of extending the capabilities of a man and controlling his learning process. In fact, Section 5 is devoted to symbiotic interaction between men and machines (which can be contrasted with modes of interaction in which the machine is regarded as a tool). Conversation is a typical symbiotic relationship. The crucial test for symbiosis is the production of a joint concept (arising from interaction between the participants but which cannot be ascribed to either of the participants alone). Whereas concept acquisition within an intelligent machine entails the internal construction of some element in a descriptive language, this process is exteriorized in a symbiotic man-machine interaction and is evidenced by the construction of a common language.

1.7 Descriptive Languages
Since linguistic arguments prove essential in the analysis of artificial intelligence, the discussion of self-organization has also been phrased in terms of the languages used in describing and performing experiments upon a learning process. We could, of course, have avoided any mention of "language" until Section 3 (because, as pointed out in Section 2, a "system" is isomorphic with an object language and its denotation). One advantage of adopting the more elaborate formulation (apart from consistency) is that properties like "learning" can be shown to depend upon the relation between a physical object and the observer's descriptive language instead of depending upon a relation that entails the personal oddities of a particular observer.

1.8 Localized and Unlocalized Automata
Automata are completely abstract entities. But all interesting automata are realized as physical structures and appear as organizations that are a property of some physical medium. The material dependence of these abstract entities is particularly important because of the dominant role assumed in our discussion by localized and unlocalized automata and the distinction of one class from the other. It will be wise to keep a tangible realization of each class in mind. Since we are considering the real world, a computing machine is not a typical localized automaton. It has an aura of permanence which belies the fact that any localized automaton, open to the structural
perturbations of the real world, has a finite life span. A better exemplar, perhaps, is an ape in a cage. The creature is a physically open system with a metabolism that preserves its structure. Informational closure can, however, be approximated (though, typically enough, internal information transfer is mediated by an autonomous activity that continues while the animal survives). Since the ape has a consistent behavior pattern in a given environment, it computes a response as a function of its input stimuli (with its internal state as a parameter of this function). Finally, since we have some idea of the goals that apes aim for, and since we know the stimuli that count as signs, we can observe the goal-directed changes of behavior pattern that characterize learning. For the purpose of this analogy, the inputs and outputs of an ape are well-defined, so it is a localized automaton; but however well it is fed, the ape, like any other localized automaton, has a finite life span. It cannot survive indefinitely. The paradigm case of an unlocalized automaton is a cage containing a well-nourished and reproducing population of apes together with a signaling arrangement (a flashing lamp or a buzzer) which allows the experimenter to stimulate the population and some method of discerning the response of a typical member of this population (by recognizing a behavior pattern manifest by the majority of individuals). The individual apes are certainly automata, and the aggregate of apes is also a parallel computing machine. The input and output of the system are not, however, defined with reference to the individual that actually carries out the computation; hence, the parallel computing machine realized by the population is representable as an unlocalized automaton. In common with a subset of unlocalized automata, which has been shown by Loefgren [6] to exist, the population of apes may have an indefinite life span. Our exemplar becomes more plausible and less trivial if we allow overt cooperative interaction between the apes. (There is an implicit competitive interaction, in any case, due to the food supply limitation and the finite boundaries of the cage.) We shall assume that the individuals interact (and may cooperate) through a system of signs, and normally these signs are precisely the signs that we use when stimulating the population and detecting its typical response. Indeed, we should aim to interact with the population in terms of the same language that is used for internal communication. To push the analogy one stage further, it would be possible to insist that the apes did cooperate by providing a form of environment in which an ape could only survive (receive sufficient nourishment) if it cooperated with other apes. (Hence, creatures that survive are forced
to communicate in order to maintain cooperative interaction.) In artifacts that consist of a medium in which organizations evolve, this constraint is always applied.

2. The Characterization and Behavior of a Self-organizing System

2.1 Various Definitions
A "system" is not "self-organizing" in a completely unqualified sense. Any suggestion that it is can be countered by several ingenious arguments to show that no such thing exists. In fact, the concept of "self-organization" is rightly applied to a relation that exists between an observer and the object he observes. "X" is "self-organizing" insofar as its activity leads a sane observer to assert this property of "X" and his relation to "X." By way of a definition, when we say that "X" is a self-organizing system we mean (i) that "X" appears to become more organized and (ii) that as we observe it "X" forces us to revise our idea of "organization," or to reselect the "frame of reference" (a system "structure") in which this organization appears (and in which it is occasionally measured). The revision is necessary when observing "X", in order to keep track of the behavior of "X" and to render a coherent account of it in our own "scientific" language. Wiener [7], [8], Beer [9], Mesarovic [10], [11], and von Foerster [4] have given strict and consonant definitions of a self-organizing system. The term is wedded to the field of control mechanism theory by axiomatic structures such as Pun's [12] and it is used rather more broadly in connection with Bertalanffy's [13] abstract system theory. For the present purpose we shall use von Foerster's definition in which Shannon's [14] measure, redundancy, is used as an index of organization and according to which a system is a self-organizing system if and only if the rate of change of its behavioral redundancy, R, is always positive. Formally, if H_max is the maximum informational entropy or variety (a function of the possible states, in the state description of the system) and if H is the informational entropy or the variety of its behavior (a function of the states which are occupied throughout an observation) then, from Shannon,
R = 1 - H/H_max,

and von Foerster requires that

dR/dt > 0    (1)
for any self-organizing system. It is readily shown that (1) is satisfied, providing the inequality

H dH_max/dt > H_max dH/dt    (2)
holds true. Several cases are considered in the original paper. Adaptation corresponds to the case when H_max is held constant and -dH/dt > 0 when, also, dR/dt > 0. There are also systems embodying some developmental process which increases the number of elements to be considered in a state description while maintaining H as a constant, and in this case dR/dt > 0 because dH_max/dt > 0. Finally these special cases of (2) may be combined to yield rather plausible images of growth accompanied by differentiation, of the kind encountered in the development of populations and of embryos. To appreciate this formulation we must emphasize that a "system" is not, in itself, a matter of fact, such as a physical object. It is an abstract model, constructed in an observer's descriptive language L* (often, though not always, a "scientific" language) which has been identified with the physical object (by specifying procedures for observation and measurement and other procedures for parametric adjustment). The bare bones of a system describe its possible states and their structure, hence a framework specified in L* that limits the set of hypotheses that can be posed and the relevant measurements that can be made. The behavior of the system is a sequence of states. Observations of this behavior, sometimes contingent upon a particular manipulation of the system parameters, provide evidence to validate or deny the hypotheses that have been posed. The measures R, H, and H_max are, of course, determined with reference to the basic structural framework and must be redefined whenever it is changed. It is not difficult to show that in some circumstances an observer is impelled to change his frame of reference in order to maintain the joint consistency and relevance of his observations. Thus the embryologist has every right to regard the embryo as the relevant object of his investigations; but in order to make sense of it he is bound to perform experiments which are (or were until quite recently) formally incomparable. (The first experiments entail state descriptions of cells, the next entail state descriptions of tissues, and so on.) Similarly a psychologist has every right to address his enquiries to an individual baby; but, in order to do so, he must examine an apparently disconnected sequence of behaviors that refer to whatever bits of the environment happen to occupy the baby's attention. There are many cases where the behavior in each member of a
sequence of systems reveals an increasing degree of organization, due to changes in H or H_max, or both. It is often also true that the sequence that is generated by successive redefinition has no limit apart from the arbitrary demarcation between disciplines; for example, when the embryo becomes a baby, embryological inquiries give place to psychological inquiries. (We comment that, even if an observer insisted upon maintaining the original state description, his observations would become uninformative. Even if the baby is described in terms of the states of its cells the resulting description is not pertinent to psychological inquiries. We need not argue the issue of reduction between different levels of hypotheses. At the present state of knowledge, cell states may be used to predict cell states, but cannot be used to predict the decisions of an organism.) Now the whole sequence can be justified in L* by a statement like, "I am looking at the physical object called an embryo which, for all its changes, I take to be a coherent entity," or like, "I am looking at a baby." The justification rests upon the fact that other observers, using L*, understand and agree with these statements which (because they have a higher logical class) are metastatements about the sequence of systems and have no direct connection with the observations that are made within the systems, although their cogency may be supported by the behavioral evidence. These metastatements associate the sequence of systems and allow us to regard them as a whole. In particular, they legitimize an organization created as the disjunction of the several different systems which (providing that dR/dt > 0 for each of its component systems) is called a self-organizing system. Indeed, it can be argued that all nontrivial self-organizing systems have this form and are thus compatible with our original dictum. Any growth, for example, forces us to redefine the growing system unless it is uniform growth (as in the case of crystal growth which constitutes a trivial case of self-organization since the process could be accounted for, using a more competent state description, in terms of a simple rule).
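Stated computationally, von Foerster's criterion only requires the two entropies and their trend. The following is a minimal sketch in Python; the use of observed relative frequencies for H, the particular state counts, and the function name are illustrative assumptions:

from math import log2

def redundancy(counts, n_possible_states):
    """Shannon redundancy R = 1 - H / H_max for one observation interval.
    counts maps each occupied state to how often it was observed;
    H_max is taken as log2 of the number of possible states."""
    total = sum(counts.values())
    h = -sum((c / total) * log2(c / total) for c in counts.values() if c)
    h_max = log2(n_possible_states)
    return 1.0 - h / h_max

# von Foerster's criterion: the behavioral redundancy must keep rising.
early = redundancy({'a': 5, 'b': 5, 'c': 5, 'd': 5}, 4)    # uniform: R = 0.0
late  = redundancy({'a': 17, 'b': 1, 'c': 1, 'd': 1}, 4)   # concentrated: R is about 0.58
assert late - early > 0    # dR/dt > 0 over this pair of intervals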
2.2 Special Case

For the present discussion we shall deal exclusively with a special kind of self-organizing system encountered in the observation of learning and the interaction between information processing structures. The self-organizing system is manifest as a sequence of adaptive systems, each of which describes a localized and adaptive automaton coupled to other automata or an experimental environment. (Localized automata are automata with well-defined and finite sets of input states and
output states.) In practice the adaptive automata may be identified with an image of the computations carried on by a human being or an animal or a machine. Normally, the adaptive process is directed, in the sense that the human being adjusts his behavior to maintain or maximize some experimentally determined reinforcement, and the machine is designed to perform whatever computation maximizes the value of a variable that describes the state of its environment. Hence, the adaptive automaton can always be viewed as a control mechanism. The restrictions entailed by considering a finite set of outcome states (input-output state pairs or, equivalently, stimulus-response state pairs), over any finite interval, are no more severe than the restrictions that are tacitly assumed in all behavioral experiments. We shall not examine the origin of these constraints in detail, but comment that they may be interpreted either (i) as due to the fact that a man, other organism, or machine is characterized by a finite "computational capacity" and a quantized outcome space or "field of attention" that remains invariant and is contemplated for at least a minimum finite interval, or (ii) as due to a constraint upon our own methods of observation, akin to Caianiello's [15] "adiabatic condition" that forces us to consider a behavior in terms of an invariant set of alternatives.
2.3 A Model

We shall approach the issues of nontrivial self-organization through a model or simulation of a self-organizing system that can be built from more familiar automata, which we shall, in any case, need to consider at other points in the discussion. In the first place we consider an adaptive probabilistic machine and show that, although it can act as a self-organizing system over a short interval, it is essentially instable. A collection of such automata, combined with an over-all selective mechanism, prolong the stable mode of this simulation; but in order to produce an indefinitely stable self-organizing system it is necessary to introduce an underlying evolutionary process. The simulation has been carried out [16] on a special purpose computer called the EUCRATES system [17] as part of an investigation of learning and attention and, while trivial as a learning device, this model illustrates the difficulties embedded in the concept of a self-organizing system.

2.4 Formal Representation
(1) A finite informationally closed system E_0 defined in L* is specified by its state description, and certain initial constraints upon the possible changes of state. Suppose that σ ∈ Σ are the most primitive states that
can be distinguished in L*. A finite state description is a mapping from a subset Σ_0 of Σ onto points c in a space of attributes C*. Consider a quantization of this space that determines a further set of discrete valued variables u* (which are the variables in an abstract model defined in L*) and states u ∈ [u_1*, u_2*, ..., u_n*]. (We say that the system E_0 which embodies this abstract model is in state u if c ∈ u.) Call the mapping Σ_0 → U a description. An automaton defined in E_0 is a further mapping F of the form

F: [u_1* ... u_M*] → [u*_{M+1} ... u*_n],    n > M,

where the product set [u_1* ... u_M*] is called the input set and the product set [u*_{M+1} ... u*_n] is called the output set. It is convenient to rename these sets:

X = [u_1*, ..., u_M*],    x ∈ X input states,
Y = [u*_{M+1}, ..., u*_n],    y ∈ Y output states,
U_p = (X, Y) ⊂ U.

Thus these inputs and outputs define a projection from U_p onto the X coordinate of U_p and onto the Y coordinate of U_p. The formulation is consonant with the work of Ashby [18], Loefgren [19], and Rosen [20].

(2) A fixed automaton computes a function
y = f(x),    (3)

or, if time t is quantized and if we adopt the notation x|t, y|t for the states selected at times t = 1, 2, ..., then the above relation is interpreted as

y|t+1 = f(x|t).
The input of this fixed automaton may be manipulated by, and its output may act upon, some external entity, such as the observer or an instrument. On the other hand, if n > M its environment may also be specified in E_0. In this case the states of the environment will be u ∈ [U - U_p], and the coupling between the automaton and its environment will be defined by a pair of mappings
A: Y → [U - U_p]    and    B: [U - U_p] → X.

Normally the behavior of the environment will be defined by a relation x = f*(ȳ), where ȳ is a finite sequence of selections of y ∈ Y, when the environment may be considered as a further automaton. In any case the automaton and its environment represent a pair of coupled subsystems as in Fig. 1.
Fig. 1. A controller.
We comment that, if n = M, either E_0 is not completely closed, or the automaton is sessile, or it has a cyclic pattern of activity.

(3) A variable automaton is capable of computing several functions f_φ according to the value of a parameter φ which indexes the selection of f from a set F of functions. Hence
y = f_φ(x).    (4)
The usual formulation images φ as the output of an over-all controller computing the function

φ = g(x̄, ȳ),    (5)
and if g has a directed property, in the sense that the variable automaton obtained from (4) and (5) as

y = f_{g(x̄,ȳ)}(x)

maximizes the value of a payoff function θ defined over the domain of the states of its environment, then it is called an adaptive automaton, which involves an hierarchy of control. Fig. 2 illustrates the structural consequence of an hierarchy of control. Its mathematical origin is the fact that g(x̄, ȳ) is a function of higher logical order than the f(x) which it selects (and this, of course, determines one level of organization in the hierarchy of control). If the adaptive automaton and its environment are both defined in E_0, the system is closed and the payoff function will depend upon states u or finite sequences of states ū. If E_0 is not
Fig. 2. An adaptive controller.
closed, 6’ may be an arbitrary reinforcement. Further, if the adaptive automaton and its environment are both specified in Eo, convergence of the 4 values is guaranteed, j . +2 +- . . . T where T is either a value of 4 or a cycle of values with the characteristic that it will maximize O(4). Ashby points out that the least specific control criterion (it is hardly fair to call its measure a “payoff function”) is stability, and he demonstrated that any dynamic and informationally closed system will approach a stable state. (This may be a point equilibrium or a cyclic oscillation which is repeated, in which case, the terminal condition is a dynamic equilibrium.) If a subsystem can reach several stable states from a given starting state, the particular terminal condition depending upon the value of a parameter, then the system that includes this subsystem involves an hierarchy of control and Ashby calls it ultrastable [21]. The corresponding paradigm for the case of an incompletely closed system where 6’is a reinforcement variable entails the idea of “survival.” Eo is defined, or its mechanical representation survives, if and only if certain physical conditions, indexed by some of the u*, have been satisfied. For a biological system the critical conditions are conveniently described as limits upon essential variables like body temperature. We stipulate that an organism survives if and only if the values of these variables remain between these limits, hence the corresponding system is defined if and only if certain of the u*, indexing essential variables, have values between u*,,~ and u*,~,,. The parameter changes in ultrastability, may occur as a consequence of an over-all controller sensing the fact that u * l t is in the neighborhood of u*,,, or uXmin. To demonstrate this point, Ashby built a device called the “homeostat” [el].It consists of four interconnected positional servomechanisms and an over-all controller which is an arbitrarily determined and preprogrammed number selector. A particular plan of interconnection between the servomechanisms sets up a velocity coupling and (together with the transfer functions of each device) determines a function f, where the index $ is the number of the interconnection plan (and the number selected a t this instant by the over-all controller). The positional output8 are interpreted as the “essential variables” u*. Limit indicators on the output potentiometers determine “critical values” u*,,, and u*,,,~~, and if u*>u*,,, or u*,~,, > u* then a signal is delivered to the over-all controller which selects the next number, 4, on its preprogrammed list. and that no Assume that the homeostat is stable, given a plan u* contravenes the limit condition for “essential variables”. Suppose 123
that one of the positional output potentiometers is arbitrarily disturbed to provide environmental input; the homeostat may return to its equilibrial state or it may become instable, in the sense that either u* > u*_max or u*_min > u*. In this case the over-all controller will select a number φ_2 from the list. If stability is achieved, given that φ = φ_2, no further change occurs. If not, another value of φ is selected. Since the homeostat is designed so that some value of φ will induce stability against any perturbation in the experimental repertoire, it always survives. Haire, Haroules [22], and Williams [23] have recently made a much larger homeostat and have extended the work done by Ashby on the original device. Chichinadze [24] has constructed a homeostatic model with a memory capability, and Tarjan [25] has extended Lyapunov stability to such cases. Similar comments apply when the computations performed by an automaton are "probabilistic." The automaton (and possibly also the observer who constructs E_0 in terms of L*) has access only to state probabilities Π(x_i) = Π_i, x_i ∈ X. The appearance of an input state (which, for this purpose, we call x_i*) conveys imperfect evidence concerning the existence of x_i. Alternatively (or in addition) the state changes of the environment can only be defined "probabilistically," so that f* is replaced by a finite matrix Π = ||Π(x_i | x_j)|| = ||Π_ij||. Uttley [26], [27], [28], [29] was the first to design a conditional probability machine able to estimate values of Π(x_i | x_j) and Π(x_i) as numbers p(x_i* | x_j*) and p(x_i*). Such a machine infers the existence of x_i ∈ X even if x_j* is the input state, providing that the value of p(x_i* | x_j*) exceeds some arbitrary limit embedded in the design. Similarly conditional probability machines, like Steinbuch's [30] "learning matrix" and a device due to Katz and Thomas [31], which are related to or derived from Uttley's work, can make "probabilistic" selections from their output states. (The term "probabilistic" is very tenuous since an observer need not remain ignorant of an automaton he has specified.) In fact, the automaton is provided with (or, as later, it can generate) "probabilistic" values 1 > p(y_j | x_i) > 0. (We omit the x* notation and assume that input states are accurately determined although f* is not.) An M-component vector of these weights (||p_j0||, given that x_0 is selected from X) is presented as a bias to a process that is independent of any other aspect of the system, but which may, of course, be specified as part of E_0 in L*. The output of this process is illustrated in Fig. 3 as an index value ξ which selects a function f_ξ from a set F. The set F contains M subsets corresponding to the M output states, and ||p_j0|| biases the so-called "chance" process in such a way that p_j0 is the "chance" that the output state y_j ∈ Y is selected, given x_0 ∈ X, or, equivalently, that f_ξ is included in the corresponding subset.
Fig. 3. Probabilistic machine.
Consequently this automaton can be represented by the relations
y = f_ξ(x),    ξ = Chance(p(y|x), x) = Chance(x, P).    (6)
Although it is often convenient to express this in the form p(y) = x(P), which implies

p(y)|t+1 = x|t (P) = y|t [Π(P)],    (7)

where p(y) is an M-component output state probability vector and where P = ||p_ji|| and Π = ||Π_ij|| are the state transition probability matrices that characterize the automaton and its environment.

(5) In the adaptive form of probabilistic automaton the functions computed are doubly indexed,
y = f_{ξ,φ}(x),    ξ = Chance(p(y|x), x) = Chance(x, P_φ),    φ = g(x̄, ū),    (8)

such that θ → θ_max.
However, it is often possible to interpret the rule g as a change in the values p_ji in P and thus to obtain a relation corresponding to (7), namely,

p(y) = x(P_φ).    (9)

If, for example, θ is a binary variable, the potential across the condenser in Fig. 4 can be shown to estimate the probability, given x_i ∈ X,
Fig. 4. Probabilistic circuit (switch X closed if x_i = 1; switch Y closed if y_j = 1).
that y_j ∈ Y entails θ = 1. A machine (although the usage is variable, a machine is taken to be a realized physical device) of the kind in Fig. 5 will derive m - M estimates of this form, which, from moment to moment, define m - M matrices P_φ. (The argument is more elaborate if θ is not binary, but providing that 1 ≥ θ ≥ 0 it is not really different.) Andrew has reviewed the field [32], [33]. If an adaptive "probabilistic" automaton (characterized by P_φ) and its stationary "probabilistic" environment (characterized by Π) can be specified in E_0, convergent adaptation is guaranteed. Thus P_φ → P_τ. The output state probability vector p(y) = p(x)[Π(P_τ)] will either be the fixed point vector of Π(P_τ) or, if this process involves ergodic subsets only, of such a subprocess or, in the limiting case, it may define a trapping state.
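The following is a minimal sketch in Python of an adaptive "probabilistic" automaton of the kind summarized by (8) and (9). The class name, the binary payoff, the running-average update, and the learning-rate constant are illustrative assumptions rather than a description of the original machine:

import random

class AdaptiveProbabilisticAutomaton:
    """For each input state x, keep a weight p(y | x) per output state y
    (the matrix P_phi).  Outputs are drawn by the biased chance process;
    a binary payoff theta nudges the weight that was used, much as the
    condenser in the circuit of Fig. 4 integrates it."""
    def __init__(self, inputs, outputs, rate=0.1):
        self.outputs = list(outputs)
        self.rate = rate
        # indifference: equal entries in the initially selected P
        self.p = {x: {y: 1.0 / len(outputs) for y in outputs} for x in inputs}

    def act(self, x):
        weights = self.p[x]
        r, acc = random.random() * sum(weights.values()), 0.0
        for y in self.outputs:              # chance process biased by p(y | x)
            acc += weights[y]
            if r <= acc:
                return y
        return self.outputs[-1]

    def reinforce(self, x, y, theta):
        # move the estimate for the pair (x, y) toward the payoff theta
        self.p[x][y] += self.rate * (theta - self.p[x][y])

Driven against a stationary environment that pays theta = 1 for one output per input, the weights converge (P_φ → P_τ) and the behavior settles, which is exactly the convergence, and the eventual loss of dR/dt > 0, discussed in the next section.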
2.5 The Model

Our simulation of a self-organizing system involves a sequence of systems E_r, r = 1, 2, ..., each of which is composed of an automaton characterized by P_φr (describing a machine M_r) and its environment characterized by Π_r (describing a physical realization Z_r). In addition we postulate an external selective mechanism A effecting a rule denoted as λ that selects the value of r.
Fig. 5. Details of probabilistic machine.
Formally, the mapping E_r → λ(E_r) is a functor with the stipulated property that

λ(E_r) ∈ Ē = ⋃_{r=1}^{p} E_r,

as shown in Fig. 6. The coupling between M_r and Z_r involves the input states X_r and Y_r and, as before, we denote the product set of outcomes as U_r = X_r, Y_r.
Fig. 6. Over-all picture of selection among probabilistic subsystems (M_1 is shown as selected).

Arguing from indifference we assign a matrix with equal values in each entry for the initially selected P_φ1. Now it is required to simulate a self-organizing system of the special kind discussed in Section 2.2. Consequently the condition
dR(E_r)/dt > 0
or, since the number of input states and output states is invariant, the equivalent condition
-dH(E_r)/dt > 0

must be satisfied. Indeed we can stipulate that E_r is defined, or that its physical realization M_r, Z_r survives, if and only if it is always the case that dR(E_r)/dt > 0. Perhaps the simplest rule for ensuring that dR(E_r)/dt > 0 is embodied in the payoff function feedback that is illustrated, where (if θ_r is the payoff function over the domain U_r of E_r) the system maximizes θ_r = R(E_r). [In practice, θ_r is proportional to an approximation of -H(E_r).] In general, as P_φr → P_τr, the finite difference Δθ_r, which is taken as an approximation to dR(E_r)/dt, is positive. However, at or before the moment when P_φr = P_τr, Δθ_r = 0. Hence, although E_r is a self-organizing system initially, it is an unstable self-organizing system. Could this be otherwise, if some other initial assignment of P were chosen or if a different rule were embodied? Since the system M_r, Z_r is a particular case of Estes' [34] conditioning model, in which the trapping state is not uniquely determined because we only require any maximally regular behavior, the answer is known to be "no." Certainly, more subtle rules exist for maintaining dR(E_r)/dt > 0 over a longer interval; for example, it is possible to incorporate a limited "forgetting" capability. But no such expedient can prevent the eventual demise of E_r as a self-organizing system. Wattanabe [35], for example, has studied the convergence of statistical adaptive processes. If α, β, and γ are positive or zero constants, Wattanabe suggests the form, for t > t_0,
H(E_r)|t = α (t - t_0)^β e^{-γt},

which can be fitted to the output of statistical learning models (in specific cases to Bush and Mosteller's [36] model and to Luce's [37] model) or to empirical data (either from learning experiments involving organisms or from a machine like M_r). In fact, when we start from a maximum variety condition with equal entries in P this equation can be fitted with β = 1 and t_0 = 0, hence

H(E_r)|t = α t e^{-γt},

or

R(E_r)|t = 1 - (α t e^{-γt}) / constant.
We detect the condition Δθ_r ≯ 0 (that is, the improvement has ceased) and, when it occurs, remove M_r, Z_r from the simulation. What happens after that? The statement Δθ_r ≯ 0 is signaled to the selective mechanism A (which, in our model, we conceived as a mechanism of attention) and the transformation λ is applied to the system E_r to generate λ(E_r) = E_{r+1}, which is embodied in the machine M_{r+1} and its environment Z_{r+1}. We define λ to satisfy λ(E_r) ∈ Ē as before and, also,

(i) λ(E_r) ≠ E_r,
(ii) dR[λ(E_r)]/dt |_{t_s + Δt} > 0, where t_s is the instant at which Δθ_r ≯ 0 and where Δt is an interval Δt > 1.

Of these conditions, (i) excludes the possibility of reselecting E_r, and (ii) is satisfied by any system other than E_r unless Π_r is the stochastic inverse of the initially selected P_φr (which we avoided by making each E_r embody some trapping state and assigning P_φr with equal initial entries). Thus it is legitimate to specify λ(E_r) = E_{r+1}. Hence application of the transformation λ, starting with E_1, on each occasion when Δθ_r ≯ 0 gives rise to a sequence of systems
a* = [El --+ h
(El). . .I,
or to an over-all system m a* =u 0,h,
where, as before, r=p
9 = u A’ (El)*
(10)
t-1
Since each member of Z* satisfies dR ( S r ) / d t > 0 it is true t h a t d R (B*)/dt> 0 and that 8”is a self-organizing system. The entire model is shown in Fig. 7. 2.6 Physical Mechanisms
There are many network-like adaptive computing machines that can be used t o realize instable and, in this sense, trivial, self-organizing systems of the kind we have just considered in the abstract. Most of them can also be used as components in stable self-organizing systems and, before we embark upon the rather abstract issues connected with stability (and nontriviality), we shall briefly review these mechanisms. Any network consists of a finite collection of [38] elementary components (often called “artificial neurones”) that are coupled by connections or “fibers.” Signals are unit impulses (or sequences of impulses). If the network is finite and structurally determined, no distinction 130
I
I
I
:
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
131
GORDON PASK
between changes occurring in a fiber and in a component is necessary, providing that the state of each component is specified by a t least as many variables as the number of inputs it receives. [This is always true for the network but it need not be true for the automaton, as in 2.6 ( 8 ) . ] Consequently, adaptation of a network can be specified in terms of the adaptive changes that are brought about in the transfer functions of its elementary components. It is often legitimate to imagine a network in which all connections are made (or a t least in which there is a great deal of overconnection). Adaptive changes within the components appear, in this picture, as a differentiation of the network (whereby potentially available connectivity is blocked off). We shall consider the main types of artifact. (1) The elementary components are linear devices that summate an input quantity. A typical input quantity is the frequency or mean rate of impulses (analagous to the action potentials of real neurones) arriving a t the input of the elementary component concerned and symbolized as P. Each input connection (by analogy, each synapse) is either inhibitory or excitatory which we symbolize by a quantity o = 1 (for excitation) or w = - 1 (for inhibition). Further, each connection is associated with a weight 1 u i j p 0. Consider the j t h component receiving inputs from several other components indexed i. Its transfer function is
+
or, allowing a delay of At and noting that wij depends upon the invarient structure of the network,
I n simulations such as Taylor’s [39],[40],the adaptation of a given synaptic connection on a component depends upon the previous inpulse frequencies it has experienced. Thus a t t = t, and starting a t t = to,
Taylor has shown that suitable networks of this kind can adapt to discriminate patterns of excitaion that are applied t o their input. I n fact, a great deal is known about the pattern-recognizing capabilities of different networks. The matter has been approached from an analytic viewpoint by Novikoff 1.111and Lerner [42],from a synthetic viewpoint by von Foerster [43], Inselberg [44], and others [45] working a t the University of Illinois, and by Aizerman 1461, in Moscow, who has recently 132
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
demonstrated a separation algorithm for a large set of patterns. I n the network of Fig. 8, for example, von Foerster defines an action function as the distribution of excitation passed from the input receptor layer A (a retina perhaps) to the computing layer B (manifestly an action function depends upon the coeEcients wii and aii if i indexes elements in layer A and j indexes elements in layer B). Imagine an indefinitely large number of infinitesimal elements. Let a denote displacement along the A layer and b denote displacement along the B layer. Defining the input excitation as /3* and the ouptut excitation as 8, it can be shown that for the illustrated network
so that this network is a sensitive contour detector. Its action function 9 ( a , b ) is a member of the class of binominal action functions which have several important properties such as producing no output for a uniformly distributed input and yielding fmther binominal action functions if layers like A and B are iterated through a network. The immediately interesting point is whether or not 9 ( a , b , ) could arise by any reasonably adaptive process. I n this respect, von Foerster’s group have considered a “maturational” adaptation. (In other words, they
-a
Layer A
Layer B
-b FIG.8. Network for binomial action function.
133
GORDON PASK
ask whether fibers could develop from elements in A to others in B according to some plausible plan and in such a fashion that 9 (a, b) would characterize the resulting network.) Here, of course, aij = 1 or 0 and w remains to distinguish excitatory from inhibitory fibers. It can be shown that 2 ( a , b ) will result from a random walk development process. Given a one-to-one mapping from A to B (such as the mapping 1 --f 1, 2 + 2 , . . . m +. m), consider chance perturbations of a developing A fiber from the assigned position in B. If these perturbations are generated by a random walk process which has variance Var +for excitatory fibers and Var - for inhibitory fibers and if Var -)) Var + the resulting network will, on average, transfer activity from A to B according to 9 (a, b,). We comment that although the (‘maturation’’ or adaptation is statistically specified in the sense that a random process is involved, there is nothing haphazard about the specification. The random process is independent of directional bias and has the caliber of a forcing function that represents (in an abstract model) the physical development that occurs due to the energetics of a real brain or artifact. The crucial assumptions are a sufficient number of independent developmental steps and the variance inequality Var - )) Var
+.
(2) Many adaptive networks feature threshold components with characteristics
Here, Pj is interpreted as a unit impulse of unit amplitude (or m y arbitrary and constant value) and the term yi t is called the threshold. I n-the simplest case the threshold value is constant so that yj t At =
+
Yi
Lt
Yi.
McCulloch and Pitt’s [47],[48]networks utilize elements of this kind but adaptation is not explicitly considered. Widrow [as],however, has constructed threshold artifacts using “adaline” devices with “memistors” as adaptive coupling elements while Willis [50] has simulated a number of adaptive networks in which various rules of adaptation have been used to vary the values of the aij and the wij. Willis [51], for example, considers an adaptive process that involves negative weights as well as positive weights. A particular adaptation rule that leads to successful performance in a, single-layer recognition device is a function of input pi and output pj. 134
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
Pi I t A aij
where E;
t =
pjlt
+At
+ 0.01 = A
1 0 1 0
1 1 0 0
= aij wij and Aa; = a;
A aij 11
- 0.01 = A 10 0 = A 01 0
It + A t
=A00
- a; It.
(12)
By far the largest block of data on adaptive threshold systems is due to F. Rosenblatt [52],[53] [who has conducted many experiments with ((perceptrons”]. I n most cases the change in coupling coefficients (the A quantities cited above) depends upon some external instructor who adjusts the value of a reinforcement variable 0 according to his approval of the perceptron’s behavior. (The instructor may, of course, be a program.) I n an “alpha” perceptron (assuming pi = 1 or 0 only),
A
a;It + dt
=
epis,lt;
(13)
whereas, for a ((gamma” perceptron,
where the index i refers to those elements which may be coupled to the j t h element and where M is the number of inputs to the j t h element. Other modes of reinforcement are possible; for example, in various simulations
A aijlt
+A t = Q
t (,!Ii t p j l t
+ dzt) - constant
is the rule employed. None of these networks, least of all “perceptrons,” are limited to a couple of layers or even a laminar topology. So far as the perceptron is concerned, the minimal arrangement is shown in Fig. 9. Although many structures are discussed by Rosenblatt we shall consider (apart from the minimal case) only the most elaborate perceptron structure that has been realized, which is shown in Fig. 10. Von Foerster considers a particularly interesting plan for adaptation [54]. Define the logical strength of a Boolean function as the number of zeros in its truth table representation. Let 4 index the set of all Boolean functions of m variables with the restriction thet +o > $b if the logical strength of a is greater than the logical strength of b . We now consider the network in Fig. 11. Let each element compute a function with logical strength of 0 (namely the least specific function, with all unit entries in its truth table, the tautology). The output from the summating device, 135
GORDON PASK
External Reinforcement ...........
A A Retinal Elements
Response Units
A A
.......... Association Units
FIG.9. A simple perceptron.
Extarnal Reinforcement
A 'A
7
\/ '
Retinal Elements
Association Units
FIQ.10. Back-coupled perceptron.
136
Response Units
ARTIFICIAL INTELLIGENCE A N D SELF-ORGANIZATION
v
output Summation
FIU.11. Adaptive network.
xi pi, so
rictuates a q3 increasing process by operating upon each element that, for an arbitrarily chosen sequence of binary inputs, k1.f At > q3ilt whenever the different &It A t > /lilt. l h e network adapts to become increasingly specific. We may regard this as a discrete maturational process or conceive the adaptation taking place inside a probabilistic device when these requirements are satisfied by one of Uttley’s models for generating a more specific structure within an overconnected conditional probability machine
xi
+
xi
+
WI. (4) Maron [56], [57] examines the behavior of fully connected networks of m input elements characterized by
&It + dt
i=m
=
1
if
aijltp i l t > y j l t
t-1
as in Fig. 12. It can be shown that such elements have a rational inductive inferential behavior, according t o the tenets of Bayes’ hypothesis. By taking logarithmic measures, the criterion can be reduced to an additive form. (The present representation is more convenient for our discussion). In view of a probabilistic interpretation that can be placed 137
GORDON PASK
FIG. 12. A fully connected network.
upon this kind of network it is important to notice that two quantities, tzij and yj, are variable. (Maron’s treatment is closely related to Uttley’s two variable probabilistic computation hypothesis [27].) (5) Crane [58], [59] has developed a computing system in which the unitary components are active transmission lines. An electrical version is shown in Fig. 13 where discharge of condenser C, is assumed t o
FIG.13. A neurister transmission line.
dissipate energy along a path which leads to closure of the contact T,,,which, in turn, leads to discharge of condenser C,,,. Hence a “wave” of potential decrement is transmitted from any point of stimulation and is accompanied by a contact closing “wave.” I n practice, the contacts may be realized by a thermistor material that is heated (impedance lowered) by the discharge current of an adjacent condenser. I n this case the contact closing “wave” is a thermal disturbance. Since the condenser C, takes a finite interval to recharge t o a critical potential 138
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
V,, through R, there is a refractory interval (after a wave has been propagated along a line) within which no further wave can be propagated. Crane calls his active transmission lines “neuristors” since a nerve fiber is a particular realization of this mechanism. Considering the thermistor embodiment, we can create either thermal or electrical coupling between “neuristor)’ lines, as in Fig. 14 and Fig. 15. Given the further facility of a undirectional propagation element it
FIG,14. Thermal and potential coupling.
FIG. 15. Thermal coupling, potential uncoupled.
is possible t o compute the “or” function as well as the “not.” Hence, it is possible to compute any Boolean function with neuristors (and it has been shown possible, in fact, to compute any Boolean or probabilistic function, economically, with neuristors). An adaptive “neuristor)) network has been constructed in my own laboratory [60, 61, 621 and is outlined in Fig. 16. The neuristor impe-
Impedance Dendrites
Instable Dendrites
FIQ.16. A chemical realization of a variable neurister transmission line.
139
GORDON PASK
dence R and its nonlinear “contact” are realized by different forms of the same physical process and, although this is not a necessary expedient (we could, perfectly well, change the coupling R by one process and the T coupling by another), it leads to some interesting additional properties. The physical process is the development of metallic dendrites, controlled by electrolysis. R,simulation, by stable dendrites, is readily achieved, a t a crude level. MacKay [63, 641 used a refined form of dendrite as a variable impedance and proposed its use as a delay component. (The DC potential between a pair of electrodes induces the development of a relatively conducting dendrite in a relatively nonconducting metallic salt solution, and the impedance between the electrodes is sensed by an AC current. However, if the solution is so constituted that a back reaction tends to dissolve the dendrite, so that the electrodeposition is countered, then it is also possible to produce unstable dendrites that act as T , components). The adaptive process can be reinforced either by varying the electrolytic current or the concentration of the metallic ion from which the dendrite is constructed. Either R, or T,development can be fostered; but since there is no hard and fast distinction between a stable and an instable dendrite, adaptation may also give rise to ambiguous components (or ambiguous couplings). Of course it is possible to separate the Rz dendrites from the T,dendrites, for example, by growing these components in isolated chemical systems. But there is no need t o do this in order to grow a network, and perfectly reasonable adaptations can take place in which, although it is possible to define the performance of the network, many of the physical components cannot be unambiguously assigned to the R, or T,form. Similar comments apply to adaptation in neuristor networks built from passivated fibers (Lilley’s [65] iron wire models) where a network simulation starts (in its overconnected form) as a steel wire scrubber hurled into cool 70 per cent nitric acid and differentiates as the often stimulated fibers wear away. A more refined approach has been adopted by Stewart [66],who has combined passivated neuristor elements with a dendritic mechanism for changing their coupling as a function of their activity. The same comments apply with greater cogency to neuristor networks realized as a mesh of polymer macromolecules, the polymerization of the network being controlled by local catalysis modulated with impulses passing along the partially constructed transmission lines. (This, of course, is the informationally desirable scale proposed by Bowman [67], and physical chemists admit that the system is marginally feasible). 140
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
(6) Beurle [68, 691 and, more recently, Parley [70] and Clark [ 7 I ] have simulated networks that are statistical models using digital computer programs. Babcock [72] has built a special purpose machine for work of this kind. I n a typical simulation, the artificial neurones have well-defined properties somewhat similar to the threshold devices described by ( 1 1) but with the added property of a refractory interval in which, after excitation, the element cannot be excited again by any input and a relative refractory interval in which the element can only be excited with difficulty. (These refractory properties are realized by a variation in 'yi after pi = 1.) The connectivity of a given simulation is defined according t o a statistical rule and a random number table. (Hence, any one simulation is well specified.) The statistical rule usually stems from empirical data about the dendritic field of real neurones and may, for example, stipulate that the probability of connection between unit i and unit j falls off exponentially with the physical distance between i and j. The simulation involves a very large number of unitary elements; and we are concerned with its macroscopic behavior which is found to be invarient with respect to changes in the random number table which will, of course, generate an assembly of different, well-defined, networks which are supposed t o characterize an infinite ensemble of networks with the stipulated statistical properties. The macroscopic behavior of these simulations is characterized by waves of excitation that are propagated in various ways through the network. Dynamic adaptations take place and, in addition, the interaction between waves of excitation gives rise to more or less permanent structural modifications. The network may be self-oscillatory, and stable modes are possible in networks that involve sufficient inhibitory connections. The elaborate perceptron is a special laminar case of a potentially self-oscillatory network. (7) Pappert [73] has pointed out that a network capable of computing the Z2" possible Boolean functions of m binary inputs must be adaptively controlled by a parameter that assumes 22m values. For large values of m this structure is gigantic and unrealizable. Consequently we cannot really build networks that adapt to recognize any pattern of stimulation imposed upon their input, if the dimension of the input is reasonably large. Some constraint must be introduced although, as Pappert also points out, the constraint need not be too severe. The problem of adapting to recognize whether a given input pattern belongs to A or B, where A and B are disjoint subsets of the set C of a11 input patterns, is tractable provided that the number of members of A v B 141
GORDON PASK
is modest and that no response is defined for members of C - ( A v B ) . (Notice that no logical restriction has been imposed upon the composition of A or of B.) Often, restrictions arise out of the choice of transfer function. It is well known, for example, that threshold components, characterized by (11) can only adapt to compute linearly separating functions. This limitation is considered by Scott Cameron [74] and Singleton [75]. Briefly, an input pattern to an m input threshold component is a binary m vector and the state of this component (if the threshold y is constant) is a point in a space with the m coordinates uij. Adaptive variation of the uij locates a hyperplane defined by
c
wij ocij = yi.
i
If a pair of input pattern vectors can be separated by some hyperplane determined by some assignment of azj values, they are linearly separable and the threshold element can adapt to discriminate between them in the sense that pj = 1 for one and pj = 0 for the other. But adaptation within a single threshold element can achieve no other discrimination and it can be shown that the proportion of linearly separable functions of m input variables decreases very rapidly with an increasing value of m. The possible adaptations of a single layer of threshold elements is also, of course, crucially dependent upon the d rule chosen for (12) or (13) or (14). (This issue is considered, a t length, by Willis [5U]and Rosenblatt [52].) Less restrictive conditions upon adaptive modifications are probably desirable. Pursuing a suggestion due t o Willis we might restrict our attention to automata capable of computing only disjunctively decomposable Boolean functions of the form. f(z1
.
f
- zm),
where there exists some other function
. . zt),
g(z,.
112
>I > 1
such that f(z1 *
*
’ 51) = 9 @1+1
--
*
x,)
or, generalizing the idea, functions of 1.tf
* * *
%&)
that can be expressed by 1 functions of subsets of a variables, m > u > 1. If this structure exists the automata can be reduced t o subsystems such 142
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
as those in Fig. 17. (It is necessary to distinguish this form of decomposition of computing process from the partitioning of an hierarchy of control which, in a sense, is a decomposition of the organization imposed upon the computing process.)
FIQ.17. Subsystems in a partitioned system.
We comment that an hierarchy of control could, very readily, be introduced to select the partitioned subsystems in Fig. 17 (so that, for example, either Blor B, is processing the input data), Indeed, such an hierarchy will be needed in any case to convert the d adjustments made (as suggested above) in terms of event frequencies into adjustments made upon the basis of “desirable” event frequencies. (8) It is prudent to lay a rather different emphasis when discussing the constraints that act upon the adaptive or even (‘growing”structures of 2.6(5) and the statistically specified, self-oscillatory networks of 2.6(6). (Incidentally, such a network can perfectly well be “grown”; and if its components are rendered infinitesimally small, they reduce to neuristors. ) Neither the transfer function of an individual component nor its detailed connectivity is particularly relevant to most of the enquiries we might make. The whole idea of an individual component becomes a fiction as we reduce the volume in which a signaling event is manifest as a local change of state and as [due to the ambiguities of 2.6(5) which, at a microscopic level, are no longer optional] we blur the distinction between a signal and the structure that transforms this signal. Macroscopic properties are, of course, important and these may be
143
GORDON PASK
either (i) properties of the material from which the artifact is built (if a neuristor network has been fabricated with a thermistor layer as the nonlinear constituent, this fact determines a maximum and a minimum transmission rate) or (ii) topological properties of a structure (given an anisotropic amplifying medium like a plane of neuristor, a torroidal connectivity will lead to self-oscillatory action). I n general, the physics of a self-oscillatory network determines its stable and resonant modes. Moreover, a designer cannot get rid of these. They are constraints upon the assembly. Peter Greene [ 761, [ 773 has suggested how a designer can take advantage of their existence when realizing a self-organizing system. We reiterate the point of 1.8, that interesting automata are, in fact, physically realized. Their design is not a matter of logic alone but a compromise between logic and nature. Finally, recall the distinction of 1.2 between localized automata that can be identified with specific objects and unlocalized automata that reproduce and possibly evolve in a medium. Although no hard and fast distinction exists, the structures of 2.6(5) and 2.6(6) are more akin to media than computational objects. The automata that reproduce are organizations, which may be spatially localized or which may, like stable oscillatory modes, be spatially distributed. (These organizations are automata in the sense that the existence of stable activity maintains the condition in which certain computations take place.) The important point is that cybernetic concepts (like “an hierarchy of control”) apply to an organization and only in very special cases to some localized structure. We shall return to this topic of realizing unlocalized automata in 2.8. 2.7 Control Hierarchies and the Stable Self-organizing System
The model of a self-organizing system developed in 2.5 is trivial because it is inherently instable (in the sense that eventually it will not appear to become more organized or more adapted). The same comment applies to any realization of this model, however sophisticated the mechanism that is chosen from 2.6 to embody it. I n order to examine the important distinction between stable and instable self-organization we must, as suggested in 1.2 and in 1.3,look a t the linguistic constraints entailed in our relation to the physical artifact. (1) The control hierarchy of 2.4(3) is isomorphic w i t h A of Fig. 18. The ci E C are called “subcontrollers)’ which interact with the environment by selecting the term y in the product pair u = x,y. The bj E B are higher level “subcontrollers” which select from the ci E C and A selects amongst the bj E B. 144
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
a,
C
Y
E
C
0
>
E
.-c
w
I
H
I
1
0 E
The Environment
The Environment
145
FIG.18. The system.,&. In the case of 4 values of C parameter and 2 values of B parameter these illustrations are isomorphic.
GORDON PASK
Mesarovic [I11 distinguishes between “causal” and “teleological” descriptions. (In a “causal” description we state the exact dynamics of an automaton over the domain of its inputs; in a “teleological” description we assert the basis of the automaton and the goal i t is designed to achieve.) For the moment we assume that all the subsytems in A are adaptive and that A! is (and must be) defined in L* in the latter fashion. For use later we adopt the convention that a statistically determined system is a “causal” system; in other words, it would be irrelevant and unnecessary to open the “chance” device in order to achieve a “causal” specification. The C,E C compute a particularly class of function and aim to achieve the subgoal of maintaining a particular feature of their dnvironment invariant. Similarly, the b, E B compute another class of function and aim to achieve a higher level invariance. Further, A selects among the bj E B t o achieve an over-all goal. Consequently, the hierarchy of control can also be interpreted as an hierarchy of goals. As one plausible identification of A@ the c, compute functions f as in (4)when their index i = 4.The elements bj compute the functions denoted as g in (5) selecting successive values of i = 4 and the over-all controller A selects values of j = r according t o the rule A. (2) As MacKay [78] has pointed out, i t is possible to regard the bj E B as selecting among symbols for the invariant features of the environment maintained by the ci E C (or equivalently the goal of bi is achieved by selecting among the subgoals). Similarly, the goal of A is t o be achieved in terms of a selection among symbols for the invariants maintained by the bi E B. Hence, in a certain sense, the behavior of the different levels in 4 is representable as a sequence of expressions in different levels of language. Let us make this idea more precise.
(3) An observer (who may encounter rather than construct Ji? so that he is ignorant of its exact structure) can specify various experiments upon this artifact in terms of the scientific metalanguage L*. For each experiment he must communicate with some part of d ,and his interaction amounts to discourse in a language that Cherry 1791 calls an object language Lo. We shall write Lo = V o , Go, go,because LO consists of a vocabulary or a finite alphabet of signs v E V o , a set of restrictions s20 which determine the admissible methods of concatenating the signs v E V o to form expressions va, vb . . . in LO, and the denotation of v E VO, written as go.Some signs, say v C Vlo, denote operations, admitted by Qo, for concatenating signs. If Lo is used to describe the behavior of some c, E C, for example, these operations depend upon the functions computed by ci and the functions, specified 146
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
in the environment, by the experimenter. Other signs, v E V20 say, denote equivalence classes of states, for example, if v E V20 and ui E U then 8 0 may establish the mapping vi t-)ui/E,where E is a relation of equivalence in U . We are admittedly using the term “language” in a slightly eccentric fashion. In the first place (like most formal languages but, unlike natural languages) V o is determined and invariant. Again (like most natural languages but unlike formal languages) the denotation bois specified in LO. Hence LO is an identified “language”. However, although we shall have occasion to depart from this convention, the eccentric usage is convenient. (4)It is possible t o distinguish between the languages L defined in L* and in addition, between the level of language$ L defined in L*. We shall denote the level of a language as q = 0, 1, . , . and use this index to assert that if Ln+land L“ are defined in L* then L?+l is a metalanguage with reference to L“ in the sense that further axioms are needed to derive L“+lfrom L“. We adopt the convention that 77 = 0 is the index of Lo,an object language, and comment that, if Ln,7 > 0, is defined in L*, it should, strictly, be called a system metalanguage to distinguish it from L*. Since we are committed to a ‘‘teleological’’ description there will necessarily be an hierarchy of experiments concerned with 4 which entail communication between an observer and the physical artifact in terms of an hierarchy of system metalanguages. If Lo is used t o communicate with ci E C, then the v E Vlo denote a set of equivalence classes of u E Ui, while expressions in Lo are the behaviors of some ci E C. Similarly, if L1 is used to communicate with bi E I?, then v E Vll denote ci goals or symbols for the invariants preserved in the environment by the ci E C. The L1expressions will be bj strategies of control and the axioms needed t o derive L1 from the object language Lo will be the specification of goals in the “teleological” description of 4. Finally, it will be possible to interact with A in terms of L2. The nontrivial distinction between languages at a given level depends upon a distinction between the type of their denotation. We shall use the index p = 1, 2, . . . for achieving this distinction and comment that if different values of this index are specified, Lqp,L;+l,then the denotation of Ln, and the denotation of L:,, are distinct ontological classes. (5) Any system, in the sense of 2.4, is isomorphic with a n identified language defined in L x ,although the converse is untrue. To demonstrate the identity notice that a system Er is specified in L*, by definition, and 147
GORDON PASK
that the description 8 is a denotation of equivalence classes of states. Thus, for Z, = U,, F,, b,, the alphabet V of a corresponding language t = V ,52, 8 consists of V , and V , of which V , is the set u E U,where €or each u,there is a correspondence u, t-) = oJE determined by 8,for uLEZ,. Next the operations in 8, are members of F so that Vl is the set v ++F and V , = 8 ( F ) u “0’)where “0” is composition. Since Y = V , u Y 2 we obtain Y d (ti,)u & (F)u ‘‘0’’ and b, is a part of 8. Finally since F c [ U , U , . . .] the constraints LJare operations that disallow some relations in [ U , U , . . .] - F . (6) I n his teleological description, Mesarovic distinguishes between levels and goals and hetween interactions that involve normal communication (inputs and outputs of subsystems) and those that involve goals (statements of evaluation). Thus a simple adaptive control mechanism is a single-level single-goal system if it is viewed in a teleological fashion. On the other hand, it can obviously be reduced to or discussed in terms of a causal system if the parameter-adjusting strategy and a sensible part of the environment have been specified. If the corresponding automaton is finite and localized and if the environment is stationary, this causal specification is possible a t a certain level of language (which will characterize a certain level of experiment). Let us call the level of language required to render a reduction from teleological t o causal representation possible qmaY.For the single-level single-goal control mechanism, qman = 1. Hence, if a suitably denoted L1 is defined in L* and if experiments are performed at this level of communication, they can refer to a causal system in which there is no distinction between goal interaction and normal communication. On the other hand if experiments are performed at the level of Lo the system will necessarily appear “teleological. ” The system A‘ is a many-level many-goal system (unless special restrictions are applied when it degenerates into a single-goal system). For 4 ,the value of qmax= 2, and consequently A appears to be causal in experiments conducted a t the level of L2 but teleological in experiments performed a t the level of Lo or L’. At the most, A may have four Lso,L,O four goals at level q = 0 in the possible object languages L,O, LzO, two goals a t ’1 = 1 corresponding to the pair of metalanguages L,’ and L,’ and an over-all goal. The system is representable in causal form in L2 hence qmax= 2. One of the most degenerate forms of A’ is shown in Fig. 29 (p. 179): although the hierarchy Lo, L1, L2 is preserved, all distinction between language types is obliterated by the expedient of minimizing communication (in the sense of normal input and output coupling) between the goal-directed subsystems. The structure in Fig. 29 is a typical sequential 148
x,
~
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
processing organization (of a kind we shall often encounter in connection with artificial intelligence) and contrasts with the parallel two, four hierarchy of the unrestricted A. Broadly speaking A would neither be degenerate nor reducible to a “causal” form if (i) the subcontrollers c, E C act upon environments that are incomparable in I,* or (ii) their goals are incompletely comparable in L*, or are partially incompatible or (iii) the over-all goal is no more than ostensively defined in L* or (iv)there is some evolutionary process that builds an hierarchy by adding on subcontrollers. With these comments in mind, let us return to our model of a selforganizing system. Let us examine the system in terms of an object language which denotes either states u c U or equivalence classes of these states. There are a couple of extreme possibilities, namely: (i) It is impossible to distinguish the E,. (ii) There is a distinct type label attached to any 3, that has once been observed. Assumption (i) is pl~usiblein view of the initial isomorphism of the subsystems M,. Adopting it, the experimental object language Lo will have an alphabet 8”(we write V o rather than V,O for convenience, since it is not necessary to consider VIo )with signs denoting equivalence classes of states that contain one representative member of each disjoint subset U,. Thus, if vl E V o this denotation implies r-p
V,
u,lE = v (u~,).
t)
r=l
Now it is true that the redundancy R ( V o )will increase over a long interval (since the construction of Lo implies that the observer is looking a t a statistically lumped version of the p-fold process [(Pdr,II,), A , ] . But since - H,,,,, ( V ” )is constant and - H ( V o )will fluctuate due to the discrete selections made by A , the observer will not see a selforganizing system and d R( V o ) / d twill not be consistently positive. Indeed, if the obvious triviality of having only p subsystems is removed by providing a mechanisn~,call it “J,” that generates an unlimited supply of Mr (so that A can continue to select subsystems indefinitely), it is no longer true that an observer experimenting in LO will discern an adaptive system, however long he looks. The proposed modification is indicated in Fig. I 9 but it could equally well be realized by a “Pandora’s box” system such as Foulkes [80]describes. Now consider assumption (ii) which suggests experimental communication in terms of distinct object languages L,O with alphabets so denoted that V: t)U,. The observer will now see a sequence of adaptive systems each with a goal of maximizing the index dB,/At E A R (V,“)/Atwhich spring up in succession. By observing a sequence of these, 149
GORDON PASK
FIG.19, Model for evolutionary system.
the observer could reduce the goal of “adaptation” to some causal form (like a rule 4 --f T).But he would be in difficulties about the action of A , in selecting the M,, which cannot be causally represented in any Lo. Nor, if “J” replaces the finite set of subsystems, can the action of A be represented in any combination of the L,o such as Lomaxwith
Vmax 3 u
7L P
=
u
(U?).
r-1
Indeed, to make sense of the system and to justify collecting the necessarily distinct subsystems into a sequence called a self-organizing system, the observer must invoke some further axiom, and in this way 150
ARTIFICIAL INTELLIGENCE A N D SELF-ORGANIZATION
he is forced to construct L1 wherein V1 denotes the set of subsystems 8,,E 8. At this stage the deliberate triviality of the model is revealed because the goal in LO and the goal in L1are isomorphic and the selective process h discerned by experimental interaction in L1 has precisely the same form as the rule 4 + T (in the sense of Mesarovic the model is a single-goal many-level system which, in the absence of “J,” could be transformed into a many-goal single-level system). I n passing, the triviality does exhibit one feature with an important analog in artificial intelligence if the basic process of problem solving is introduced in place of the basic process of adaptation [in 3.2(4) on p. 1851. As in the case .of A the possibility of ((causal” representation (and triviality with respect to self-organization) can be avoided by any of the expedients 6(i), 6(ii), 6(iii), 6(iv);being equivalent to adjoining “J.” ( 7 ) Any appearance of self-organization entails some ingorance on the part of an observer. The interesting issue is not the fact of this ignorance but the form it assumes. We shall consider a few typical cases, illustrating them (when pertinent) with our model. Type (i). A system is called “self-organizing” because a n observer who knows V o and its relevant denotation go (perhaps because the system is an artifact he has built) is ignorant of Go. As he discovers 520 the system’s behavior seems to become more organized. This is the marginal case of ”black box” observation, Suppose the observer wishes Goin order to encode information and communicate with the “black box”. He inductively infers the functions computed by the black box from its inputs (which he may control) onto its outputs. Increased knowledge of Q0 increases his experimental efficiency (particularly if the “black box” has adaptive parameters). The crucial point is that in order to make sense of this enquiry VO, go,must be well defined. Typically and nontrivially, the “black box” might contain a learning machine with a well-specified adaptation rule such as Gabor’s “learning filter” [81] or one of the systems cited in Andrew’s [32] discussion. Type (ii). An observer performs experiments in Lo and wishes to maintain communication within a given universe of discourse. If, in order to achieve this result, he is forced to communicate also in L1,he calls the system “self-organizing” [60]. A typical situation arises in experimental psychology, The object language of the experiment is Lao. The subject may also attend to irrelevant fields of attention by uncontrolled interaction in Lbo. I n order to maintain Laocommunication, the psychologist (as in an interview) issues instructional statements which amount to expressions in 151
GORDON PASK
L1. (MacKay [82] has discussed the semantic and informational status of instructions and similar metalinguistic assertions.) This simple situation can be simulated on the slightly refined model shown in Fig. 20. The chief refinements concern A, which becomes a probabilistic rather than a sequential rule (given a request to select some M , the mechanism A chooses one with a probability distribution pl,p 2 . . . p ) and the criterion for requesting selection (given that M , is selected, a selection is requested from A if At), 3 To > 0).
Inetruction Board
0
0
0
0
0
Stimuli Board
FIG.20. Demonstration arrangements.
The environments 2, are replaced by signal lamps actuated by the M , outputs and buttons actuated by an experimenter that deliver stimuli or inputs to M,. This communication takes place in Lo,the distinctions L,O, Lbobeing arbitrary. In addition, there is a further set of buttons which convey L1 instructions “look a t a,”“look a t b,” where the a and b are values of r . Any instruction momentarily lowers the limit To(interpreted as a sort of expectancy) which later returns to its normal value. I n addition, a 152
ARTIFICIAL INTELLIGENCE A N D SELF-ORGANIZATION
specific instruction signal such as “look a t 2,” is averaged, individually for each value of r , to derive an ‘‘r instruction rate,” g,. The values of the p , are computed in proportion to the g, and inverse proportion to the 0, (hence in proportion to the possibility of r adaptation). The experimenter is told that the machine will aim to maximize its rate of adaptation, but the correspondence between stimuli and the instruction button inputs can be concealed by shufflingtheir connections. I n these conditions the behaviors ofthe machine and of the experimenter prove amusingly lifelike. Type (iii).A system is called self-organizing because an observer, anxious to interact with it, either is or becomes ignorant of either the alphabet or the relevant denotation (of a given alphabet) that is required in order to sustain this communication. This ignorance may apply at a given level (the observer knows 7 but is uncertain of p ) as in “changes of attention” or it may also involve the value of 9 (as in “reinterpretation of stimuli”). I n the simplest case the observer is placed in the position of a biologist who (unlike the engineer) recognizes that in nature L; must be discovered by broad scrutiny of the animal in its natural habitat. (The admission that there must be a close match between the symbols in Ll and the stimuli used in an experiment is fairly recent and demarcates modern behaviorism from its naive precursor.) To cite a case, the visual system of an animal is simple as the frog accepts and responds to symbols in the space of four quite bizarre attributes of the environment, while excitations of the retina which appear like atomic stimuli to the experimenter fail t o elicit any response. These attributes could hardly be discerned by any number of meticulous enquiries intended to reveal principles of perception that are “simple” according to the normal tenets of simplicity. However, they are readily discovered by a crass, intuitive examination of what a frog actually does. A closely related issue is considered in 3.2(7). Most creatures also change their attention or their attitude. Hence a biologist countenances the existence of several LPoand makes p > 1 as a matter of course. Type (iu). A system is called self-organizing because its hierarchy of control (or, in Mesarovic’s sense, its structural hierarchy of goals) is modified and possibly extended as a result of discourse. A system of this kind will exhibit all the curiosities of type (iii) systems and type (ii) systems, but the behavior of these could be accounted for on the basis of invariant if obscure organizations (brought into play by specific agencies like mechanisms of attention). The type (iv) self-organizing system will be more or less indelibly modified by structural changes. These, of course, will alter not only the code i t uses but the level at 153
GORDON PASK
which communication takes place and the level of abstraction a t which data are processed. There is nothing mechanically absurd in the suggestion that such a system can learn concepts, and control procedures must have the logical caliber of conversations. Our idea of an invariant framework of languages breaks down when the artifact acquires the ability to build language systems. 7 and t . ~must be regarded as variables and our communication systems as approximations t o the existing state of affairs. (8) A type (iv) organization is derived from
-----
I
as in Fig. 21.
- - --
The Environment
I
FIG.21. Cooperative interaction between the subcontrollers in terms of control.
154
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
Andrea [83] has recently constructed an artifact that lies on the borderline between type (iii) and type (iv). As in a type (iii) system, Andrea’s device is hierarchically structured. At one level it learns about sensory motor connections, a t the next about the solution t o problems, and a t the next about sets of solution patterns. On the other hand, it has more than one mode of activity. It may seek external reinforcement or it may perform an internal reorientation of solution patterns, seeking to achieve internal reinforcements. Like a type (iv) system, its objective is to learn [84] not to have learned. It must involve an evolutionary process, in which subsystems compete, in some pertinent sense, for survival. But the development of an increasing level or organization depends upon cooperation between the subsystems. Mechanically speaking, the activities of the device will be distributed (which is a flexible and, in some cases, stable arrangement). Consequently, the invariant feature of this cooperative system (which may be embodied, for example, in the connectivity of a network) is an organization. The abstract automaton that images this physical structure is an unlocalized automaton. 2.8 Unlocalized Automata
(1) As indicated in 1.8 we are concerned with unlocalized automata that reproduce and evolve. How can these automata be conveniently represented? A localized automata has a representation, as in 2.4(1), that is isomorphic with a state graph. An unlocalized automaton can also be represented in this fashion, but the formalism is cumbersome and apt to be misleading. I n the first place, we must recognize the possibility that the automaton may not reproduce (that the physical machine it represents fails to survive). Hence there must be criteria whereby its state graph is or is not dejined. With Ashby [85] and Rosen [86],we must admit that an automaton does not reproduce itself. It i s reproduced because of a dynamic interaction with its environment. (The fact that we are talking about unlocalized automata implies that we regard this interaction as relevant.) So the state graph must be dejhed if and only if certain relations (an abstract reflection of material and energetic relations, to do with the “metabolism” of the machine) are adequately maintained. Rosen has formalized rather special cases of this situation. Next, the proliferation and evolution of the automaton need to be represented, and this entails a calculus for extending and modifying 155
GORDON PASK
the state graph. Rashevsky [87] has dealt with this problem for some biological systems but his transformation methods are difficult to instrument in the case of automata that compute. It is perhaps better to admit that an automaton is a property of the whedium in which it is defined. (If we regard the flux of physical constituents that form an animal as part of the environment of the organization “animal” then this medium is the environment. On the other hand, it is more usual, as in 1.2, to define a special internal environment, such as a brain, as the medium in which automata can evolve.) For economy and elegance we seek the least specialized medium that is possible. The proposal made by von Neumann [88] was an infinite plane of cells (a so-called “tesselation”) in which any cell i could assume a finite number of states u E Ui. Any state subset Ui includes a special “null” state u, E Ui. Von Neumann’s representation has been developed by Burke [89] and Loefgren [19]. An automaton is defined as a configuration of notnull states on a tesselation. (Hence, it is a property of this tesselation.) Entry into u, implies the obliteration of some aspect of an automaton and the transition from uo into another state entails the creation of some aspect of an automaton. The states of cell i, say, undergo transition according to a rule that depends upon the immediate state of i, u E Ui, and the state of neighboring cells, say j . . . 1. Formally, this rule is a mapping
a;[vi,uj,.. . U,]-+[U,]. A localized automaton is a connected region of not-null states. An unlocalized automaton is commonly either a connected region that spreads out over the tesselation, as in Burke’s construction, or a wave of replicating individual automata that spread out over the tesselation as in Loefgren’s construction. An observer is a t liberty to interpret any feature of this process as the automaton that is relevant. I n particular we shall be concerned with evolutionary processes in which, to begin with, there are automata a E A , interacting in object language Lo,that satisfy some pragmatic criterion [they are “historians” or “philosophers” as suggested in 2.7(6)] with respect to any environmental constraints we may impose in LO. As a result of evolution, there appear further automata b E B (but B is a species of individual organizations b which consist of cooperative aggregates of a which interact in terms of L1).Now if the b E B also act like “historians” or “philosophers” we can perfectly well call them the automata of interest. Indeed, if the b E B are better historians or philosophers and we aim to interact with the “best” species, we are forced into discourse with b E B , and, con156
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
sequently, into constructing an hierarchy of metalanguages Lo,L1 . . . , which places us in the position of the observer in 2.1. ( 2 ) The completely abstract tesselation model is mathematically elegant but it is difficult to choose interesting rules 8 and it is often difficult to interpret the configurations that appear. It would be convenient to have a direct interpretation for values like proximity to a goal and the cost of maintaining the structure needed to perform whatever computations are involved in goal achievement. On the other hand, it is obviously desirable to preserve the logical simplicity of the tesselation model. We need some representation between this abstraction and the self-building computer programs we shall encounter in connection with artificial intelligence. As a compromise, Masano Toda [90] has conceived a model in which automata akin to small animals move around in a very simple environment characterized chiefly by a distribution of “food.” This “food” is a commodity that the automata must acquire and store because their structure must be paid for in terms of food, expended for this purpose. I n addition to seeking food, the automata in Masano Toda’s model seek an independent goal and niay cooperate with one another in pursuing it. Grey Walter [ Y I ] , Barricelli [92], and Goldacre 1931 have also conceived models of this kind. Independently, various similar models have been simulated (handsimulation, assisted by the apparatus mentioned in connection with the model of 2.4 and also computor simulated) in my own laboratory [94, 95, 961. One of these will be briefly described. The most primitive automata are creatures, a E A , able to move about in their environment, to eat the food available, and to emit signs (which they can also receive). These primitive automata are able to reproduce and create further automata but their survival depends upon the acquisition of sufficient food. The environment in which the automata evolve is a network of nodes, either over-all toroidally or over-all planar connected. Each node is associated with a food store that is filled at a rate that depends upon the availability of “food” (which is determined by the experimenter) and upon the local conditions. This environment is “malleable” in the sense that the local rate of “food” inflow depends upon the food that has previously been eaten from the store. In terms of the actual simulation, the food store is a condenser of capacity C that is charged through resistances R, R,. Of these R, is a fixed linear element, whereas R, is a nonlinear thermistor element. Hence R, depends not only upon the current that has passed (and 157
+
GORDON PASK
heated the element) but, also upon a lag term, which determines the rate a t which the heat is dissipated from the element. We make the dissipation slow with reference to the motion of the automata, As in Fig. 2 2 , the automata attach themselves to the nodes and partially discharge C by eating. Hence the potential on C , denoted V , changes according to the number and avidity of the automata resting at the node concerned, as well as the influx of current.
FIG.22. Siinulated automata in environment,.
When an automaton rests a t a node it eats food a t a rate t h a t is proportional t o the difference between V and 6, where 0 is the amount of food t ha t the automaton has accumulated in its internal store, unless 9 > V , when the automaton is unable t o eat. The food in the internal store is depleted to pay for th e fabric of th e automaton at a rate p which is a function of the age of the automaton. If 0 falls below a certain value, Bo > 0, the automaton falls apart an d is removed from the model. On the other hand if 6 exceeds a certain value, 0 > Om, a further 158
ARTIFICIAL INTELLIGENCE AND SELF-ORGANIZATION
automaton is born as an offspring. At this point the parent automaton is “rejuvenated.” It and its offspring receive a starting amount of food in their internal stores of $0,. I n each case the aging function that determines the cost of maintenance, p, is assigned its age 0 value. In terms of the actual simulation, Fig. 23, the internal store is a condenser C charged through a resistance R and a diode that prevents the automaton from eating if 0 > V . The “constant” current valve determines the cost of maintenance. When a further automaton is produced contacts I and I1 are momentarily closed. Hence 49, is transferred to the internal store of the offspring and the condenser in the age circuit is discharged.
if 8
-Offspring Remove
>8
8UhXRfLt8
i f 8,-,
>8
I To ‘Internal Computation
FIG.23. “Age” circuit and “stomach.”
An automaton can move to any of its neighboring nodes or it can remain where it is. The decision to move is made upon the basis of several items of evidence which we shall consider in a moment. The adaptation in the automaton entails placing various ((learned” interpretations upon this evidence. 159
GORDON PASK
The chief data, however, are information about the value of V prevailing at the various accessible nodes (for the automaton aims to survive, hence t o maximize 0, which depends upon an adequate supply of food to eat). The automaton is thus born with a sensory apparatus that allows it t o discern the value of V a t the five accessible points indicated in Fig. 24 (which are also the points “0” to which an autocan move a t the next instant). maton resting a t
“+”
FIG.24. Possible moves.
The decision to move into one or the other of these locations depends upon certain design principles : (i) An automaton must be active; for, in fact, we are interested in the motion of automata, rather than automata themselves. Consequently, the rate of eating a t a node is made greater than the maximum rate of replenishment of the food. Hence, an automaton that remains in one position must eventually decay. (ii) The automata must gain as a result of correlated activity; in terms of game theory, the payoff function must determine an essential nonzero-sum game with a number of participants that depends upon the accumulated payoff. The basic requirements are facilities for providing an automaton, say, a,, with information about the action contemplated by any neighboring automaton, say a2,with which its activity is correlated and facilities for adaptive modification of the coupling between a, and u2. These requirements are satisfied by providing a communication system whereby the automata can indicate their state before a motion is completely determined. We comment that this provision is strictly redundant. It is possible to replace the communication system by an interpretative facility exercised in respect to the sensory system. (For example, in other simulations, the automata have been designed to sense the gradient of food change and hence the potentiality for interpreting a steep change of food level as a sign for the presence of another automaton.) 160
ARTIFICIAL INTELLIGENCE A N D SELF-ORGANIZATION
The decision circuit that selects among the motions 1 or 2 or 3 or 4 or 5 is a set of amplifiers with common cathode connection, the outputs of which actuate each of five trigger circuits. (These select one of five actions and are so constrained that one and only one can be energized a t any instant.) The amplifier outputs in the a, decision circuit modulate the amplitude of five oscillators of frequency F1, F 2 , F 3 , F 4 , F5 to produce signs al,a z , as, a4, as. The oscillator outputs are combined and applied to an image network of the “food” network, as in Big. 25, a t the image node of the node a t which a, is located in the “food”
FIG. 25. Food and signal networks.
network. The oscillator output signal, which conveys information about the tendency of a₁ to select each of the alternative motions, is attenuated in the image network and received by other neighboring automata, such as a₂. In a₂, the signal is filtered into components F1, F2, F3, F4, F5. These are rectified and averaged and, through the θ-maximizing adaptive circuit of Fig. 26, determine the sense as well as the degree of coupling between the decision process in a₁ and in a₂ or, of course, vice versa. Starting with at least one automaton, the rate of food inflow is increased, and the automata reproduce to form a population in dynamic equilibrium with the food that is available. Groups of automata form due to cooperative coupling,* either by

* The behavior of this simulation demonstrated several features of the evolutionary process and the form of simulation made it possible to examine why the behavior took place. However, it was obviously impracticable to use more than a few of the rather elaborate individuals. In more recent work we have simulated statistically respectable populations consisting of between 200 and 500 "live" individuals on a small computer (an I.C.T. 1220 machine) using a program that embodies most of the characteristics described but which allows for multiple data processing (the "individuals" have some shared facilities). This work is also restricted by mechanical practicality, but is part way to a program that is being written for the ATLAS-type computer. We shall only comment upon aspects of the evolutionary system behavior that appeared in the small program behavior as well as the initial simulation.
FIG. 26. Signal system.
explicit communication, or through the food network. In this connection it is important to recall that the signals and the image network are, in a sense, redundant. The automata interact with their environment. They may also interact through their environment with one another. Hence, there are organizations which are not automata as such, nor
even groups of coupled automata, but organizations partly embodied in the environment, i.e., some property of this medium. These organizations reproduce, using a mechanism of reproduction which has evolved, in place of the reproductive process built into the automata, which is relegated to a subsidiary place. The evolved process is a dynamic "template" mechanism. Any behavior of the automata necessarily induces a pattern P upon the environment because of its "malleable" character. Suppose this pattern favors the perpetuation of this or similar behaviors. The automata which jointly give rise to a behavior z, which are characterized by certain adaptive modifications, act upon their environment to produce P, which favors the survival of this sort of automaton (or more pertinently, since it is the motion of automata, not the automaton, that is important at this stage, the survival of this behavior). The process is autocatalytic and represented by a mapping
z → P or, equivalently, an activity inducing a structure (a "template"),
that is defined providing that this organization (of which the mapping is a specification) can obtain sufficient food. In fact, stable organizations are characterized by many-to-many mappings from a set Z of z into a set 𝒫 of P such that any z in Z induces some P in 𝒫 and any P in 𝒫 induces some z in Z. In passing, we comment that if Z is identified with a set of oscillatory modes and if 𝒫 is identified with a set of synaptic modifications induced in a malleable network when these modes exist, this (z, P) model is isomorphic with a mechanism of learning in neurone networks proposed by J. W. S. Pringle [97]. This model is also a special case of Wiener's [7] formulation of self-replication, the "noise" that acts as a forcing input to Wiener's filter being the autonomous activity in the system. Hence evolution entails the development of different levels of organization, or, by analogy, of a species B from the original species A. We regard automata a in species A as level 1 organizations and members of the species B as level 2 organizations. There is an interaction between level 1 organizations, as distinct from an interaction between level 2 organizations, and these interactions are characterized by languages, say L⁰ for level 1 and L¹ for level 2. The signs in the L⁰ language are discrete motions and their indices α. The signs β, say, in the L¹ language are distributions of food or sequences of signals. Commonly an α sign has little effect upon a level 1 organism, and a β sign will have little effect upon a level 2 organism. But many an expression in L¹ will induce some L⁰ expressions and many sequences of L⁰ signs will
induce an expression in L¹. Thus there is AB and BA interaction, and in some cases identification occurs between L⁰ terms and L¹ terms. At high density an interaction effect reminiscent of crystallization takes place. Some automata differentiate to indulge in distinct and invariant capacities in the level 2 organization. (They may, for example, perform only one motion as members of a chain of automata.) Broadly, differentiation is due to the fact that many automata are born and live their life in an environment that is almost entirely determined by their neighbors. Although we have provided only one A species of automaton, differentiation admits the coincident existence of several distinct B species of organization, say, B₁, B₂, . . ., which may have languages L₁, L₂, . . . that are distinct, and it becomes necessary to distinguish between interactions like A ⇄ B (between levels) and others like B₁ ⇄ B₂ (at a given level of language).
To what extent is this a nontrivial self-organizing system (providing, of course, that it continues to evolve)? The medium or environment is always capable of fostering the mechanisms that evolve. The fact is that different properties of the medium become important at different levels in evolution. It would, in a certain sense, be possible to predict the possible modes of interaction, and this is also true if we adopt the obvious expedient of specifying an indefinitely extensive medium. But although this kind of comment is true, it is largely irrelevant. As observers, we are anxious to interact with the organizations that evolve, to make them compute for us (like the population of apes in 1.8) and to make them adapt their computation so that certain goals are achieved. (In the present model these goals will be to bring about certain food distributions.) The lowest level interaction in which we can indulge is to observe a distribution of a ∈ A or of the α signs they emit in L⁰ and to operate either (i) upon the local food distribution or (ii) upon the signal network with α signs. (One or the other is admissible depending upon details of the specification.) In this way we can induce specific adaptations (that achieve the goal). Thus we can make the automata behave, in a trivial sense, like the desired "philosophers" or "historians." But unless we have chosen a trivial goal which can be optimally achieved by a ∈ A (so that there are no better "philosophers" or "historians" than a ∈ A) we can do more than this by interacting (in terms of the β signs) with b ∈ B. But we are forced to build an hierarchy of metalanguages L⁰, L¹, . . . in order to maintain this interaction with organizations in species B₁, B₂, . . . and, further, to translate between the languages L₁, L₂, . . . of different species B₁, B₂, . . .
We comment that it is not difficult to relate A, B₁, B₂, . . . to the forms (or, as they are often called, "plans") of program which are used in artificial "intelligence" systems. On the other hand, there seems to be no reason why a self-organizing system of this kind should be reducible to triviality. To put the matter from a logician's point of view, our communication with the evolving organization entails building an hierarchy of nontrivial metalanguages, and its unambiguous description involves an hierarchy of logical types of statement.

3. Artificial Intelligence
3.1 Basic Definitions
(1) When we say that X is intelligent, X being a machine that somebody has built, we usually mean more than the trite assertion that X can deal with a suitably encoded intelligence test. The fact is, although we may accept test passing as sufficient evidence that a man is intelligent, we need more evidence when predisposed against X because it is a machine. MacKay's [98] distinction between "intelligence" and "intellect" is pertinent. Constructors and critics of artificially intelligent devices seem to be aiming for intellect (creativity in pursuit of rational as well as imaginative ends) rather than the logical dexterity that satisfies a narrow definition of "intelligence." Given a man, we can take a modicum of intellect for granted. By convention, we cannot assume the existence of intellect in a machine, and we shall take the requirement of intellect as an objective that is, ideally, to be satisfied by an artificial intelligence. Tests for the logical component of intelligence present no difficulty. To satisfy them, the tested device must compute a suitably elaborate set of functions of its environmental input. Tests for the intellectual component are quite a different matter. The fabric of an artifact is irrelevant to its intellect (and to its computing capability as well). To be told that a man has a brain made of tinplate or blancmange does not shake my faith in his intellect. Similarly the mechanical specification of an artifact is irrelevant, for I could not recognize an intellectual circuit and doubt whether it is a meaningful entity. Whatever else, the test for intellect applies to the behavior of an artifact and not to the mechanism that mediates this behavior or the material from which it is built. Ashby [99] laid emphasis upon this point when he proposed a crucial test for intelligence in terms of the selective activity of a system. Among its other quirks, "intellect" is the disjunction of many
ostensively defined properties we feel bound to ask for in the repertoire of anything, man or machine, that is intellectual. Thus the system undergoing test should be able to use signs for things as its symbols (to solve problems in a universe of rational discourse rather than a factual environment). Another property we look for is adaptation. The exercise of intellect implies a certain lability, so that the function computed by an intelligent machine is adjusted to meet the demands of the moment. But this much lies within the repertoire of many computers and controllers that are never deemed intellectual; for example, the self-organizing systems of type (i) in 2.7(7). In order to pass the lability test, an artifact must change not only the function it computes but its system of symbols (its "concept" of its environment). Conversely, it must be undisturbed if it is presented with different environments. (For each, it must construct a suitable representation of its own accord.) Hence it is, at least, a self-organizing system of type (ii) in 2.7(7), or of type (iii) in 2.7(7).
A test of lability and symbol construction is very much stronger than a test for the nontrivial employment of symbols, for it involves "concept" building. The question is, in the first place, "In how many different environments must the machine be able to build concepts?" or "How many different kinds of problem must it be able to solve?" and, secondly, "How different must these environments or problems be?". There is no completely unequivocal reply, but to avoid triviality, the set of environments that are used in the lability test must be formally incomparable within the language L* that is used to describe the test situation. Hence, any machine that passes the test will have, by definition, a behavior which is a self-organizing system of type (iv) in 2.7(7). It will be able to carve out an area of relevance within a wider environment and it will be able to impose a conceptual pattern upon this area. On the other hand, not every type (iv) self-organizing system is intelligent. We might regard a Martian as intelligent or not (assuming he could pass the lability test), and our view about him would not depend entirely upon the tricks he could perform. We could perfectly well say he was a type (iv) self-organizing system (yet an unintelligent one) because we could not understand how or why he managed to control his surroundings. In the first place we might be unable to appreciate features of the environment that were obvious to a Martian. (All the same we could, quite conceivably, observe the regularities achieved by a control system that used these abstracted features as its input signs.) Secondly, we might fail to understand a Martian's objectives or to discern what he deemed important. (A Martian may not eat food or
need his batteries charged or have any very consistent metabolic requirements.) Any decision we make in this matter is heavily weighted by our attitude. We agree that a control mechanism for an office block elevator is a computer but deny it intelligence although, when presented with a demonstration machine that computes exactly the same functions, we may waver in our pronouncement. Partly, this is due to the fact that we know the control mechanism has no option of its own. Mostly, however, our rejection of the elevator control as potentially unintelligent is due to familiarity alone. We are accustomed to this particular automaton and have assigned it an other than intellectual status. Given a self-organizing system of type (iv) in 2.7(7), its intelligence depends upon the form of metastatement that we have made in order to associate its otherwise disparate component systems, any one of which may represent a separable concept. Most people would agree to acknowledge the intellectual facet of intelligence if the relations between these concepts have the caliber of the relations between our own concepts of the same environment, and if the machine concepts are acquired in much the same way as our own. Crudely, an intelligent artifact learns in the same way that we learn. This comment can be extended to other aspects of mentation. Thus a "proof" offered by the artifact must have the status of the "proofs" we offer. (Ideally, an artifact should not be constrained to a single type of proof; for example, to count as intelligent it should be capable of acting like some kind of historian, some kind of lawyer, some kind of biologist, and some kind of physicist.) The property of intelligence entails the relation between an observer and an artifact. It exists insofar as the observer believes that the artifact is, in certain essential respects, like another observer. The best check for the property is an attempt to converse with the artifact and to develop joint concepts as a result of this conversation, which is suggested in 2.7(8). But manifestly this is not a "test" in the ordinary sense, for its result is informative only insofar as the observer (who administers the procedure) participates in the conversation, and to this extent his view is biased and equivocal. In a slightly different connection, MacKay points out that although such an observer is influenced by empirical data he must, at some stage, make an independent decision to "name" any machine that he deems intellectual. At this point it ceases to be an arbitrary creature and becomes one of his own clan. Thus an artificial intelligence is a self-organizing system of type (iv) in 2.7(7) which learns about a symbolic environment in order to solve the problems posed by this environment. Further, its "concepts"
and its methods of problem solution are peculiarly human. The first speciality lies in a specification of this symbolic environment in which the system acts. To elucidate it, we shall describe a rather simple problem-solving computer program devised by Kochen [100]. The second speciality is the human-like component of the artificial intelligence which may be introduced: (i) as a set of "heuristics" (to use a term proposed by Polya [101]) or broad rules and suggestions for problem solution, (ii) by close-coupled interaction between a man and the machine (literally by a conversation in which the machine acquires man-like habits of thought), (iii) by embedding constraints into the program that stem from psychological models of concept learning, or (iv) by embedding similar constraints derived from physiological models of the process that underlies concept learning. Obviously, these restrictions are imposed at different levels of discourse. Suppose we choose L⁰ to comprehend the physiological or mechanistic level of (iv); the constraints in (iii) are applied in L¹ (strictly in Lq₁, q₁ ≥ 1) and those of either (ii) or (i) are applied in L² (or strictly in Lq₂, q₂ > q₁). Although these constraints may often be applied jointly, it is convenient to make an arbitrary distinction between them. The heuristic constraints of (i), that lead to autonomous machines with little structural resemblance to a human brain, are considered immediately. The constraints of (ii) are discussed in Section 5 and those of (iii) and (iv), which give rise to rather abstract models of a human brain, are examined in Section 4.
(2) Bruner, Goodnow, and Austin [102] performed a psychological experiment in which a sequence of cards, each displaying the presence or absence of attributes like color, shape, and number, were presented to a subject. With each card, the subject was informed whether or not the card belonged to an unknown subset of the possible universe of cards displaying these attributes, and he was required to assert his current belief about the composition of the unknown subset and ultimately to define this subset with confidence. Depending upon the sequence of evidence, it is possible to deduce various characteristics of the unknown subset λ. Bruner, Goodnow, and Austin were concerned with several cases but we shall concentrate initially upon the λ of the kind they call conjunctive "concepts." (Their usage of the word "concept" differs from the present usage. A conjunctive subset is a subset defined by the conjoint possession of several attributes.)
To formalize the environment, consider n binary attributes denoted x_i. Any exemplar (such as a single card) is defined by an n-component binary vector X = x₁, x₂, . . ., x_n, and the entire universe of exemplars by the 2ⁿ vectors X. When an exemplar is presented, say at an instant t, the student also receives the information that it is or is not a member of λ. Thus the input to a student at the tth instant is

V|t = {x₁|t, . . ., x_n|t, ξ|t} = X|t, ξ|t

where

ξ|t = 1 if X|t ∈ λ,
ξ|t = 0 if not.

Since any λ is a conjunctive subset it can be represented by an n-component vector of three-valued variables of which one value indicates an indifference. Call these variables

y_i = 1 if x_i = 1,
y_i = 0 if x_i = 0,
y_i = z if x_i is either 1 or 0.

To illustrate the indifference value, we show the subsets λ₁ ≡ Y₁ = [1, 0, z] and λ₂ ≡ Y₂ = [1, 1, 0] for the case of n = 3 in Fig. 27. The output of a subject for an input sequence

[V|1, V|2, . . .] = [(X|1, ξ|1), (X|2, ξ|2), . . .]

is formally a sequence Y|1, Y|2, . . ., to which may be adjoined a sequence of assertions about the value p|t of his confidence in the guess that Y|t = λ. We shall assume that the input sequence is arbitrary, probably redundant, but devoid of logical inconsistencies with reference to the proposition that λ is a conjunctive subset. In other words, sequence I of Fig. 28 is admissible, but sequence II is not. Kochen devised and tried out a number of different computer programs, each of which embodied one of several plausible strategies for guessing values of Y|t and asserting values of p|t. He compared these with one another according to various criteria and with the performance of human beings. In each case, the program (artificial intelligence "machines" M₁, M₂, . . .) "guessed" Y|t = λ, for various conjunctive subsets λ, before sufficient evidence had been examined to prove, deductively, that Y|t = λ.
FIG. 27. Concept space; possible sequences.
On the whole, the M programs fared better than the human beings. Experiments were conducted for values of n = 3, 4, 5, 6, 7, 8, 12, 15, and various values of the number of z-valued entries in Y = λ. (The symbolism adopted in the paper which contains these results differs from our present symbolism.) The simplest M is M₁ of Fig. 28. The initial "hypothesis," Y|1, is that λ = [z, z, . . ., z] = Y|1. At any value of t it may be the case that:
(i) X|t ∈ λ ∩ Y|t, when Y|t is confirmed. If so, M₁ leaves Y|t unchanged.
(ii) X|t ∈ λ̄ ∩ Ȳ|t, where λ̄ is the complement of λ and Ȳ|t is the complement of Y|t, when Y|t is confirmed and M₁ leaves Y|t unchanged.
(iii) X|t ∈ λ̄ ∩ Y|t, when the hypothesis is disconfirmed (because, by definition, the value of ξ|t in V|t is 0 but Y|t, the current hypothesis, asserts that this exemplar is a member of λ). In this case, M₁ changes its hypothesis. If j is the least value of i for which, in Y|t, the entry is z, then y_i|t+1 = y_i|t for i ≠ j and the entry y_j|t+1 = x̄_j|t.
FIG. 28. A problem solving program.
(iv) X|t ∈ λ ∩ Ȳ|t, when the current hypothesis is disconfirmed (because the value of ξ|t in V|t is 1 but the current hypothesis asserts that X|t is not a member of λ). In this case y_i|t+1 = z for all values of i such that the values of y_i|t are not logically or deductively determined in the sense of Fig. 28. (A sketch of this updating scheme is given below.)
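The following is a minimal sketch of an M₁-style guesser under the representation above. It is a reconstruction from rules (i)–(iv), not Kochen's program: the complement step in rule (iii) is our reading of the garbled original, and the "logically determined" bookkeeping of rule (iv) is simplified to a plain generalization.

```python
Z = "z"   # the indifference value

def matches(Y, X):
    """True when exemplar X is a member of the conjunctive subset Y asserts."""
    return all(y == Z or y == x for y, x in zip(Y, X))

def m1_update(Y, X, xi):
    """One step of the M1-style hypothesis revision, rules (i)-(iv)."""
    inside = matches(Y, X)
    if xi == 1 and inside:            # (i)  confirmed: leave Y unchanged
        return Y
    if xi == 0 and not inside:        # (ii) confirmed: leave Y unchanged
        return Y
    Y = list(Y)
    if xi == 0 and inside:            # (iii) false positive: specialize
        j = Y.index(Z)                #   least i with a z entry (one is assumed to remain)
        Y[j] = 1 - X[j]               #   exclude this exemplar
    elif xi == 1 and not inside:      # (iv) false negative: generalize (simplified)
        Y = [y if y == x else Z for y, x in zip(Y, X)]
    return Y

# Example for n = 3 with the (unknown) target lambda = [1, 0, z].
Y = [Z, Z, Z]                         # initial hypothesis
for X, xi in [((1, 0, 1), 1), ((0, 1, 1), 0), ((1, 0, 0), 1), ((1, 1, 0), 0)]:
    Y = m1_update(Y, X, xi)
print(Y)                              # converges to [1, 0, 'z']
```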
If C₁|t is the number of occasions upon which the current hypothesis Y|t has been confirmed in the sense of X ∈ Y ∩ λ, and if C₂|t is the number of occasions upon which Y|t has been confirmed by X ∈ λ̄ ∩ Ȳ, and if p|t is the estimated probability at the instant t that ξ = 1, and 1 − p|t is the estimated probability that ξ = 0, then, for M₁, the confidence p₁|t in the assertion that Y|t = λ is computed from these quantities (the explicit form is given in Kochen's paper).
For each machine, M, a "distance" between the hypothesis and λ is computed in order to evaluate the performance of M. In fact, Kochen used seven different machines. The form of p|t was modified, p₁|t → p₂|t, to remove undesirable asymmetries, and the selection rules (iii) and (iv) were changed so that modification of the logically undetermined entries depended upon a "random" process (selecting which entry should be modified). We comment that p₂|t typically undergoes sudden "insightful" transitions. Whereas M₁ and some of the other machines retained the entire sequence of inputs in a "memory" and compared all of them with the current state, the derived machines had a restricted memory. (In most cases the "random" process improved the performance, and restrictions upon the "memory" did not appreciably impair the performance.) A typical derived "machine" is also shown in Fig. 28. As Kochen very carefully points out, none of these systems satisfy a criterion of artificial intelligence of the kind we adopted in (1). However, a combination of the experimenter with the machine may make up for the deficiencies in M, namely:
(I) The machine does not learn.
(II) Its environment is restricted.
(III) The denotation of signs is determined externally (since the identification of the attributes is determined externally).
(IV) The machine does not build an hierarchical structure.
Of these, (I) is countered by the comment that the experimenter does the learning when he changes one machine into another, Mᵢ → Mⱼ. The modifications introduced (the difference between Mᵢ and Mⱼ) depend upon data that are computed by Mᵢ, namely, the values of p|t and of the distance function, and readily available parameters of
Mᵢ behavior. (II) is not a real objection. The environment is restricted by intention rather than necessity. Subsequent machines can, for example, deal with disjunctive subsets λ = λ₁ ∪ λ₂ ∪ . . . which are predicted by evidence of the form in sequence II of Fig. 28, and there is no reason why other, feasible, M should not make guesses about probabilistically defined subsets on the basis of ambiguous evidence of the kind that is delivered by sequence III. Since the possible 2^(2ⁿ) subsets of the exemplars X consist of disjunctive subsets λ or conjunctive subsets λ (any subset is a union of fully specified, z-free, conjunctive subsets, one for each of its members), this type of machine is unrestricted in its domain and it can be shown capable of dealing with the indefinite and ambiguous sequences of evidence that pose the inductively solvable problem of characterizing a probabilistic subset. On the other hand, (III) implies a more serious deficiency. A machine of this type is limited to situations in which both the attributes (of which the values constitute relevant evidential data) and the objective or goal (the subset λ) are well defined before problem solving begins. Obviously there are some situations demanding intelligent problem solving wherein this limitation is acceptable. But we can never judge whether or not the criterion of (1) applies unless the machine is capable of dealing with situations where the alternatives are not well defined. In this case, we know that the machine does not possess this capacity and the goal and the relevant data have both been selected by the experimenter. Unfortunately, there is no readily asserted algorithm that the experimenter adopts when he deals with the issues of (III). In fact, he can, at best, rationalize his decisions by announcing some heuristics. Similar comments apply to (IV) if "heuristic" is replaced by "evolutionary" rule. (3) Artificial intelligence systems live in a symbolic environment comparable to the universe of binary vectors, but frequently of a much more elaborate kind. Sometimes the symbolic environment is restricted to geometrical propositions or logical expressions (as in Newell, Shaw, and Simon's "Logic Theorist" [103]) or figures on a retina. Sometimes, as in Newell's [104] General "Problem-Solving" program, the environment can be made up from any abstract objects and almost any relations between them. The system is provided with a set of operators with which it can act upon the objects in its environment and a set of differences, or distinguishing attributes, that it can detect and which are used to discriminate between objects. A problem is posed by specifying some initial object and the goal of reaching some other object; for example, the initial object may be a logical expression and the goal object its proof or, in the General Problem-Solving program [which we shall call G.P.S. (I)], any other
object related to it by a sequence of transformations in the symbolic environment that corresponds to a sequence of processes in the artificial intelligence. (4) The majority of systems can be criticized on the grounds that they do not embody the gamut of processes that make them independent of the experimenter or the programmer. But, as suggested in (2) above, this criticism is trivial if the experimenter's or the programmer's activity could be programmed. Hence it is very profitable to look at systems that are fragments of an artificial intelligence and which deal with special facets of problem solving, providing that among them there are systems capable of assembling these fragments into a composite entity. Minsky [105] believes that five types of process are usefully distinguished:
(i) Search for a goal, involving a sequence of choices based upon the evidence derived from measures like: (I) the value of achieving a goal, (II) the proximity to a goal, (III) the amount of computation that is expected (or the length of algorithm needed) to achieve this goal, (IV) an index of which method (or type of algorithm) is best, and (V) an index of the cost of the computation involved.
(ii) A process that reduces the ultimate solution of goal achievement into partial solutions or subgoals. We comment that if the measures (I), (II), (III), (IV), (V) can be defined, then a suitable process exists.
(iii) A heuristic procedure defining relations of similarity and of equivalence. It may be necessary, for example, to view a given and unsolvable problem as being equivalent to a problem that can be solved (when the same method is applied). It may be necessary to adopt a novel method which is similar to (but, on some grounds, is supposed to be more successful than) a previously adopted method. Indeed, Minsky and Selfridge [106], Travis [107], Marzocco [108], and others regard the basic heuristic of artificial intelligence as "Given a problem, apply a method of solution which is a generalized version of a method that was previously successful when applied to a similar problem."
(iv) Recognition of a pattern or, in a generalized form, the construction of a denotation or a connotation for expressions in the currently adopted language. Hence, the pattern concerned may be a set of relevant attributes [109] or a relevant configuration of attribute values, or a goal, or some method of problem solution [110].
(v) Learning whereby organizations evolve, differentiate, or adapt.
We shall maintain these distinctions without, however, considering the items in a particular order.
(5) These processes, which arise from the heuristic constraints of (i), are defined with reference to a very special system of symbols (and, as in 1.4, we are interested in the dynamics of this system). Hence, the existence of a given process entails organizations of the kind we discussed in 2.7 and 2.8 which, in turn, entail physical mechanisms of the kind we considered in 2.6. But the correspondence between organizations at these different levels of discourse is usually many to many. Thus an hierarchical ordering that gives one "subgoal" priority over another, and which might be said to induce a kind of "preference" in an artificial intelligence, bears no obvious relationship to the hierarchies that seem evident in a functional description of the physical entity in which the artificial intelligence is realized. (Consequently, as in (1), we cannot recognize an intelligent circuit.) Nor is there any reason why there should be such a relationship (terms like "priority" and "preference" become contentious because we have a sneaking feeling that there
should). On the other hand, there is a basic analogy between the structure of a type (iv) self-organizing system and the structure of an artificial intelligence. In the most propitious case this amounts to an isomorphism. In an artificial intelligence program the unit of organization looks like:
(i) Test a current hypothesis against a given set of data.
(ii) Perform an operation that is selected according to the outcome of the test.
(iii) Observe the result of this operation as reflected in the available data.
(iv) Either return to (ii) or proceed to the next unit of organization.
The unit is conveniently depicted as a small loop diagram in which "O" stands for hypothesis testing and "D" stands for an operation that is performed. Since most artificial intelligence programs are written in a list processing computer language, it is relatively easy to make certain of the
operations into the creation of novel tests or novel operations, or the deletion of unwanted tests or operations. Now this unit of organization is isomorphic with the "tote" unit (test-operate-test-exit unit) which Miller et al. [111] use as the building block for mentation; and, as they point out, the existence of "tote" units is symptomatic of a "plan," isomorphic with a program (just as the present "units" of organization are symptomatic of the program in which they are defined). The unit is also isomorphic with the realization of a controlled branching algorithm in the sense of Markov [112], or, as Hunt [113] argues, a recursive computation of the form
f(z) = φ(z) if g(z) = a,
f(z) = f(B(z)) if g(z) = b.
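A minimal sketch of this unit of organization, in the test-operate-test-exit spirit just described, is given below. The function names are illustrative assumptions, and the termination test simply stands in for the g(z) = a branch of the recursion.

```python
def tote_unit(state, test, operate, limit=1000):
    """Test-operate-test-exit: operate until the test is satisfied, then exit.

    This is an iterative rendering of the recursion
        f(z) = phi(z)    if g(z) = a   (test satisfied: exit)
        f(z) = f(B(z))   if g(z) = b   (test failed: operate and re-test),
    with `operate` playing the role of B and `test` the role of g.
    """
    for _ in range(limit):              # guard against non-termination
        if test(state):
            return state                # exit: hand on to the next unit
        state = operate(state)
    raise RuntimeError("unit did not reach its subgoal")

# Example: reduce a "difference" (here just a number) to zero.
print(tote_unit(7, test=lambda s: s == 0, operate=lambda s: s - 1))   # 0
```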
Finally, it is isomorphic with the adaptive subcontroller in a type (iv) system. Similar comments apply to the assembly of basic units of organization into a system, only in this case it must be recognized that the list processing computer language (and the necessary predisposition to linear representation) imposes very severe limitations. The point is illustrated in Fig. 29, where item (1) somewhat extends the basic model of an hierarchy of adaptive control shown in Fig. 2 and item (2) gives the isomorphic representation of this hierarchy in the selective representation of Fig. 18. In item (3) we apply the restriction that one and only one subcontroller is selected at once (thus introducing a sequential process). Item (4) demonstrates the isomorphism that exists (given this sequential, one-at-once, restriction) between a "tote" unit and a subcontroller, and item (5) is isomorphic with the hierarchical structure of item (3). There is nothing sacrosanct about sequential data processing. Oliver Selfridge's "Pandemonium" [114], a parallel system we shall consider later, is a notable exception to this rule. As Newell [115] points out, a "Pandemonium" which is analogous to item (2) can take a broad view of all that is going on in a system, whereas a sequential program is written on the assumption that different parts of a computation are separable and interact by closely prescribed channels. There are many circumstances under which a broad view is handy to have. But is it necessary? The trite reply is that any finite dimensional image and presumably any parallel system can be represented in terms of a linear sequence providing punctuation terms, indicating disposition, are adjoined to the alphabet of signs. Markov [112], for example, gives a construction
for this purpose. Hence, there is some curiously encoded translation of a parallel machine which does the same tricks as the original. But is this the point? In particular it is a matter of doubt whether the linear shadow of a parallel organization can be realized in a physical fashion or whether it can evolve in a medium under rules of evolution that we are able to appreciate.

3.2 Specific Processes
(1) The simplest kind of goal achievement is evidenced by a controller. Of course, the goal (indicated by the maximum of a suitable payoff function) need not be easy to reach. Minsky and Selfridge [106] have considered various cases (simple optimization with "hill climbing" in the parameter space to a unique maximum, multiple maxima, the case of stochastic rather than deterministic "hill climbing," and the intractable case of an isolated "pinpoint in a plain"). When there is only a single type of goal, the search conducted by an artificial intelligence in its symbolic environment is analogous to the more difficult cases of control maximization; for example, search for one among the set of possible Boolean expressions is analogous to the "pinpoint in a plain" case. On the other hand, this kind of goal seeking is unusual in the real world. Intelligent creatures aim for many and diverse objectives, using vastly different methods. Some of this richness is preserved in G.P.S. (I) by allowing several types of goal [116]. To describe these, let us assume that the symbolic environment of G.P.S. (I) is logic. The objects available within this environment consist of logical expressions like "A ∨ B" and "A ⊃ (B ∨ C)." The operations given to G.P.S. (I) are transformations like "A ∨ B ⇒ B ∨ A" and like "B ⊃ A ⇒ ~B ∨ A." The differences between objects are of the form "changed position" and "changed connective." The given operations are relevant only to certain of the differences; for example, the operation "A ∨ B ⇒ B ∨ A" is relevant only to a difference in the position of the variables concerned and, further, any logical expression that is converted into the form "A ∨ B" can be transformed into another logical expression "B ∨ A" such that the only difference lies in the position of the variables. The relevance of different operations is conveniently described by a binary application matrix, with a "1" indicating relevance and a "0" a lack of it:

                 Operations
                 F₁   F₂   . . .
  Differences
      G₁          1    0   . . .  1
      G₂          0    1   . . .  0
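A minimal sketch of how such an application matrix and its operations might be represented follows. The expression syntax, operator names, and difference names are illustrative assumptions; G.P.S. itself was written in a list-processing language, not Python.

```python
# Logical expressions as nested tuples, e.g. ("or", "A", "B") for "A v B".
# Two illustrative operations and two illustrative differences.

def commute_or(expr):
    """'A v B => B v A': relevant to a difference of position."""
    op, left, right = expr
    assert op == "or"
    return (op, right, left)

def implication_to_or(expr):
    """'B > A => ~B v A': relevant to a difference of connective."""
    op, left, right = expr
    assert op == "implies"
    return ("or", ("not", left), right)

operations = {"F1": commute_or, "F2": implication_to_or}

# Binary application matrix: rows are differences, columns are operations.
application_matrix = {
    "changed position":   {"F1": 1, "F2": 0},
    "changed connective": {"F1": 0, "F2": 1},
}

def relevant_operations(difference):
    row = application_matrix[difference]
    return [name for name, flag in row.items() if flag == 1]

print(relevant_operations("changed connective"))        # ['F2']
print(implication_to_or(("implies", "B", "A")))         # ('or', ('not', 'B'), 'A')
```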
FIG. 29. Control hierarchies.
Now the main types of goal in G.P.S. (I) are: type T, transform an object a into another object b; type R, reduce a difference G existing between a pair of objects a, b; type A, apply an operator F to an object. When G.P.S. (I) is posed a problem, it evaluates the problem, as in Fig. 30, and if the problem is accepted it decides upon a method of solution which is associated with one of the types of goal. Each method applied to the initial object in pursuit of the selected type of goal will produce subgoals which are often of different type. The recursive character of this goal-directed computation is apparent from an inspection of Fig. 30. Thus a type T goal involves a method that tests a difference between a and b. If no difference exists, this goal is achieved. If a difference does exist, the test leads to the type R subgoal of reducing the difference. This type R subgoal entails testing for the relevance of an operation F to this difference and induces the type A subgoal of applying F to reduce this difference. Similarly, type A goals lead either to type A subgoals or to type T subgoals. The entire computation terminates either on G.P.S. (I) discovering the solution object or on discerning that it cannot solve the problem posed to it.
(2) The recursive character of the goal-directed organization (which led to a very convenient "list processing") reduced the potentialities of G.P.S. (I) by rendering it too inflexible. [Newell [117] points out that the price paid for the flexibility of many goal types, an advance over the rather earlier "Logic Theorist" program, was the restricted organization of the search process in G.P.S. (I).] It is a characteristic of this essentially sequential organization that tests and subgoals cannot readily be reactivated and that control resides at any instant in a rather isolated subroutine. Newell [117] describes various methods that were tried to overcome this difficulty, for example, to impose an over-all control upon the subgoal routine which evaluated the success of each subgoal and, as a result, returned to the previous stage in the search or allowed the search to continue. A highly centralized system like this has the defect that the over-all controller (the central authority) receives inadequate information on which to base its decisions. The program was ultimately reformulated [as a program G.P.S. (II)]. As indicated in Fig. 30, the search "tree" structure of G.P.S. (I) is replaced by a kind of mobile executive which assembles data about goal achievement and refers back to an over-all specification of the method to be adopted. This structure constitutes a compromise, on the way to an ideally much more parallel organization.
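The recursive interplay of the three goal types can be sketched as follows. This is a schematic reading of the description above, not Newell's program: the difference-detection, operator-relevance, and application functions are assumed to be supplied (for instance, along the lines of the application matrix sketched earlier), and the search has no backtracking over competing methods.

```python
def transform(a, b, find_difference, relevant_ops, apply_op, depth=10):
    """Type T goal: transform object a into object b."""
    if depth == 0:
        return None                        # give up: cannot solve the problem
    d = find_difference(a, b)
    if d is None:
        return a                           # no difference: goal achieved
    return reduce_difference(a, b, d, find_difference, relevant_ops, apply_op, depth)

def reduce_difference(a, b, d, find_difference, relevant_ops, apply_op, depth):
    """Type R goal: reduce the difference d between a and b."""
    for F in relevant_ops(d):              # operations marked "1" for d
        a2 = apply_operator(F, a, apply_op)    # type A subgoal
        if a2 is not None:
            # recurse: a new type T subgoal on the transformed object
            result = transform(a2, b, find_difference, relevant_ops, apply_op, depth - 1)
            if result is not None:
                return result
    return None

def apply_operator(F, a, apply_op):
    """Type A goal: apply operator F to object a (the application may fail)."""
    try:
        return apply_op(F, a)
    except Exception:
        return None

# Toy usage in a trivial numeric "environment": differences are signs,
# operators increment or decrement.
ops = {"inc": lambda x: x + 1, "dec": lambda x: x - 1}
find_diff = lambda a, b: None if a == b else ("too small" if a < b else "too big")
relevant  = lambda d: ["inc"] if d == "too small" else ["dec"]
applyf    = lambda F, a: ops[F](a)
print(transform(3, 6, find_diff, relevant, applyf))    # 6
```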
(3) As it stands, the objects and the differences and the operations
FIG. 30. The G.P.S. system. (Panels: G.P.S. (I) goal structure for a transform goal; the equivalent tree structure for tests in G.P.S. (I); G.P.S. (I) goal type selection in terms of subcontrollers; G.P.S. (II) simplified form of goal selection, with an executive search amongst an ordered set of methods.)
are introduced by a programmer. But a practically useful artificial intelligence should be able to learn the objects, operations, and differences that it needs in order to solve problems. A couple of learning situations have been considered and partially simulated in G.P.S. The first is "learning the entries of the application matrix," given that the operations F and the differences G are defined. It is the paradigm case for association learning. The naive approach to this problem is a collection of statistical registers that estimate how successfully the application of arbitrarily selected operations reduces each difference. The matrix is constructed by placing "1" whenever the success of an operation with reference to a difference is above some limiting value and a "0" if it is not. But the matrix can, in fact, be learned more efficiently by using the algorithm: "Apply the test for a difference to each operation and if the result is positive enter "1" in the application matrix. If the result is negative, enter "0" in the application matrix." Essentially, the learning process has been removed from the domain of statistical aggregation and placed in the domain of problem solving. Hence it can be tackled by a problem solving machine. Next, we consider the much more interesting matter of learning a novel set of differences so that the problem solving machine, equipped with this learning algorithm, can partially build (or specify the domain of) its application matrix. This question is a special case of "similarity learning" and is obviously a basic issue in all discussions of intelligent and adaptable perception. It yields the paradigm case for concept learning. For this purpose we label the identified processing language, in terms of which problems are posed, as L⁰ = V⁰, Q⁰, δ⁰. Formally L⁰ consists of a finite vocabulary V⁰, consisting of the names for objects and operations and differences, rules Q⁰ which restrict the strings of signs that can be generated by applying operations to objects in pursuit of some type of goal, and an identification δ⁰ between sets of object names and objects and operation names and operations and between difference names and the programs that recognize differences. (These programs are necessary components in the process of achieving type T and type R goals.) It is essential to realize that L⁰ is well defined and fixed, that the identifications of the signs are fixed, and that the objects they denote are fixed. Hence, the set of differences is fixed and there is no possibility of learning a novel set of differences within L⁰. Similarly, once we have agreed to accept certain properties of the physical mark "LION" as relevant (the usual indices of its constituent characters)
and to credit the mark "LION" with a certain denotation (a member of the usual set of animals) we cannot learn a novel set of differences between, say, "LION" and "GIRAFFE." There is, of course, a world of difference between the elaboration of the human being's sign system and L⁰, and it is also true (as we shall argue later) that the human being seems to have no fixed processing language (whereas G.P.S. must have). But we may agree to fix the language we use, as in some kinds of argument, and, if we do, we are in much the same position as G.P.S. Notice, by the way, that the goals of G.P.S. are not specified in L⁰, although the system can decide between types of subgoal. The goals appear in G.P.S. as instructions from the experimenter or programmer that have a well-defined connotation in L⁰. Now the programmer takes a much wider and more comprehensive view. He knows perfectly well, for example, that the objects denoted by object signs v ∈ V⁰ are not unitary entities but consist of parts. He also knows that either these parts or the entire objects are capable of description in terms of many different attributes such as, in the case when the objects are logical expressions, the possession of constituent symbols, of being right-hand or left-hand members of larger expressions, or having a given connective. So far as the programmer is concerned, there are a vast number of possible differences obtainable by comparing the attributes or features (collections of attribute values) chosen to describe the objects. For one reason or another he has chosen a few specific differences, has written programs to detect these differences, and has denoted these recognition programs by the difference signs in L⁰. It is, of course, equivalent to say that the L* view of the world is more comprehensive than the L⁰ view of the world and, as before, we shall use this formalism. In fact L⁰ is specified by the programmer (or defined in the scientific metalanguage
L*). Since a novel set of differences cannot be learned in L⁰, the question is "What further structures must be defined in L* in order to permit difference learning?" We first answer the auxiliary question, "What is the form of difference learning?" Since the denotation of a difference sign in G.P.S. is a recognition program, the act of learning a novel set of differences must involve writing difference recognition programs. It is thus necessary to provide an identified language L¹ = V¹, Q¹, δ¹, in which programs that act as recognition programs in L⁰ can be constructed and compared with one another. These recognition programs must be more than concatenations of the programs originally defined
in L⁰. (They will be composed, in general, from programs able to recognize more elementary features of the objects denoted by L⁰ signs.) Since further axioms are adjoined for this purpose, L¹ is a metalanguage with reference to at least some expressions in L⁰. Given that L¹ is defined in L*, it is possible to denote the elements of a higher level problem environment. The objects in this problem environment are subsets of the difference recognizing programs denoted by L⁰ difference signs. The operations in this problem environment are capable of modifying these objects (denoted by L¹ object signs), for example, by deleting elements from or adding them to the sets concerned, and they are related in L¹ to the algorithms used in program construction. Differences exist between the attributes of sets of the original differences, and the denotations of L¹ difference signs will be higher order programs that recognize these differences. The ultimate goal for operations denoted in L¹ will be to achieve programs that recognize a "good" or "adequate" set of L⁰ differences. Now the whole of this hierarchical construction is somewhat arbitrary: the choice of the operations denoted in L¹, the choice of objects, and of differences. The programmer or experimenter does the choosing and he justifies his selection by reference to canons of rationality or efficiency that make good sense in L*. However, we, who also communicate in terms of this language, may agree to the choice; for example, we may agree that the operations stipulated tally with the operations we say we perform when solving problems in everyday life. Our agreement in this matter specifies the constraints Q¹ that determine permissible strings of signs v ∈ V¹ and adds whatever sanction is necessary to the form of V¹ and of δ¹. (We have tacitly agreed to quite a lot already by sanctioning L⁰ and its identification.) What method should be adopted for building and selecting the programs in this higher level problem environment? It is, of course, quite possible to apply the naive algorithm of generating strings of signs by "chance" and selecting those which satisfy the criteria (i) of being programs and (ii) of being able to recognize a "good" or "acceptable" set of L⁰ differences. However, as Newell, Shaw, and Simon point out, this is likely to be an impractical procedure and it is certainly unnecessary. If we are able to specify sensible goals in the higher level problem environment (which amounts to giving a sensible interpretation to "good" and "acceptable") then we are certainly in a position to advocate a more provident mechanism. As in the case of learning the entries of the application matrix, they recommend that G.P.S. should be used in its
normal mode of activity as a problem solver (in other words, that the problem of finding a "good" or "acceptable" set of L⁰ differences should be solved by the methods used in the lower level problem environment to find and transform logical expressions). Since the denotation of the domain of the lowest level in G.P.S. is arbitrary (the objects may be logical expressions or images or sets of control variables) there is no objection to this proposal, and it can be argued that the proposal is optimal in the sense that it minimizes the number of axioms that need to be introduced. In a general theory of constructive problem solving automata, as in 2.8, the argument for optimality is very strong indeed.
(4) Suppose that this proposal is adopted; we arrive at two important conclusions:
(I) The activities going on at different levels in an hierarchically organized problem solver are analogous and in a suitable representation may be isomorphic.
(II) The process which we call "learning" in an automaton which solves problems communicated to it in L⁰ is no more nor less than the activity we should call "problem solving" if we communicated with the system in L¹.
Hence an over-all prescription for difference learning is to create a problem solver (in a broad sense of the word which would include, among other things, an adaptive controller) and to make it solve problems posed at different levels in an hierarchy (the solution to higher level problems determining the differences used by the lower level processes). Although there are many technical difficulties, there is no reason why this construction should not be applied to "operation learning" as well as "difference learning," in which case the solutions of higher level problems determine the structural parameters of the lower level systems. Further, the hierarchy can be extended by adding L², . . ., Lⁿ. Manifestly the organization is isomorphic with the hierarchical structures of 2.7 and 2.8. In the present case we call it an artificial intelligence because the constraints upon the system resemble the constraints upon our own problem solving.
(5) In a recent paper, Newell [118] gives a fairly detailed construction for a difference learning system and stresses the point that recognition (or, in a system capable of changing its own denotative faculties, perception) must be mediated in the same language as its operation (or, at the lowest level, manifest behavior). Very much the same argument is advanced by Mittelstadt [119], MacKay [120], and others. MacKay, for example, rejects the commonly proposed mechanism of Fig. 31(1) in favor of the mechanism of Fig. 31(2), which compares an
input from its world with the actions it will perform to modify its world, and acts to reduce the difference between the two. The actions are engendered by a self-organizing system which tacitly constitutes an internal representation of the environment, and it is evident that comparison can only take place between similar representations. In our present terminology only expressions in the same language are comparable, unless some other process, such as a translation, is introduced into the model. Returning to Newell's [118] construction but using our own nomenclature, there is a problem environment with objects a, b, . . . (that are assumed to be logical expressions) denoted by v ∈ V⁰ of L⁰. The artificial intelligence is also provided with a more discriminating perceptual apparatus able to discern certain attributes of the objects in the problem environment, and with an identified language L¹ in which it is possible to construct propositions about the attributes of and the differences between objects. (This does not contradict the assumption that problems are posed by communicating with the artificial intelligence in terms of L⁰.) If expressions in L¹ are described in graphical notation, so that attributes appear as the branches of a structural graph, the L¹ images of a and b will appear as shown in Fig. 32(1). The difference between a and b is the difference structure of Fig. 32(2). Only differences that can be reduced by applying operations to the objects involved are relevant, and the difference structures corresponding to these differences are obtained by matching the input and output of an operation as indicated in Fig. 32(3). Equivalently it is possible to synthesize operations that reduce the differences that have been discerned. But these operations determine expressions in L¹. Hence an operation derived from the most primitive elements in a difference structure would be inapplicable. Useful operations correspond to classes of operations capable of reducing these primitive differences, and L¹ must be able to represent the aggregation of these classes. (Expressions in L¹ must define the relationship between the primitive and the sophisticated forms of operation.) The process used for this purpose in Newell's construction is Feigenbaum and Simon's [121], and Feigenbaum's [122], abstractive sorting program EPAM (but any other abstractive mechanism could be used). The ultimate criterion for selecting an abstract difference or a sophisticated operation is the possibility of using it to generalize and make inductive inferences. We have already pointed out that only a few of the possible abstractions lead to words that are capable of generalization, and one of the chief criteria in constructing L¹ must be that its pertinent expressions have this property. Newell embeds the principles of generalization in G.P.S. as heuristics that suit plausible
FIG. 32. Difference pattern. (Panels: the L¹ images of a and b; a difference structure; the matching operation A ∧ B → B ∧ A and the difference after the matching operation; the generalized form of the difference.)
arguments in L*. One principle is a unity of symbols: thus one "A" is taken as the same as another "A" and one "B" is taken as the same as another "B" regardless of the expressions in which they occur. On applying these principles to the difference structure of the sophisticated operation "A" we obtain the generalized form of difference, which corresponds to an extension of "A" and which is shown in Fig. 32(4). Given the comments we make in 3.2(7) and 3.2(8) on the interpretation of goals, there seems to be no reason why principles of generalization must be embedded in such an explicit fashion. As one alternative, the system could generate its sophisticated operations according to rules that determine the way it looks at the environment. The same generation rules may govern its abstractions from primitive attributes. Broadly speaking, the system is an evolutionary device that aims to impose its own pattern upon the environment, and the principles of generalization, which we shall return to discuss in 3.2(6), are implicitly embodied in its structure.
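The idea of a difference structure obtained by comparing the L¹ images of two expressions can be illustrated as follows. The tree representation and the function name are illustrative assumptions; Newell's construction works over richer structural graphs and uses EPAM-style abstraction rather than this naive recursive comparison.

```python
# Expressions as nested tuples, e.g. ("and", "A", "B") for "A ^ B".

def difference_structure(a, b, path=()):
    """Return a list of (path, left, right) triples where a and b differ."""
    if a == b:
        return []
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        diffs = []
        for i, (x, y) in enumerate(zip(a, b)):
            diffs.extend(difference_structure(x, y, path + (i,)))
        return diffs
    return [(path, a, b)]          # a primitive difference at this position

# The matching operation "A ^ B -> B ^ A" produces differences of position:
print(difference_structure(("and", "A", "B"), ("and", "B", "A")))
# [((1,), 'A', 'B'), ((2,), 'B', 'A')]
```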
(6) Amarel [123, 124] has programmed an artificial intelligence which learns to present and prove theories. It does so by learning to build programs. Relative to G.P.S. it has only a modest problem environment, but the system is worked out in great detail and the basic ideas (in particular the need for a hierarchy of languages) are exhibited unambiguously. We shall not describe the entire system (nor attempt to detail any part of it), and the original paper should be consulted to expand the present outline. The problem environment is a set labeled a, consisting of 16 elements that constitute the nodes of the symmetric lattice in Fig. 33 under an ordering relation ">", so that, for any pair z1 ∈ a, z2 ∈ a, it will be the case that either z1 > z2 or z2 > z1 or that z1 and z2 are incomparable. Let z0 be the uppermost point of this lattice and let u0, u1, u2 be variables with values that index z ∈ a. A transformation Ti from the product set [a, a] into [a] is explicitly defined by a set C_Ti of 16 x 16 correspondences u0 = Ti(u1, u2). The job of the system is to learn, at the nth move in its history, a program P_Ti(n) which, given any pair u1, u2 in the domain coordinate of a set C'_Ti(n), will successfully compute the value u0 = Ti(u1, u2) (where C'_Ti(n) ⊂ C_Ti is the particular subset of correspondences which has been presented to the system at its nth move). If a successful computation is performed for all pairs u1, u2 and if C'_Ti(n) ⊂ C_Ti, it can be argued that P_Ti(n) constitutes a theory about Ti when it is expressed in a suitable language. Similarly, when the system creates and evaluates tentative programs P_Ti(m), n > m, which are not necessarily successful, it can be argued that the machine language representations of these programs constitute hypotheses about
Ti based upon the evidence available at the mth move. The problem environment is open to a number of plausible interpretations; for example, since the elements contained in a have, by definition, complements, and any pair has a G.L.B. and an L.U.B., we may write expressions like "T1 = G.L.B.(u1, u2)" or "T2 = L.U.B.(u1, u2)" or "T3 = Comp[G.L.B.(u1, u2)]" or "T4 = Comp[L.U.B.(u1, u2)]."
Fig. 33. The lattice.
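Since a 16-element symmetric lattice in which every element has a complement and every pair has a G.L.B. and an L.U.B. can be pictured as the Boolean lattice of subsets of a four-element set, the four example transformations can be sketched directly. The representation below (subsets as Python frozensets) is an assumption made for the illustration; Amarel's own encoding is not reproduced here.

    # Illustrative model of the 16-element lattice as the subsets of a
    # four-element set, ordered by inclusion. G.L.B. is intersection,
    # L.U.B. is union, and Comp is the set complement. This is an assumed
    # concrete representation, not Amarel's own encoding.
    from itertools import combinations

    UNIVERSE = frozenset({1, 2, 3, 4})
    LATTICE = [frozenset(c) for r in range(5) for c in combinations(UNIVERSE, r)]

    def glb(u1, u2):            # greatest lower bound
        return u1 & u2

    def lub(u1, u2):            # least upper bound
        return u1 | u2

    def comp(u):                # complement within the lattice
        return UNIVERSE - u

    # The four example transformations T1 ... T4 of the text.
    T = {
        1: lambda u1, u2: glb(u1, u2),
        2: lambda u1, u2: lub(u1, u2),
        3: lambda u1, u2: comp(glb(u1, u2)),
        4: lambda u1, u2: comp(lub(u1, u2)),
    }

    # The full set of correspondences u0 = T1(u1, u2):
    C_T1 = {(u1, u2): T[1](u1, u2) for u1 in LATTICE for u2 in LATTICE}
    assert len(LATTICE) == 16 and len(C_T1) == 16 ** 2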
The system embodies knowledge about the structure and the extent of the problem environment, and it is provided with certain basic problem-solving facilities which are represented in an identified processing language L0_1. The vocabulary V0 of L0_1 consists of signs denoting elements of a* = a ∪ the empty set; signs denoting the elements of p (the set of all subsets of a*); signs for collections (processing lists) of the elements of p; and signs for program statements, for association, and for equality. In addition, there are signs denoting basic, inbuilt, logical operations like "∩" and "∪", and for procedures like searching a list and adding members to a processing list. A feature of L0_1, which we shall return to in 3.3, is that it is "open ended."
In other words, additional operations and "compound operations" can be added to V0 as they are developed and, in our own nomenclature, additions of this kind amount to transformations of the form L0_1 → L0_2 → ... Since we shall not pursue the action of the program in detail, and since Amarel uses a slightly different notation, we shall refer to the processing language simply as L0. The programs P_Ti are represented in L0 as strings of L0 statements. It is possible to replace the operations appearing in these strings by operational variables X_i (with values that are operations), each with a characterization of X_i (a term A_i that denotes the domain and the range of its value). Thus, for example, we may write "X_1 = ∩, A_1; [p, p] → p" or again "X_2 = Comp, A_2; [a* → a*]." The crucial importance of the characterization A appears in connection with relevance or, as we called it in the discussion of G.P.S., applicability. We say that X is relevant to Y if (i) domain X = domain Y and (ii) range X ⊆ range Y. Compound operations in L0 can be replaced by strings of operational variables X_c = [x_1, x_2, ... x_c], together with their characterizations. Programs P_Ti are assembled in an open-ended language L1. The vocabulary V1 of L1 contains strings of signs representing transformations Ti; signs denoting the initial set of operations "∩", "∪", and so on, together with signs for operations that are added to V0; and strings of signs for simple and compound operational variables (and their characterizations). In addition there are substitution rules, compatible with L1, which permit the recursive assembly of a string representing a program P_Ti, the initiation of this recursive process, and its termination. (In terms of G.P.S. these substitution rules are broadly equivalent to the methods for goal achievement.) The initial statement, posing a problem to this system, will have the form "X_Ti, A_i = A0", where "A0" is the initial string in L1. Suppose X_p is a compound operational variable characterized by A_p and that it is relevant with reference to X_Ti; the corresponding string in L1, with A_p as its left-hand member, may be substituted in the original expression so that we obtain "X_p, A_p → X_Ti, A_i", or equivalently "A0 → A1." The continuation rules allow for insertion of L1 strings between the strings of any suitable L1 expression, while the termination rules apply when the resulting string is a completely consistent expression in L1. This will occur when, as a result of substitution, the compound operational variables pertinent to the original expression have been evaluated, in the sense that each member of the set of constituents, for example each member of X_c = [x_1, x_2, ... x_c], has been replaced by an operation.
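The relevance test that licenses a substitution is simple enough to state as code. The sketch below assumes only that an operation (or operational variable) is summarized by its characterization, a (domain, range) pair; the sort names are invented for the illustration.

    # Sketch of the relevance (applicability) test: X is relevant to Y when
    # the domains coincide and the range of X is contained in the range of Y.
    # The (domain, range) characterizations here are assumed representations.

    def relevant(char_x, char_y):
        dom_x, rng_x = char_x
        dom_y, rng_y = char_y
        return dom_x == dom_y and rng_x <= rng_y   # <= is the subset test for sets

    # Illustrative characterizations: domain as a tuple of argument sorts,
    # range as the set of sorts a result may take (names are invented).
    A_glb  = (("subset", "subset"), {"subset"})
    A_comp = (("subset",),          {"subset"})

    print(relevant(A_glb, A_glb))    # True: an operation is relevant to itself
    print(relevant(A_glb, A_comp))   # False: the domains differ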
The process involves testing for applicability or relevance (as illustrated by the initial substitution) and, whenever more than one alternative is
possible, making a decision to select among the possibilities. The organization involved can be represented as a “tree,” as in Fig. 31(2). At the point when evaluation is complete there will be well-defined paths in this tree (terminating a t operation nodes) and, if this condition is achieved at the mth move, this structure is equivalent to a string A (m)in L1that represents a program PTi (m). The program is now translated into Lo and tested. The tests are rather elaborate but embody the criteria of 3.1(4) (i). Depending upon the outcome of these tests P, (m) may be incorporated as a composite operation, it may be modified, it may be rejected or accepted for further testing against the evidence of C& (m 1). Hence the decisions made in the development of the program tree may either be substitution decisions or decisions that modify the program tree by adding nodes. I n general the results of testing are fed to each relevant decision node and in some conditions the over-all program, which mediates the assembly rules and makes the decisions whenever an ambiguity has to be resolved, calls for the generation of a novel compound procedure. (This will occur when the compound operational variables cannot be substituted in a manner yielding tentative programs that satisfy the tests.) This novel compound procedure is generated by a n auxiliary mechanism (at the moment, by the experimenter himself). Since the number of possible program trees that might develop is enormous, the realization of successful testing and control depends upon an a t least local metric in the set of program trees. Further, the over-all control upon the development of trees is an evolutionary process and a t least the program value tests and the program elaboration tests will resemble a n economic control in which the long term value of a program is pitted against the cost of maintaining the structure required for its performance. Finally, efficient convergence towards the solution of a problem depends upon the possibility of recognizing ambiguous situations which are similar, and when they recur of making a similar decision. (7) I n Amarel’s system, this feature appears in an important but rudimentary form as the transfer of decisions between the decision nodes. The same feature is broadly expressed by Minsky’s generalization heuristic of 3.2(4)(iii) which is, “Given a situation that calls for a decision, recognize a similar, previous, situation and apply a close variant of any algorithm that was successfully applied on this previous occasion,’’ It is no accident that this is a special case of an “awareness” heuristic. We might call an artificial intelligence “aware” or even “conscious” if (i) it can recognize its present configuration (as similar to its past 192
configuration) and generalize its structure by applying whatever algorithm is likely to maintain its integrity and (ii) (which we have guaranteed by embedding this artificial intelligence in a linguistic framework) if it can name an invariant of its configuration. All this leaves open the issue of what similarity criteria and what measures of success should be used. So far as an artificial intelligence is concerned, these similarities and measures must be chosen so that the matching process which underlies its problem solving activity has the caliber of “proof making” or possibly of “inductive inference” (however weakly either of these are defined). Conversely, a choice of this kind has been made, though possibly not in an explicit fashion, for any successful artificial intelligence. We stress the dissemination of “proof making” or “inductive inference.” It permeates and colors every action in the system. (Hence, ingenious and efficient procedures, such as an inductive inference algorithm proposed by Solomonoff [125], are probably too specialized, for the machine is given inductive inference as a special faculty rather than part of its character.) Wiener [ 7 ] has discussed the matter chiefly from a mathematical point of view and examined the restrictions entailed by having a finite machine and an orderly environment. Viewed a t this level, the whole issue is very difficult. It is, after all, a hoary talking point among philosophers. Fortunately, a somewhat analogous situation pertains a t the level of perception and motor behavior and the aura of mystique evaporates within this relatively tangible framework. The question in this connection is, “what kind of abstractions and what kind of program synthesizing and goal achieving mechanisms are compatible with an efficient machine?”. To avoid dispute over criteria of efficiency, let us lay down the dictum “efficient enough t o survive in a given and reasonable environment.” Wiener [ 7 ] has also examined this question. The fact is that any sensible machine must characterize the data it receives as some kind of Gestalt; and when it acts upon an orderly environment or when (at a higher level) it manipulates and tests an internal representation of this environment, it must compute universals. McCulloch and Pitts [47] exhibited the first explicit mechanism to achieve this objective and they did so in a physiologically plausible fashion. Ullman [I261 has recently built a device, capable of learning to organize motor activities constrained by the possible motions of the joints in a limb, which has the required properties. The simplest case will illustrate the kind of restriction that is needed. To abstract a Gestalt, the relevant test operations of an artificial intelligence must be transformations that belong to a finite group. 193
Further, it must be possible to define a functional (which may correspond to a perceptual attribute) that assigns a unique number to each transform of some input object that belongs to this group (in other words, to each test transformation of an input), The group average of this functional (taken over a space of the parameters that index the several transformations) will determine an invariant. (In many cases an average taken over a suitable subset of points in this parameter space will have the same property.) Finally, an ordered set of group averages is a Gestalt. A matching process may compute a universal if the selected operations are transformations (indexed by parameters) that belong to a group, and if there is a monotonic measure of distance between the transformed input object and a subset of points (or concept, according to our previous definition) in the space of perceptual attributes. The process does compute a universal insofar as i t selects operations to minimize the distance concerned. The group properties implicit in this special case are more rigid than necessary but indicate the kind of restriction that must be applied. Insistence upon an hierarchy of control rather than an hierarchy of abstraction, in 2.7 and 2.8, guarantees that restrictions of this kind are built into the system. (8) An abstractive mechanism is a basic component of any artificial intelligence [28]. In Amarel’s system, expressions X in L1 denote assembly operations that build an hierarchy of more or less abstract programs (denoted by expressions in LO). I n a different sense, the denotation of L1 is an abstract representation of some of Lo. (In this case there is a further distinction of type.) Similarly, Newell’s G.P.S. contains an abstractive routine. A number of other mechanisms have, however, been devised, many of them including some learning capability. Although most of these are oriented to visual pattern recognition, the input domain is irrelevant and it would be legitimate to use them for recognizing patterns of programs or parameter values. The simplest abstractive schemes involve sequences of tests which are conveniently represented as a test tree, in which each node corresponds to the sorting of an input item. Hence the input to the program appears a t the uppermost node of the test tree and the output consists of the selection of one of the lowermost nodes, indicating that the input item has passed a unique sequence of tests. Combined with list programming techniques, as in a program written by Banerji [ I 2 7 ] , rather elaborate sorting strategies can be conveniently instrumented. Data are retained in lists, but (at the simplest 1evel)no real learning takes
place. The sorting criteria are not changed or replaced as a result of the system's experience of previous tests. The next degree of elaboration is introduced with the possibility of replacement and modification, which appears in a very efficient pattern recognition program due to Vossler and Uhr [128], shown in Fig. 34(1). The input is a retinal matrix on which is projected a binary input pattern (black and white elements). The system looks for features of the input pattern by matching feature operations, which are predefined binary submatrices, against the input on the retinal matrix in all of a set of positions. At each location, a matching test is made between the input and the feature submatrix. Further operations combine the test output derived from this lowest level of the system to determine higher order features. The system is externally reinforced, and the lowest feature submatrices are evaluated for their degree of relevance and of discrimination. Useless features are discarded and other features provided to replace them. (One feature generating algorithm is to copy some region of an input pattern as a test operation, which ensures that at least one test is passed for one input pattern.) Depending upon the feature generating rules, the system may (or may not) be called evolutionary. If there is a mechanism for variation and recombination of the existing features, and if there is some form of economic or competitive constraint whereby the system is forced to be provident regarding the number of features used, by levying a cost for their structural maintenance, then it probably is evolutionary. The Vossler and Uhr program closely resembles the parallel system "pandemonium" which was devised somewhat earlier by Oliver Selfridge [114]. A typical "pandemonium" is shown in Fig. 34(2). The lowest level elements (which may either be subcontrollers interacting with their environment or the feature recognition programs we shall assume for the present discussion) are whimsically referred to as "demons." These demons are supposed to perform a parallel computation, and their joint output, indicating the attribute values of the environment, is abstracted by higher order or middle "demons." At this level, the weight attached to the output of each demon may be adaptively modified. The resulting signal, from the middle demons, is conveyed to a decision-making system which uses this evidence to support one of several alternative hypotheses about the state of the environment (or, if the environment is a retina, about the pattern displayed on this retina). The connections between the middle demons and the decision elements are weighted (and these weights will certainly be adjusted by feedback-controlled adaptation).
Fig. 34. Vossler and Uhr's system and Selfridge's Pandemonium system. [Figure: (1) select submatrices and match them against various locations in the retinal matrix; abstract properties from successfully matched submatrices; assess the weighted output from the program by external reinforcement. (2) lowest demons, middle demons, and an over-all demon, with external reinforcement.]
The feedback or success information, delivered by the decision element, is also supplied to a process which selects lowest-order demons. This typically parallel process resembles the information feedback in Amarel's system to each of the active decision nodes. As in the Vossler and Uhr [128] system, some of the features computed by the lowest-order demons will be acceptable whereas others are likely to prove useless. The latter must be discarded and replaced, and even the component features may be modified with advantage. Consequently, depending upon the reports received from the over-all decision-making element, these feature-detecting programs are deleted or altered. They could, of course, be improvidently replaced by chance variation, but several alternatives are possible. In the simplest case, it may be sufficient to preserve some of the experience gained by the system by creating new demons from parts of previously successful demons. This is a method of "recombination" of parts. An alternative procedure is to exploit the cooperative interaction that can be encouraged between members of the lowest-order demon population if they are provided with a suitable language for communication. Cooperation will take place, as in 2.8, if these lowest-order demons aim to survive in a partially competitive environment. In this case, the feedback from the over-all decision-making component is used to determine certain evolutionary rules (rate of aging, payoff distribution, or the parameters cited in 2.8). The important point is that the demon population is a program (or a set of programs, one for each demon) that is embedded in an over-all program structured to sustain evolution. The species of demon that survives will (i) be able to thrive in the conditions that are maintained by the over-all decision maker and (ii) be able to interact cooperatively with other members of its species. Regarding these demons as constituents of a program, cooperative evolution implies that its elaboration is maximized. The cooperation rule may engender further advantages by way of computational stability. Finally, there are abstractive mechanisms like EPAM. The original EPAM program was devised by Feigenbaum to simulate the memory and recall of items such as word or pattern lists. An input object (or pattern) is processed by a tree of feature tests and its image A is assigned to a certain terminal node of this test tree, say C. Suppose that C is already characterized by an image B (of one or more previously processed inputs). The present image A is matched against B, and if A is identical to B then A is assigned to C. On the other hand, if a difference is detected, a further test D is constructed to distinguish between A and B. Now the image A is assigned to one terminal node of D and the image B to the other.
The test D is finally attached to the original test tree at the node C, and the tree develops in this fashion. Later versions of EPAM, developed by Feigenbaum [122], use a parallel type of associative memory (which, in the present design, is simulated as a sequential mechanism by dint of association lists). Suppose the environment contains composite objects made up from relatively familiar simple objects. (We assume that feature recognition has already taken place.) Thus the input may be a composite object m which includes a pair a and b of simple objects that are members of a list of possible simple objects a, b, c, d. As before, the system derives images A, B, C, D of the simple objects a, b, c, d, and in the case of m as an input it will derive an image M with component images A and B. The image M is now associated with the images A and B. Further associations are produced by learning a variety of composite objects having these constituents; for example, an object p containing b and d has an image P that is associated (like the image M) with the image B, but not, in this case, with A. Hence the process of abstraction is intermingled with memorization and involves building associations between relevant subsets of images. (The novel composite object is abstracted and memorized in the context of the familiar images of its constituents.) The contextual plan is particularly evident in the reprocessing of data, when the images of partial objects act as cues that recall sequences of other images from the association system. As required by the matching paradigm of difference learning, essentially the same system is used for abstraction and for the synthesis of operations.
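The growth rule of the original EPAM net (sort an item down a tree of tests and, when it collides with a different stored image, grow a new test that separates the two) is compact enough to sketch. The test-generation rule below, branching on the first position at which the two images differ, is an assumption made for the illustration.

    # Minimal EPAM-style discrimination net. Images are strings; each internal
    # node tests one character position. The rule for inventing a new test
    # (first differing position) is an assumed simplification.

    class Node:
        def __init__(self, image=None):
            self.image = image      # stored image at a terminal node
            self.pos = None         # character position tested at an internal node
            self.children = {}      # one branch per observed character value

    def insert(node, image):
        if node.pos is None:                       # terminal node
            if node.image is None or node.image == image:
                node.image = image                 # assign A to C
                return
            old = node.image
            # A differs from B: construct a test D distinguishing them.
            d = next(i for i, (x, y) in enumerate(zip(old, image)) if x != y)
            node.pos, node.image = d, None
            node.children[old[d]] = Node(old)
            node.children[image[d]] = Node(image)
            return
        branch = image[node.pos]
        node.children.setdefault(branch, Node())
        insert(node.children[branch], image)

    def recall(node, image):
        while node.pos is not None:
            node = node.children.get(image[node.pos], Node())
        return node.image

    net = Node()
    for word in ["CAT", "CAB", "COT"]:
        insert(net, word)
    print(recall(net, "CAB"))   # -> "CAB"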
3.3 Intelligent Artifacts
(1) The apparent gulf between the physical artifacts cited in 2.6, 2.7, and 2.8 and the computer simulations of artificial intelligence is
filled by relating the states of an artifact to the symbolic entities of programs and problem environments. Briefly, we need to render a sensible correspondence between signals and messages. Mere identification presents little difficulty (a simple construction is given in 2.6 and in 2.7) where a system is related to an identified language, although fairly elaborate procedures are sometimes needed, for example, in denoting the stable modes of Beurle's [68, 69], Agalides' [129], Caianiello's [15], or Farley's [70, 71] system as signs in a processing language. (One method involves a waveguide structure that delivers a characteristic plane wave front to a block of medium having an artificially elevated average threshold level, in which the input wave is attenuated to a characteristic excitation of a single element of the medium.)
However, this is only a part of the tale. Whatever the identification, the symbolic structure must reflect the form of the underlying mechanism. In this respect a physical system is much more restrictive than a computer simulation. This may or may not be an advantage. (2) Any competent artificial intelligence must compute universals, and the chance that an arbitrary set of abstractions and an arbitrary set of operations and synthetic procedures will furnish us with a system that does compute universals is remote. Do the restrictions imposed by physical and mechanical laws help us in this respect? Is there a greater likelihood that our systems will compute universals if we abide by these laws? I n principle, the answer must be “yes.” A brain, after all, is a physical artifact and any brain (barring, perhaps, the simplest) does compute in the desired fashion. But in practice, the effect of physical restrictions upon the design of an artificial intelligence has only been considered in special cases. Greene [76, 771, for example, discusses the requirements of mentation (pointing out, among other things, that an artificial intelligence has a specific organization such as the kind proposed by Wiener [ 7 ] and that it must aim t o impose its organization upon its environment). He goes on to suggest certain analogies between the current mathematical analysis of physical systems and the entities that characterize the transformations executed in the artificial intelligence. In fact, Greene is chiefly concerned with non-linear oscillatory networks of the kind considered by Beurle and Farley, but on the assumption that the behavior of a large ensemble of these systems can be approximately characterized by a set of linear equations. He seeks to establish relationships between the quirks of symbolism that appear as necessary features of his model of mentation and certain modes of oscillation and their properties. Thus, symbols must carry implicit information about the distribution of other signs as well as acting as marks (like the “images” in Fiegenbaum and Simon’s program which bear contextual as well as specific data), and it is argued that special oscillatory modes do have this property with respect to the distribution of other nodes. Again, having identified symbols with a suitable set of oscillatory modes, the stable (or resonant) modes of oscillation evocable from the system for certain parameter values are analogous t o the Gestalten of perception. Although this work is very stimulating, a more general approach is probably needed. Is it possible, for example, t o build a statistical mechanics of computing systems and physical systems alike? If so, the gulf between signals and messages is filled in a curiously elegant fashion. 199
Work is in progress in various quarters; but although its direction has been indicated, for example, by Wiener in his recent lectures at Naples, there are no publications so far. A still more generalized approach is adopted by Churchman [130] and, in a very different way, by Petri [131] and Gunther [132].
(3) The processing languages of a competent artifact are almost certainly open ended, not only in the sense that terms are added to the vocabulary but also in the sense that the meaning attached to the existing symbols is changed. Hence, the descriptive framework we have adopted, with languages L_v, could be more accurately (but less clearly) replaced by a language that evolves.
(4) The point is illustrated by an artificial intelligence due to Fogel [133]. The meaning of a symbol is its denotation of an interval (between a pair of threshold values) along the coordinate of an input variable x. Thus the symbol i may be assigned to values of x between the thresholds Ti and Ti+1. If the input sequence x(t) is nonstationary, Fogel argues that, in order to satisfy a number of plausible criteria, such as
(i) maintaining the transition probabilities between symbol sets within bounds, so that the symbols are useful elements in a probabilistic model, and
(ii) maintaining a reasonably informative correspondence between the symbols and the events they denote,
it is necessary to adjust the values of the Ti and, consequently, the meaning of the symbols, so that the probabilities of symbol occurrence, pi, are roughly equalized. One strategy for equalizing the pi, by constructing a sequence Ti(t), is described in a recent paper.
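One simple way to realize the equalization (offered here only as an illustration of the idea, not as Fogel's own strategy) is to re-place the thresholds at the empirical quantiles of a window of recent input values, so that each symbol is observed about equally often.

    # Sketch: keep the K symbol intervals roughly equiprobable by re-placing
    # the thresholds at the empirical quantiles of a sliding window of the
    # input x(t). This is an illustrative strategy, not Fogel's published one.
    from collections import deque

    class AdaptiveSymbols:
        def __init__(self, k=4, window=100):
            self.k = k
            self.recent = deque(maxlen=window)
            self.thresholds = []          # T_1 ... T_{k-1}

        def observe(self, x):
            self.recent.append(x)
            ordered = sorted(self.recent)
            n = len(ordered)
            # place T_i at the i/k quantile of the recent inputs
            self.thresholds = [ordered[(i * n) // self.k]
                               for i in range(1, self.k)]
            return self.symbol(x)

        def symbol(self, x):
            """Index of the interval containing x (the symbol's current meaning)."""
            return sum(x >= t for t in self.thresholds)

    # A nonstationary input: the symbols keep their usefulness as the level drifts.
    coder = AdaptiveSymbols()
    stream = [t * 0.05 + (t % 7) * 0.3 for t in range(200)]
    symbols = [coder.observe(x) for x in stream]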
3.4 Some Difficulties
(1) In each of the systems we have examined, the problem is posed and the goal is specified in L1. It would be possible to provide a further language L2 capable of expressing different kinds of problem and alternatives (in the sense of different kinds of problem) and different forms of goal. This expedient has not, however, been adopted, although Newell discusses the matter. (Newell points out that G.P.S. is a machine that works in one "sense modality" and he cites the need for a higher order problem environment, denoted by L2, for a comprehensive goal-seeking activity.) In fact, failure to represent a universe of goals within the artificial intelligence is tantamount to viewing the artificial intelligence as a
calculator. The idea of giving the system advice by way of heuristics is fictional. Really, we tell it what to do. This criticism can be partially avoided when simulating a population of problem solvers communicating in Lo and in L1.George [134,135],for example, has programmed a variety of game playing systems that interact with one another. I n some of them the play involves a choice between types of action and types of outcome, and it is necessary to allow for communication of these choices in a language L2 over and above Lo and L1. This is particularly true if the game playing machines are required to settle whether or not they will cooperate with one another and, if so, to bargain over acceptable terms. On the other hand, George must determine the sort of population that he is considering and must embed certain common criteria of success in each member (otherwise communication between the members could not take place). To what extent does this commit us to viewing the population as a set of calculators? The crucial feature seems to rest upon the way that the hierarchy of languages is constructed. Like Lo and Lxin the systems we examined in 3.2, the languages will be “open ended” in the sense that transformations “Lqu+L7, * . .” can occur as a result of experience. I n a certain sense, no transformation of this kind‘is able t o create an essentially different problem solver (for no further axioms are introduced into the logical system which the language denotes). On the other hand, a -+ . . .” can and, except in very transformation of the form “L“w-+Lqu+l special cases, must involve such a distinction for the metalanguage5 L7+ldenote a system embodying certain axioms that are absent from L7.If the population opts out of the game altogether or if some members decide t o play a different game using rules and reinforcements that the experimenter had not recognized, then such a transformation can be inferred. Although it is convenient to envisage members of a population of machines that cooperate with one another when solving the problems posed by a common environment (communicating a t various levels in order to achieve cooperative activity), the same comments apply to cooperative interaction (and the necessary degree of communication) involving functional parts of a single mechanism. But, if this mode of interaction exists within an artificial intelligence, then there must be some kind of parallel computation. This, in my view, is the chief significance of the sequential or parallel dichotomy of 3.1(5). If there is parallel organization, as there is, for example, in a “pandemonium,” then cooperation may occur and the set of expressions that are messages communicated between the parallel components a t any instant determine the name of a Gestalt. 201
Whether or not a parallel organization does, on some occasions, require a parallel mechanism is an open issue. (2) Any cooperation depends upon some kind of communication between members of the population. When each member is able to build its own interpretative and synthetic programs, stable cooperation depends in a critical but marginally understood fashion upon communication a t different levels. There must be a level of discourse (level U perhaps) that conveys instructions and intentional statements (in contrast to the object language expressions in LO). Since A can adopt different ‘(views” of the environment and different “attitudes” to the environment, cooperation with B is impossible unless A can inform B of what these are, by dint of expressions in L1. If A and B are jointly matching a collection of objects, for example, the process can only take place if these objects are commonly represented. Otherwise they cannot be compared.
(3) Maron [57] points out that rules of logic and of sign substitution are constraints that determine what cannot be done. They do not guide a system in selecting what should be done. I n particular they are not useful decision rules. By providing a linguistic hierarchy, we give an artificial intelligence a framework in which it can abstract from the state of its environment and synthesize programs that select among relevant environmental operations. By introducing heuristics, we ensure that the system is not utterly stupid, that is, concepts (in the sense of 3.1) are intelligible, and we remove as much uncertainty as possible. However, some issues are undetermined and, if they are encountered, a decision is required, (In fact, (‘decisions))are needed quite often, to choose between plausible alternatives. But we shall emphasize the ((undecidable))situation where no substitution rule is available.) Now, as Maron argues, none of this structure determines what should be done under these circumstances. However, a problem solver that survives must select some operation and an artificial intelligence must, in addition, communicate its choice. Over the ensemble of systems, no alternative is favored. Hence, for any one system, A , the situation is resolved by an arbitrary selection R, of a sign V , from a relevant vocabulary. Of course, the selection made by R, is individually significant. It is an index o i individual A preference, for operations or for forms of program, depending upon the vocabulary. Suppose V , E V 1 so that V , is a statement of an A preference of the latter kind. To show this, imagine a couple of systems, A and B , living in a common problem environment and communicatingin Loand L1. AproblemM 202
is posed (an expression in Lo)and one member of the pair, A , adopts a strategy denoted in L1 as ,A (system B is informed of ,A in Ll).At some point, ,A terminates undecidably since no substitution rule is available. The undecidability is also manifest t o B. Hence, when R, selects V , E V1 to resolve the situation, B interprets V , as an A ( I preference.” (It is one case of a mapping from the states of R, into a subset of Vl.) To compare (‘preference,” B selects ,A if A selects A, and the denotation of V , is matched, by B, against the denotation of V , (selected by RB). (4) The need to maintain a certain rate of action, or rate of application of operations, is derivable from the need to maintain a positive rate of adaptation. The latter requirement is a consequence of the isomorphism between the organization of an artificial intelligence and it type (iv) self-organizing system. At some point, the association learning of 3.2(3) (which is an admissible form of adaptation) must give place to the “concept learning” of 3.2(3) (because, as in 2.6, a stable self-organizing system must change the domain with reference to which it adapts its behavior, in the present case, its symbolic behavior). I n terms of “openended” languages, the transformations of concept learning may either be (i) denotative LI;))+ L;+l,a novel sign, Vo and its denotation is adjoined to V,q to form Vi+l; or (ii) constructional, when Lv,,-+L;+l due to embodying one or more novel axioms. Mode (i) corresponds t o the change of attention and mode (ii) to the metalinguistic construction, considered in 2.7 and 2.8.
(5) There is a necessary and important distinction between the approach of a biologist and a logician (or a scientist who programs a computer) when faced with the issues of artificial intelligence or of self-organizing systems. We have argued that the effective construction of an artificial intelligence or a self-organizing system will always involve an artifact able t o build an hierarchy of metalanguages. Now the biologist and the computer-oriented logician are both concerned with a realizable artifact. But, in a certain sense, the biologist can take the whole of natural evolution to be the artifact concerned and the individual brain one member of a specific class of end products. This allows him, in fact, to regard the construction of an hierarchy of metalanguages as plausible and realizable. Most of the construction occurs in natural evolution. Some facet of this capability is embodied in the medium of an individual brain. The computer-oriented logician cannot accept this point of view. I n a sense, his program must exhibit both the historical or maturational and the immediate proclivities of an intelligent or a self-organizing
system. Now if the program is to be realized, there must be some way of limiting the proliferation of the metalanguages that are needed, in our formulation, to express the distinct levels of control or instruction. As Gorn [I361 points out, the expedient that is normally adopted consists in using a “mixed langiiage” capable of expressing various instructions (control instructions, descriptions, object designations) in place of an hierarchy of metalanguages. (The hierarchy corresponds to a stratified and restricted system.) Gorn also points out that the advantage gained by such a system of “unstratified control” is bought a t the price of a certain “pragmatic ambiguity” in the sense that some expressions in the mixed language are open to various interpretations. Now we comment that the biologist and the computer-oriented logician are not really a t variance. The brains examined by a biologist do exhibit unstratified control and the languages in which they communicate internally or externally, must be deemed mixed languages. Only the biologist is a t liberty to regard a brain or the whole gamut of presently considered brains as one stratified subsystem of a large stratified evolving system; whereas the computer-oriented logician, lacking this possibility, must give explicit consideration to the mixed language in which he describes his program. Similarly, we may use the necessity for some explicitly developing hierarchy of metalanguages, or some explicitly “mixed” language and the use of “unstratified” control to counter the criticisms of 1.1. “Pragmatic ambiguity” and, indeed, some embodiment of paradox are necessary and completely harmless consequences of rea.lizing any artificial intelligence. Their appearance is a structural matter of fact and is no argument against the conception of an artificial intelligence or a self-organizing system as such. 4. Other Disciplines 4.1 Psychological Models
Since any realizable logic comes within the compass of psychology, there is an obvious relation between psychological theories of problem solving, learning, and perception and the designof anartificialintelligence. A t the most abstract level, any well-validated, man-made, heuristic is a candidate for embodiment in an artificial intelligence. Conversely, *It is always possible, in principle, t o obtain an unambiguous statement of what gocson. I n thc“purc”casc of logic, the paradoxes are avoided by introducing a distinction such a s thv typc tliatinction, i n the theory of logical types. Sirnilarly, in the present case we can (as in 2.8.), introduce a distinction of logical type to avoid ‘hcchanical” ambiguity (when the descriptive mctalanguagc for tho system must contaiii a theory erf types or its equivalent). There is, however, no need t o cto this. The distinction would serve our own ideas of completeness rather t h a n the functioning of the system.
any heuristic that works in an artificial intelligence constitutes a testable hypothesis about mentation. At the level of organization, Hovland’s [137]definition of a concept is isomorphic with the widely acknowledged construction we have adopted. Using this idea, Hovland and Hunt [I381 have devised computer programs which may equally well be interpreted as a simulation of concept learning in a human being and as parts of an artificial intelligence. Similarly the TOTE hierarchy used by Miller, Pribram, and Gallanter as their descriptive framework for mental activity is isomorphic with the hierarchical organization embedded in an artificial intelligence. The fact is that psychologists and computer logicians are now using the interdisciplinary terms of cybernetics to describe their problems and their results, aiid the fact that more is gained from the exercise than a mere change of nomericlature is a welcome justification of this science. These analogies are fairly recent. Many others exist; for example, the mass of work due to Piaget [I401 (Flavell [ 1391)upon the maturation of various faculties in the human being (the ability to appreciate the persistenceof objects, todetectinvariants,and tocomprehend number) provides a myriad of clues to the development of these faculties in an artificial intelligence (and although we have not yet stressed the education of machines to reach an acceptable standard of competence, this is one of the most important issues). Bartlett’s [I411 work on memory and Craik’s [I421 on perception suggests the proper choice of the similarities and invarients of 3.2(7). The ethologists, starting from the empirical foundations laid by Tinbergen [I431 and Lorenz [lad],provide rules for hierarchical organization in any system and its environment that must, it seems, be obeyed by either organisms or machines. Hull [la51 and later Hebb []as],Broadbent [147],Brown [148],Gregory [lag], Mackworth [150],Milner 11511, Barbizet [152],and Welford [I531 are among many psychologists with a mechanistic bent who have, in fact, described certain subsystems of a n artificial intelligence. Finally, the experimental methods of psychology have influenced and have been influenced by the experimental situations used to test an artificial intelligence. The very different methods of Skinner [I54,1551 (reinforcement learning and behavior shaping), Harlow [156],and Bruner [I571 [as in the system of 3.2(2)] are applied to the artifacts while, on the other hand, the study of these artifacts is gaining admission as a proper concern for comparative psychology. 4.2 Physiological Models There is no necessary connection between the physiological mechanisms in a brain and the mechanical structures in a n artificial intelligence. Thus a brain is largely aparallel computer and, although it is safe enough 205
to assert that the organization of an artificial intelligence is also parallel, this fact need not imply a parallel mechanism. This point of view can be justified even if we insist upon functional identity between the behavior of a brain and of an artificial intelligence. (1) Consider an artificial intelligence A and a brain F which are supposed to have some behavioral property P that is detectable by tests T,. Now we know that P is, a t the moment, ostensively defined and tZhat we cannot actually list all of the attributes of P or all of the tests T, that may be relevant. However, the activity of the artificial intelligence, A , may be known to depend only upon a collection of subsystems B , that compute a function R while the activity of P may be known to depend only upon subsystems G (such as neurones) that also compute R. Now if there are tests T , which exhaustively specify R (in the sense that R is a consequence of certain basic physical principles) we may, while aiming for behavioral A , P, identity, replace the subsystems B by any other convenient subsystems that compute R. If, for example, R is “AND” and the original B units are thermionic “AND” circuits, these can be replaced by their transistor analogs. But the argument only holds true if R is exhaustively defined. If certain features of F, other than those revealed by the T,, could be relevant to P,then B should be made as nearly like G as possible. At the level of neurones, for example, i t would be injudicious to assume too much about R. The unitary organization may, in fact, involve glial cells, as proposed by Hyden [158], Galambos [159],and others. The physical events that act as signals may be impulses or phase relations between impulses or the average rate of impulses as argued by Mountcastle. On the other hand, it would be equally injudicious to imitate a brain a t the biochemical level. Almost certainly it is possible to achieve the exceptionally stringent objective of behavioral identity between A and F by using a more convenient material than protein. This is certainly true if behavioral identity is relaxed to similarity. All the same, the brain is probably the best provider of design principles and of the hunches that guide intelligent guesswork. A detailed survey of brain models and their empirical validation would be out of place. We shall, however, select a few cases to illustrate the interaction between physiology and the artifacts.
(2) The over-all principles of a type (iv) self-organizing system are supported by hroadly specified models such as an integrative structure proposed (and largely verified) by Anohkin [IGO].The role of unspecific oscillatory mechanisms (of the kind simulated by Beurle, Caianiello, and Parley) is confirmed by Magoun’s [I611 work. 206
Recent experiments by Jasper [162], McCulloch and Kilmuir [163], and others substantiate the existence of attention mechanisms embodied in the reticular formation analogous to the mechanism which we argued to be a necessary part of the artificial system. Braines and Napalkovand Setchvinsky [164,165],and Napalkov [I661 haveunearthed an hierarchical organization of reflex systems, open to some modification by conditioning, which corresponds to an hierarchy of algorithms. Bishop [167], using the data of comparative physiology, argues that a brain is an orderly and hierarchical concatenation of more primitive brains, some of which no longer serve their primitive functions. (This structure will almost certainly exist in any artifact that evolves.) Finally, a t a more detailed level, Uttley [26, 271 pioneered the hypothesis that a brain is analogous to a conditional probability machine (which he demonstrated by building a number of artifacts) and that learning entails the reduction and differentiation of connections between its components. (3) Uttley’s model lies in between the physiology of over-all plans and a set of fairly specific models, for brain activity. I n the latter connection, the mechanical consequences of Pavlov’s [168]pioneering work on systems of conditioned reflexes have been exhibited by Grey Walter [91].The problems of coding have been examined by Barlow and Donaldson [169] and by Agalides [I70].Specific feature detectors, realized as neurone networks that act as filters, are demonstrated by Letvin, Matturana, McCulloch, and Pitts [I711 (frog’s eye), Hubel and Wiesel [172],and Reichart [I731 (themechanism of lateral inhibition). The statistical histology of the brain is undergoing active investigation by Braightenberg [I741 a t Naples, while Schade [ l 7 5 ] , in Amsterdam, is adding to the data published by Scholl [176],which was used in Beurle’s simulation. The recent data in this field are available in the proceedings of an interdisciplinary symposium organized by Gerrard [lY7]. (4) One difficulty that besets the use and interpretation of the available data is the fact that brains and self-organizing systems are unlocalized automata in the sense of 2.8. Usually this implies a lack of correspondence between the anatomy of a brain and the functions it computes. Computation is distributed in the sense that one part of the job is done in parallel by several groups of elements such as neurones. Again, identical components serve different functions upon different occasions, and it is hard to find a tangible embodiment of the rigid organization that clearly exists. Fortunately, nature provides a few special cases in which the anatomy of a beast’s brain and the computation it performs are closely correlated. 207
When the system is examined, the organizational picture turns out to be a curiously accurate replica of the ideas we have voiced. We may hope that these oddities of nature represent specializations in which the plan is preserved intact, although it is mediated by a localized and tractable mechanism. One outstanding case is the visual system in the frog, investigated by Lettvin, Matturana, McCulloch, and Pitts [ l r l ] , which images the requirements of 3.2(7). There is a mapping from visual domains in the retina to corresponding domains in the colliculus. For each domain there is a neural network which filters four chief attributes of the state of a frog’s environment. In the context of the possible and relevant actions of the animal, certain classes of invariant, derived from subsets of points in a space with the four attributes as its coordinates, have the caliber of universals. Another case is J. Z . Young’s [178] recent analysis of the octopus, which shows the beast t o be an hierarchically organized homeostat in which the different levels of matching activity involve specific sensory modalities, each of which has a phylogenetic significance (the tactile modality, for example, the simple distance reception of the eye and the visual pattern-recognizing system). Each level of homeostasis (or matching) has facilities for adaptation and, in a slightly restricted sense, it acts as a sign system. Finally, these strata are coupled by amplifying systems, and the reinforcement for adaptation a t a given level is derived from the output of inferior strata which may account for the creature’s ‘(drives” or, in the sense of 3.3 its “preference” for a particular outcome. I n the octopus, in other words, the (‘mapping from R into Y” of 3.3 is a mapping from a set of systems that structurally represents its phylogency into the states of the currently active control system. Finally, as Young [I781 points out, the octopus is peculiar in possessing a brain with functionally localized parts. Hence its organization is readily detected. We may hypothesize that a similar organization persists in other brains where the functions are distributed and where their pattern is consequently obscured. 5. The Interaction between Men and Their Intelligent Artifacts 5.1 Present Position
We have examined those artificial intelligences that are structured at the outset, by design, to a degree of competence which (in the case of a tangible realization of the system) is sufficient t o maintain the form of the artifact. The d e s i g n entails embedding certain basic structures 208
of language (to mediate coherent communication), of organization, and of heuristics, stemming chiefly from logic, that are applied within this prescribed structure. In 4.1 and 4.2 we briefly considered the other-than-logical origins of certain heuristic and organizational constraints. Let us now discuss machines that are educated rather than designed.
5.2 Educable Machines (1) As the limit, conceive a Tabula Rasa, realized as a network of uncommitted components, of indefinite extent. The network is either (i) a crassly overconnected system in which case there is a mechanism, controlled by a parameter 0, to reduce the coupling between these components, unless they are jointly active, or (ii) the network is slightly connected, in which case 0 controls the production of association between active subsets of these components. As Uttley points out, the former mechanism is t80be preferred. This network is embedded in some environment that is of interest to and is possibly controlled by an observer who aims to train or educate the network by varying 0 (the “reinforcement” variable) whenever he approves of the behavior produced as a result of the network’s activity. In the simplest case 0 is binary. (The observer can approve or disapprove, allowing or inhibiting the consequent adaptations of this network.) Of course, this is the reductio ad absurdum of the least tractable kind of perceptron. For a sensibly large array of components, an observer’s chance of training the network is negligible. (2) Nobody is likely to doubt the need for some constraints though the form they should assume is arguable. For modest arrays it may suffice to provide a many-valued 8 or, better still, t o allow some discrimination by making 0 a vector of many-valued variables. I n this case, if the observer can detect the proximity of the behavior to his ideal, i t may be possible to secure practicable adaptive convergence. Its rapidity depends upon how “good” a training routine is presented to the network. But there is no really adequate criterion of what a “good” training routine looks like, except in some special cases. (3) Alternatively, it is possible, as suggested in 2.8, to constrain the medium of this network so that computing systems are likely to evolve in it. Next, it may be possible to embed evolutionary rules which allow the observer to predict the forms of the organizations that will evolve or even to predict the instants when they will evolve, so that he can take advantage of any chance to “imprint” items of data. Once again, success depends upon the effective sequencing of stimuli, 209
particularly those to be “imprinted,” although in this case there are a number of fairly efficient principles. (4) Finally (and perhaps in addition) the constraints can be designed to open up the possibilities of communication with the machine. The idea, in this case, is that the observer will become a participant observer [assuming, so far as possible, the status of one of the machines in the cooperative population in 3.4(2)]. Apart froin communicating in Lo, to determine the stimuli and observe the responses of the training routine, the observer must bargain with the machine, which entails “preference” in the sense of 3.4(3),and must replace “reinforcement,” which assumes a predetermined “preference,” by persuasion and compromise. Ideally, a participant observer will aim to have a conversation with the machine he is educating. For this purpose, he will adapt his mode of communication to suit the state of the machine and he will try to achieve a compromise rather than a well-defined objective. (He may start off with some idea of an ideal inductive inference machine but he will modify this idea in view of his experience providing the machine performs Some kind of inductive inference.) I n order to reach a compromise, the participant observer must glean some information about the behavior that the machine prefers (that its state or its history renders more probable [17911. I n fact, very little is known about the best way tointroduceconstraints that allow the necessary communication to take place. The data that are available come from a field that seems remote from artificial intelligence, namely from “man-machine interaction” studies and the more scientifically disposed studies of “teaching systems” [180, 1811. For the present discussion, we shall consider the issue within the compass of man-machine interaction. 5.3 Dividing Labor
There are many jobs a t which man is somewhat inept. Hence, he uses various tools to aid him. The tools may be adjoined to his motor output (manual aids like cranes, pliers, and hemmers) or adjoined to his input (microscopes, telescopes, and data processing devices). Hence, these tools either perform transformations of input data or output data. The parameters of these transformations may be invariant or changed a t the man’s discretion as when he changes the microscope objective or the speed‘of a crane. Trivially, parameter variation is a result of assertions in L1 compared to the input of assertions in Lo. Other tools are interpolated in man’s problem-solving or thinking process. They carry out procedures that the man agrees to be rational 210
but which he cannot carry out himself due to limitations of computing rate or memory capacity. One of the earliest tools of this kind was the combinatorial wheel devised by Gardner [182] and widely used by the theologians of his day, for example, in ascertaining the possible property combinations of the angelic host. A more recent device is the desk calculator. In each case, the tool obeys the instructions delivered by its user, unless it is defective. Further, the user can always opt out of the situation. He, alone, decides that the tool is relevant. The position is slightly different when man is not allowed this freedom and, in some adequate sense, subscribes to the rationality of the tool at the outset and agrees to remain part of a joint system while a job is completed. This occurs, for example, in a system examined by Edwards [183]. Computers are ideally suited to calculating a Bayes estimate

p(hypothesis | data) = p(data | hypothesis) p(hypothesis) / p(data),
which a man does slowly and with difficulty. But the machine cannot normally calculate the term p(data | hypothesis). On the other hand, a man, providing he agrees to the rationality of producing a Bayes estimate and providing he receives the currently calculated value of p(hypothesis | data), can easily appreciate the value of p(data | hypothesis). Consequently, in the system described by Edwards, there is a fixed division of labor between a group of men who estimate the value of p(data | hypothesis) from the input to the system, given the current value of p(hypothesis | data), and a machine that calculates this value.
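The division of labor can be made concrete with a small sketch: the machine keeps the posterior up to date while the human supplies each likelihood term p(data | hypothesis). The two hypotheses and the numbers below are invented for the illustration; Edwards' actual system is not reproduced.

    # Sketch of the man-machine Bayes loop: the machine maintains the posterior
    # p(hypothesis | data); the man supplies p(data | hypothesis) for each new
    # datum. Hypotheses and likelihoods here are invented for illustration.

    def update(prior, likelihoods):
        """One Bayes step: prior and likelihoods are dicts keyed by hypothesis."""
        unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
        p_data = sum(unnormalized.values())
        return {h: v / p_data for h, v in unnormalized.items()}

    posterior = {"H1": 0.5, "H2": 0.5}              # initial (agreed) priors

    # Each entry plays the role of the human estimate p(datum | hypothesis),
    # made in sight of the current posterior.
    human_estimates = [
        {"H1": 0.8, "H2": 0.3},
        {"H1": 0.6, "H2": 0.4},
    ]

    for likelihoods in human_estimates:
        posterior = update(posterior, likelihoods)
        print(posterior)     # the machine reports the value back to the men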
I
cooperation, then the man will adopt the same persuasive gambits as the machine. To take the process one stage further, the machine may make suggestions, phrased in L1, about methods of problem solving. If the L1 proposals advanced by the man and the machine disagree, the issue may be decided according to an independently computed measure of relative merit (rather than allowing the man to have the final word). A man-machine system of this kind deserves the epithet "symbiotic" [184]. The man-machine interaction has the logical form of a conversation. It is perfectly true that the man is teaching the machine (for the machine must learn to make suggestions that suit the man, to code the data in a fashion he deems intelligible, and so on). But it is also true that the machine is teaching the man. The entire "symbiotic" system is an artificial intelligence that cannot be partitioned. From another point of view, the machine is a medium in which the man can exteriorize some of the mentation that normally goes on in the medium of his brain. Equivalently, it constitutes a medium, like the man's brain, in which the computing system responsible for this mentation can evolve. The machine learns to become a medium of the most suitable kind. The man learns to gain cooperation from the machine by exteriorizing his problem-solving activities [185].
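To make the division of labor concrete, the update performed by the machine can be sketched in present-day programming notation. This is a minimal illustration, not a description of Edwards' actual system; the function names, and the device of prompting the operators for their likelihood judgments, are assumptions introduced here for the example.

```
def bayes_update(priors, likelihoods):
    """Return p(hypothesis | data) from p(hypothesis) and p(data | hypothesis)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    p_data = sum(joint.values())            # normalizing term p(data)
    return {h: joint[h] / p_data for h in joint}

def ask_likelihoods(datum, hypotheses):
    # Stand-in for the human judgment step: the men supply p(datum | hypothesis).
    return {h: float(input(f"p({datum!r} | {h})? ")) for h in hypotheses}

def run_session(priors, data_stream):
    posteriors = dict(priors)
    for datum in data_stream:
        likelihoods = ask_likelihoods(datum, posteriors)    # man's contribution
        posteriors = bayes_update(posteriors, likelihoods)  # machine's contribution
        print("current p(hypothesis | data):", posteriors)  # fed back to the men
    return posteriors
```

The fixed division of labor appears directly in the loop: the human step and the machine step alternate, and each receives the other's current output.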
5.4 Adaptive Machines

Systems of this kind have been fabricated and are fully considered in other papers [186-192]. Most of them have been used as teaching devices, though a few have been designed as aids to performance. The skills and problem-solving tasks embedded in these systems have, so far, been simple, but we have argued that there is no reason why the same system (referred to a more elaborate task) should not be regarded as a mechanism for educating an artificial intelligence. (The fact that the existing apparatus is biased to educate the man is a quirk of detailed programming.) A number of conditions must be satisfied as a prerequisite for designing such a system. These can be asserted as axioms that determine a "structured skill" or "structured problem" environment. To satisfy these axioms, Lo and L1 must be defined for communication between man and machine. The problems denoted in Lo must reduce to subsets of different types of problem, these being named in L1. Within each subset there must be operations that partially solve each problem as well as primitive classes of operations that completely solve each problem. Hence, for each problem type, there is a method for
simplifying a problem. At least some of the problems will be more effectively solved by applying compound operations, specified by suitable expressions in L1, which are not members of a primitive operation class. (In the case of a structured skill, this amounts to insisting that there is at least some positive transfer of training between its constituent subskills.) Finally, assume that man and machine are self-organizing systems that maintain a certain rate of adaptation and, in the case of a teaching system, adjoin an axiom of preference (that, given the chance of adapting in a fashion that leads to the more effective performance of a skill, the man will prefer this particular form of adaptation). If the preference axiom is satisfied, we call the man a student. Given these conditions, it is possible to construct an adaptive teaching machine which, in a sense that is fully considered in other papers, delivers an optimum instruction. The design of the simplest machine is isomorphic with the system of 2.5. The mechanism A selects among subsystems Σr assigned to problem types. The selected subsystem Σr selects a variously simplified sequence of problems from a problem type. As in 2.5, the adaptive machine aims to maximize the rate of change of behavioral redundancy, and the preference axiom permits identification between this index and an index of learning rate. In the simplest case the adaptive machine also selects among the problem types in order to maximize the expected value of the learning rate. The subsystems it selects act as variably cooperative mechanisms which help the student to solve the problems they pose (by partially solving them on his behalf). Although, in a teaching system, there is a preprogrammed criterion of correct solution (which is available to the machine and is used to compute a learning rate), this is unnecessary. The correct solution to a problem, even if it exists, may be unknown. There must, of course, be rules and conditions; but, within the compass of these, the optimum procedure may be a matter for argument. At the next level, we introduce an L1 interaction. The student is provided with a "bank balance" of a commodity called money, the value of which depends upon his average success at problem solving. When A is instructed to select a new Σr, the student is also asked to select the new Σr he prefers, and his selection is an assertion in L1. The preference exhibited by the student is weighted according to the current value of his "bank balance" and is added to the corresponding selection probability computed by A to yield a compound vector. The outcome, or actual selection, depends upon this compound vector in such a way that the student's degree of control over the system depends upon his success.
Proceeding further, we could invoke L2 assertions to modify goal selection, but in this case the preference axiom needs alteration.
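The compound selection just described might be sketched as follows. This is an illustrative reading only, not the mechanism actually used in the systems cited; in particular the weighting scheme and the normalizing balance are assumptions introduced for the example.

```
import random

def compound_selection(machine_probs, student_pref, balance, max_balance=100.0):
    """Blend the machine's selection probabilities with the student's preference."""
    w = min(balance / max_balance, 1.0)         # student's degree of control
    mixed = {t: machine_probs[t] + w * student_pref.get(t, 0.0)
             for t in machine_probs}
    total = sum(mixed.values())
    probs = {t: v / total for t, v in mixed.items()}   # the compound vector
    types, weights = zip(*probs.items())
    return random.choices(types, weights=weights)[0]

# Example: the machine favours type "B" (highest expected learning rate),
# the student prefers "A"; his influence grows with his bank balance.
choice = compound_selection({"A": 0.2, "B": 0.6, "C": 0.2}, {"A": 1.0}, balance=40)
```

The essential point is only that the outcome is drawn from the renormalized compound vector, so that a successful student exerts more control over what he is taught next.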
5.5 Concluding Remarks

Empirically, the close coupling between man and machine is adequately confirmed, and there seems no reason why we should not regard the arrangement as one method for teaching a self-organizing system to be an artificial intelligence. At present, the problem environment is restricted, but the present model can probably be enlarged to comprehend any plausible universe of discourse. Development of the interaction correlates with a process whereby the self-organizing system becomes structured in the image of a man. In order to initiate this process, certain constraints must be built into the self-organizing system. It can be argued that all this structure can be embedded, in the same fashion, as initial constraints or heuristics. Maybe it can. But to choose the latter alternative would be to neglect a lesson learned from the brain of any sentient organism, namely, that maturation is of the greatest importance. On the face of it, an artificial intelligence is more economically created by allowing it to evolve in the environment it will later inhabit, providing that we ensure its survival by building into it a set of basic and necessary capabilities.

ACKNOWLEDGMENTS
I would like to thank Mr. B. N. Lewis for valuable discussions of the subject matter and Mr. J. Cowan for reading through the manuscript and pointing out certain omissions in the original draft. Also, I wish to acknowledge the support of the Office of Scientific Research, O.A.R., through the European Office of Aerospace Research, U.S.A.F., under Contract AF.61.052.640, for my own work upon learning systems.
Glossary

Algorithm. An algorithm is any well-defined sequence of operations that are applied to a given collection of entities or objects in order to yield, unambiguously, a specified result. The entities concerned may constitute words in a vocabulary or signs in an alphabet (and it can be argued that, since the entities must be well defined, they always can be identified with a vocabulary or alphabet). Markov [112] speaks of normal algorithms. (In a normal algorithm the process represented by the string of well-defined operations is reduced to a set of elementary formulas; its alphabet is finite and its termination is defined.) Markov conjectures that all algorithms can be normalized. This, and the possibility of proving theorems about the existence or nonexistence of algorithms that solve a given class of problems, are among several issues elegantly discussed by Curry [193].
In the present discussion we use the term "algorithm" without necessarily implying a normal algorithm, which is also the usage of Braines et al. [165] and of Napalkov [166]. Strictly, such algorithms correspond to "effective processes" in the sense of Church [194].
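As a concrete illustration of a normal algorithm, the following fragment interprets an ordered list of substitution rules over a finite alphabet; the rules shown, for unary addition, are invented for the example and are not drawn from Markov's text.

```
def run_markov(rules, word, max_steps=10_000):
    """rules: list of (pattern, replacement, is_terminal) applied in order."""
    for _ in range(max_steps):
        for pattern, replacement, terminal in rules:
            if pattern in word:
                word = word.replace(pattern, replacement, 1)  # leftmost occurrence
                if terminal:
                    return word
                break
        else:
            return word           # no rule applies: the algorithm halts
    raise RuntimeError("step limit exceeded")

# Example: unary addition, rewriting strings of bars around "+" into one block.
rules = [("|+", "+|", False), ("+", "", True)]
print(run_markov(rules, "||+|||"))   # -> "|||||"
```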
Attribute. A unitary property abstracted from the form or behavior of a physical or symbolic system. Attributes may assume several descriptive values and these may be identified with variables, bearing the same name, and assuming numerical values. Confusion occasionally arises over the usage "value of an attribute" to mean, in fact, "value of the variable with which this attribute is identified." To avoid this, notice that in a 2-valued logic the attribute "Truth" has values only of "True" and "False," which may be denoted only by binary variables with values "1" and "0" or "T" and "F". This is a matter of necessity and definition. But the attribute "roundness" may be viewed in several ways according to the conditions of measurement and the objectives of our enquiry. We may choose only to determine "round" and "not round," in which case this attribute can be identified with a binary variable. On the other hand we can, perfectly well, identify degrees of roundness, when this attribute may be identified with a many-valued variable. Hence the 2-valued case is more exactly viewed as a mapping which assigns to each value of the many-valued variable exactly one value in the pair 1, 0. Adopting this interpretation, roundness, the attribute, is associated with a definite procedure for measurement, which defines it. Failing this, roundness could mean differently measured things, according to the whim of the experimenter.

Communication process. The process of conveying data from a transmitter to a receiver along a channel of communication which may be perturbed by irrelevant data or "noise." The mathematical theory of communication abstracts from the commonplace interpretation of transmitter, receiver, and channel to yield precise and purely mathematical conceptions. As a result, each relevant datum is associated with a value, its "selective information," which measures the extent to which a receiver's uncertainty regarding the state of a relevant system would be reduced if this datum were signaled by the transmitter and were perfectly received. (The receiver is, of course, assumed to know the possible states of the relevant system.) The rule governing the process whereby the transmitter signals relevant data along the channel is called coding and is formally represented as an assignment of one or more signs to each collection of data. Irrelevant data or extraneous signs injected into the channel are called "noise." It can be shown that, whatever coding is adopted, a certain limit is reached beyond which no more information can be conveyed along a given channel per sign or per interval, and this limit is called the channel "capacity." This mathematical theory, due to Shannon [14], is descriptive in the sense that it refers, as Cherry points out [79], to an outside observer's account of the communication process. Other formal communication models exist which are more broadly applicable but have less deductive possibilities (there is no analog for the capacity theorem), as in Harrah [195].

Computation. An operation carried out upon data in order to produce the values of specified features or functions of this data. The idea of a communication channel can be extended to the idea of a computation channel. But as Cowan
points out [197], a computing channel will reduce the information potentially conveyed by sequences of input signs. Vinograd and Cowan [198] provide a comprehensive account of computation by finite automata and networks of components.
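The quantities just mentioned can be illustrated by a small computation of selective information and of its average over a source; the probabilities below are invented for the example and do not refer to any particular channel.

```
from math import log2

def selective_information(p):
    # Information conveyed by a datum of probability p, in bits.
    return -log2(p)

def entropy(distribution):
    # Average selective information over the possible states of the source.
    return sum(p * selective_information(p) for p in distribution if p > 0)

states = [0.5, 0.25, 0.125, 0.125]     # receiver's uncertainty about the system
print(selective_information(0.125))    # 3.0 bits for the rarest state
print(entropy(states))                 # 1.75 bits per selection, on average
```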
Denotation. The assignment of a name or a sign to one or more physical objects or collections of signs.

Frame of reference. A field of relevant physical or symbolic data denoted by terms in a deductively manipulable system of hypotheses which are associated with methods of proof and disproof. Thus a science, like classical physics, that is associated with a well-defined hypothetico-deductive framework (and rules for inference and for inductive argument and for empirical confirmation) is a frame of reference. But, as pointed out [16], so are many other systems.

Information. Regarded as a measure, information is a value of data (such as its selective information). However, a number of value functions can be assigned in different conditions and for different purposes. The system of Bar-Hillel [196] and Carnap, which evaluates propositions in terms of their degree of logical discrimination, is considered by Cherry [79]. Other examples are the measure due to Fisher in [199] and [200], and the information measures used in science by Brillouin in [201]. Although we do not use Fisher's measure explicitly in the present discussion, a precise analysis of discursive informational statements about the value of statistical data would lead us to adopt this measure.

Logical type. The idea of an hierarchy of logical types was introduced by Russell and Whitehead [202] to resolve various paradoxical situations (like the class of all classes and the paradox of self-reference) which are due to the ambiguous usage of terms. Each class of elements in a logical system (propositions, propositional functions, and so on) is assigned an hierarchical-type designation, and any function of a given type is allowed only elements of some type preceding it in this hierarchy as members of its domain. The concept of a type hierarchy has been minimized in the present discussion, largely because of our rather pedantic conventions regarding hierarchies of metalanguages and our pedantic insistence upon a distinction between algorithms and heuristics (or hints about the class of algorithms to use). But the distinctions entailed by the type hierarchy are basic components in our arguments. (We may allow ambiguity in building an artificial intelligence program, but we must not fail to recognize that we have allowed it and could remove it by a distinction of logical type when saying what this program is meant to do.)

Maturation. The process whereby the brain of an embryo develops into the brain of an adult organism. It is essential to recall that the brain of an embryo is coupled to its external environment throughout much of this process; hence, maturation may be held to include normal "imprinting" of the kind that occurs in a duckling, where the first moving object with certain broad characteristics that is encountered within a short, physiologically determined interval is subsequently recognized as the duckling's parent.

Metalanguage, object language, and language. The term "language" is discussed in detail by Curry [193]. A formal language consists of an alphabet of signs and certain syntactic rules for their concatenation and substitution. We have insisted
upon identified formal languages in which the sequences of signs enjoy specified denotations. Our insistence upon this issue parallels Gorn's insistence upon the intensive as well as extensive definition of the terms used in machine languages. Gorn points out [136] that the intensive definition of a machine language embodies the control mechanism that mediates linguistic operations. (The extensive definition is, of course, the alphabet of signs and the set of strings or sequences of signs that can be legitimately produced by its manipulation.) A typical object language is the set of signs, together with the syntactic constraints, that define a channel of communication (and we may say that communication along this channel takes place in terms of this object language). A typical metalanguage is the language in terms of which an external observer defines this communication channel.
Neurone. A cell, for present purposes, in the central nervous system of an animal which is specialized for conveying and producing impulses of electrical or chemical activity that act as signals. The active components in the central nervous system appear to be neurones and glial cells. The part played by glial cells, once thought to have no functional significance, remains in considerable doubt. Certainly these cells are concerned in the metabolism and maintenance of neurones, and they may also take part in their data processing activity and in memory [158, 159]. Confining our attention to neurones, there are still many varieties and possibly they mediate a great many different functions. The classical picture is a cell with branching processes called dendrites and one main process, which may bifurcate terminally, called the axone. Nerve fibers are the axones of certain numerous but specialized neurones. The dendrites and cell body of a given neurone receive excitation from impulses propagated along axones that terminate in their vicinity and form synaptic connections. The coupling at a synapse involves chemical intermediaries (such as acetylcholine), and the incoming impulses of excitation undergo spatial and temporal summation. When the spatial or temporal sum of excitation exceeds some characteristic value called the threshold of the neurone (the threshold, in fact, is variable), a state of excitation is engendered. This is propagated along the axone of this neurone as an impulse. The required energy is obtained from a local ionic transfer mechanism which is maintained against a potential and concentration gradient by a slower metabolic process. Propagation of an impulse serves to disrupt this unstable ionic equilibrium. The rate at which impulses may be propagated is limited by the required recovery interval. In a cortical neurone a whole gamut of different recovery processes contribute to the so-called absolute refractory interval (that occurs after the neurone has been excited) within which no impulse sequence will stimulate it. (Later, it may only be stimulated by an atypical impulse sequence, and later still it returns to a normal state.) Recent work indicates a great deal of structure in the cell membrane and suggests that the summative picture of a synapse is a very crude account of the coupling mechanism. There is also some evidence that a neurone is, to some degree, an analog data processing system. According to any view, the neurone can be said to compute if either impulses or impulse rates are regarded as input signs since, in each case, its output depends upon the form of its input.

Reinforcement. A badly defined word used in psychology to denote, at one extreme, some event which is said to be pleasurable or rewarding and, at the other extreme, any occurrence which leads to an increased conditional probability of response B given stimulus A if associated in some suitable fashion with
the given stimulus-response pair, A and B. Reinforcement is used chiefly in the latter sense, so far as machine adaptation is concerned, although different mechanisms are involved in different systems.

Retina. In physiology, the collection of light-sensitive elements in the eye of an organism. The word is used, by analogy, in pattern recognition to denote a collection of photoelectric cells on which an input pattern is impressed.

Sign. The name given to those invariant features of a given class of physical objects or shapes which are used to denote either the class of shapes itself or some arbitrarily chosen class. Concatenations of signs may also serve as signs. Thus a word, as well as an alphabetic character, may be a sign.
Symbol. A sign and its denotation.

Strategy. A set of actions or moves decided upon by one participant in a game, which he will adopt contingent upon all conditions and possible situations that may arise in this game. The term strategy is often used in connection with mechanical participants and a wider class of competitive and partly cooperative systems, as given by Luce and Raiffa [203] and by Howard [204]. Further, the decisions at various stages in a strategy, or between a set of alternatively possible strategies, may be made by a chance device.
Synapse. A connection that establishes loose informational coupling between neurones.
REFERENCES

1. Scher, J. M., Theories of the Mind. The Free Press, New York, 1962. 2. Hook, S., Dimensions of Mind. New York Univ. Press, New York, 1960. 3. von Foerster, H., and Pask, G., A predictive model for a self-organising system, Cybernetica 4 (1960); 1 (1961). 4. von Foerster, H., On self-organising systems and their environment, in Self-Organising Systems (M. C. Yovits and S. Cameron, eds.), Pergamon Press, New York, 1960. 5. Bartlett, F., Thinking. Allen and Unwin, London, 1958. 6. Loefgren, L., Qualitative limits for automatic self-repair. Tech. Note NONR 1834(21), Elec. Eng. Res. Lab., Univ. of Illinois, Urbana, Illinois (1961). 7. Wiener, N., Cybernetics, 2nd ed. Wiley, New York, 1962. 8. Wiener, N., Comments at Spring School of Theoretical Physics, Naples, 1962. To be published. 9. Beer, S., Towards a cybernetic factory, in Principles of Self-Organisation (H. von Foerster and G. Zopf, eds.), Pergamon Press, London, 1961. 10. Mesarovic, M. D., On self-organisational systems, in Self-Organising Systems-1962 (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 11. Mesarovic, M. D., General systems, in Proc. 2nd IFAC Conf. Automatic Control, Basle, 1963. To be published. 12. Pun, L., Automatique. Association Suisse pour l'Automatique, 1963. 13. Bertalanffy, L. von, An outline of general systems theory, Brit. J. Phil. Sci. 1 (1950).
14. Shannon, C. E., and Weaver, W. E., Mathematical Theory of Communications. Univ. Illinois Press, Urbana, Illinois, 1949. 15. Caianiello, E. R., Outline of a theory of thought processes and thinking machines, J. Theoret. Biol. 2 (1961). 16. Pask, G., An Introduction to Cybernetics. Hutchinsons, London, 1961. 17. Pask, G., Statistical computation and statistical automata. Proc. DAGK Conf. Cybernetics, Karlsruhe, 1963. 18. Ashby, W. Ross, Introduction to Cybernetics. Chapman and Hall, London, 1957. 19. Loefgren, L., Tesselation models of self-repair. In Biological Prototypes and Synthetic Systems (E. E. Bernard and M. R. Kare, eds.), Plenum Press, New York, 1962. 20. Rosen, R., The representation of biological systems from the standpoint of the theory of categories, Bull. Math. Biophys. 20 (1958). 21. Ashby, W. Ross, Design for a Brain, 2nd ed. Chapman and Hall, London, 1960. 22. Haire, P. F., and Harouless, G., Jenny: an improved homeostat, AFCRC TN 60-379 (April 1960). 23. Williams, R. E., Static and dynamic responses of the homeostat Jenny, AFCRC 505 (June 1961). 24. Chichianaze, C., and Charkviani, C., The problem of employing the adaptive system for automation of processes, in Optimising and Adaptive Control (L. E. Bollinger, J. G. Truxal, and E. J. Minnar, eds.), Instr. Soc. Am., Pittsburgh, Pennsylvania, 1963. 25. Tarjan, R., Problems of stability in adaptive control systems, in Optimising and Adaptive Control (L. E. Bollinger, J. G. Truxal, and E. J. Minnar, eds.), Instr. Soc. Am., Pittsburgh, Pennsylvania, 1963. 26. Uttley, A. M., The theory of the mechanism of learning based on the computation of probabilities. Proc. 1st Congr. Intern. Assoc. Cybernetics, Namur, 1956. Gauthier-Villars, Paris, 1958. 27. Uttley, A. M., Conditional probability computing in the nervous system, in The Mechanisation of Thought Processes. H.M.S.O., London, 1959. 28. Uttley, A. M., The engineering approach to the problem of neural organisation, Progr. Biophys. and Biophys. Chem. 11 (1961). 29. Uttley, A. M., Conditional probability machines and conditioned reflexes, in Automata Studies (C. E. Shannon and J. McCarthy, eds.), Princeton Univ. Press, Princeton, New Jersey, 1956. 30. Steinbuch, K., and Frank, L., Nichtdigitale Lernmatrizen als Perzeptoren, Kybernetik 1, 3 (1961). 31. Katz, R. C., and Thomas, G. M., The development of a conditional probability computer for control application, in Information Processing (R. Popplewell, ed.), North-Holland Publ. Co., Amsterdam, 1962. 32. Andrew, A. M., Learning machines, in The Mechanisation of Thought Processes, H.M.S.O., London, 1959. 33. Andrew, A. M., Self optimising control mechanisms and some principles for more advanced learning machines, in Communication Theory (C. Cherry, ed.), Butterworths, London, 1962. 34. Estes, W. K., Towards a statistical theory of learning, Psychol. Rev. 57 (1950). 35. Wattanabe, S., The learning process and the inverse H theorem, IRE Trans. Inform. Theory 28 (1962).
36. Bush, R. R., and Mosteller, F., Stochastic Models f o r Learning. Wiley, New York, 1955. 37. Luce, D., Individual Choice Behaviour. Wiley, New York, 1959 38. Harman, L . D., Tho artificial neurone, Science 129 ( 1 9 5 9 ) 39. Taylor, W . K., The theory of cortical organisation and of learning, I R E T r a n s . I n f o r m . Theory 28 (1962). 40. Taylor, W . K., Pattern recognition by automatic analogue, Proc. I n s t . Elec. Enyrs. ( L o n d o n ) PL.13 (1959). 41. Novikoff, A., Integral geometry: an approach to the problem of abstraction, in Principles of Self-Orgunisation ( H . von Foerster and G. Zopf, eds.), Pergamon Press, New York. 1962, 42. Vapnik, V. N . , and Lerner, A. Ya, Recognition of patterns with the help of portraits, A u t o m . Telemecanica 24, 6 (1963). 43. von Foerster, H., Biologic, in Biological Prototypes and Synthetic Systems ( E . E . Bernard and M. R. Kare, eds.), Plenum Press, Ncw York, 1962. 44. Inselberg, A., and Von Foerster, H., The principlcs of pre-organisation. Tech. Rept., Contract N O N R 1834(21), Elec. Eng. Res. Lab., Univ. of Illinois, Urbana, Illinois (1961). 45. Dersch, W . C., A decision logic for speech recognition, Bionics 1, WADD (1960). 46. Aizerman, M . A., Automatic control and learning systems, Proc. 2nd I F A C Conf. Automatic Control, B a d e , 1963. To be published. 47. McCulloch, W. S., and Pitts, W., The logical calculus of the ideas immanent in nervous activity, B u l l . M a t h . Biophys. 9 (1947). 48. McCulloch, W. S., and Pitts, W., How we know universals, B u l l . M a t h . Biophys. 9 (1947). 49. Widrow, B., Generalization and information storage in a network of adaline ncurones, in Self-Organising Systems-1962 (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 50. Willis, G . D., The functional domain of complex systems, in PrincipZes of Self-Orgunisution (H. von Foerster and G. Zopf, eds.), Pergamon Press, New York, 1962. 51. Willis, G. D., Plastic neurones as sensory elements, Lockheed Report LNSD 48432 (1959). 52. Rosenblatt, F., Principles of Neurodynamics. Spartan Press, Washington, D.C., 1962. 53. Rosenblatt, F., Theorems of statistical separability, in T h e M e c h a n i m t i o n of Thought Processes. H.M.S.O., London, 1959. 54. von Foerster, H., The circuitry of clues to Platonic ideation, in Aspects of the Theory of Artificial Intelligence (C. A. Muses, ed.), Plenum Press, New York, 1962. 55. Uttley, A. M., The design of conditional probability computers, I n f o r m . and Control 2 (1959). 56. Maron, H . E., Artificial intelligence and brain mechanisms, Mem. RM 3522 PR, Rand Corp. (1963). 57. Maron, H . E., The design principles for an intelligent machine, IRE T r a n s . I n f o r m . Theory 28 (1962). 68. Crane, H . D., Neurister studies, Tech. Rept. 1506-2, Stanford Elec. Lab. (1960).
59. Crane, H. D., The neurister, in Principles of Self-Organisation (H. von Foerster and G. Zopf, eds.), Pergamon Press, New York, 1962. 60. Pask, G., Physical analogues to the growth of a concept, in T h e Mechanisation of Thought Processes. H.M.S.O., London, 1959. 61. Pask, G., The growth process in a cybernetic machine. Proc. 2nd Conj. Intern. Assoc. Cybernetics, N a m u r , 1958. Gauthier-Villars, Paris, 1960. 62. Pask, G., The natural history of networks, in Self-Organising Systems (M. C. Yovits and S. Cameron, eds.), Pergamon Press, New York, 1960. 63. MacKay, D. M., and Ainsworth, A,, Electrolytic growth processes, Proc. DAGK Conf. Cybernetics, Karlsruhe, 1963. 64. MacKay, D. M., Self-organisation in the time domain, in Self-Organising (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Systems-1962 Spartan Press, Washington, D.C., 1962. 65. Alfieri, R., L e nerj arti,ficiel de Lillie sous l'angle cybernetique. Intern. SOC. Cybernetic Med., Naples, 1960. 66. Stewart, R. M., Fields and waves in excitable cellular structures, 1st. Passadena, Cal., S y m p . on Self-Organizing Systems, 1963. 67. Bowman, R. A., Transmission linc leading t o self-organising systems, in Principles of Self-Orgunisation (H.von Foerster and G. Zopf, eds.), Pergamon Press, London, 1962. 68. Beurle, R. L., Storage manipulation of information in the brain, J . Inst. Elec. Eng. [NS] 5 (1959). 69. Beurle, R. L., Properties of a mass of cells capable of regenerating impulses. Phil. Trans. Roy. Soc. Ser. B 240 (1956). 70. Farley, B., Aspects of behaviour in neurone network model, 3rd Bionics Symp., Dayton, Ohio, 1963. To be published. 7 1 . Farley, B. G., and Clarke, W. A., Activity in networks of neuron-like elements, in Information Theory (C. Cherry, ed.), Butterworths, London, 1961. 72. Babcock, M., An adaptive reorganising automaton, Tech. Rept. N O N R 1834(21), Elec. Eng. Res. Lab., Univ. of Illinois, Urbana, Illinois (1961.). 73. Papert, S., Redundancy in linear logical nets, in 1st Bionics Symposium, WADD Tech. Rept. 60-600 (1960). 74. Cameron, S., An estimate of the complexity requisite in a universal decision network, in 1st Bionics Symposium, WADD Tech. Rept. 60-600 (1960). 75. Singleton, J. A., A test for linear seperability applied to self-organising machines, in Self Organising Systems -1962 (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 76. Greene, P. H., Computers that perceive, learn and reason, General Systems Yearbook Vol. iv, 1960. 77. Greene, P. H., On the representation of information by neural net models, (M. C. Yovits, G. T. Jacobi, and G. D. in Self-Organising Systems-1962 Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 78. MacKay, D. M., The epistemological problems of automata, in Automata Studies (C. E. Shannon and J. MacCarthy, eds.), Princcton Univ. Press, Princeton, New Jersey, 1956. 79. Cherry, C., O n Human'Communication. Wiley, New York, 1957. 80. Foulkes, J. D., A class of machines which determine the statistical structure of a sequence of characters, I'roc. I R E Western Joint Comp. Conf. P t . 4 , 1959. 81. Gabor, D. A., universal non-linear filter predictor and simulation which optimises itself by a learning process, I R E Trans. 108 B (1961).
82. Mackay, D. M., The informational analysis of questions and coinmnnds, Communication Theory (C. Cherry, ed.), Buttcrworths, London, 1962. 83. Andrea, J. H., Stella: a scheme for a learning machine, Proc. 2nd I F A C Conf. Automutic Control, Basel, 1963. To be published. 84. Pask, G., A model for concept learning, 10th Intern. Sci. Congr. Electron. To be published. 85. Ashby, W. R., A self-reproducing system, in Aspects of the Theory of Artificial Intelligence (C. A. Miises, ed.), Plenum Press, New York, 1962. 86. Rosen, R., A logical paradox implicit in the notion of a self-reproducing automaton, B d l , Math. Biophys. 21 (1959). 87. Rashevsky, N., “Mat,hemat~icalBiophysics.” Dover, New York, 1960. 88. von Neumann, J. Unpublished works. 89. Burke, A. W., Computat,ion, behaviour and structure, in PrincipEes of Self-Orgunising Systems (M. C. Yovits and S. Camcron, eds.), Pergamon Press, New York, 1960. 90. Toda, M., The design of a fungus eater, Behavioural S c i . 7 (1962). 91. Walter, W. G., T h e Living Brain. Duckworth, London, 1953. 92. Barricelli, N., Numerical testing of evolution theories, Acta Biotheoret. 1 and 2 (1963). 93. Goldacre, J ., Morphogcnesis and communication, Yroc. 2r~dConf. Intern. As,~oc.Cybernetics, Nnmur, 1958. Gauthier-Villars. Paris, 1960. 94. Pask, G. The cybernetics of evolutionary systems and of self-organising systems. Conference Gcnerale in Proc. 2nd Congr. Ir~tcrn.Assoc. Cybernetics, Nnmur, 1961. 9.5. Pask, G., A proposed evolut>ionarymodel, in Princ,iples of Self-Organisntion (H. von Focrst,cr and G. Zopf, eds.), Pergamon Press, Ncw York, 1962. 96. Pask, G., The sirriulation of learning and decision-making behaviour, in Aspects of the Theory of ArtificiaE IntelEigence (C. A . Muses, ed.), Plenum Press, New York, 1962. 97. Pringle, .J. W. S., On the parallel between learning and c!volut,ion. Behnviour 3 (1951). 98. MacKay, D. M., Operational aspccts of intellect, in l ‘ k e Mechanisution of Thought Processes. H.M.S.O., London, 1959. 99. Ashby, W. R., Design for an intelligence amplifier, in Automatu Studies (C. E. Shannon and J. MacCarthy, eds.), Princeton Univ. Prcss, Princeton, New Jersey, 1956. 100. Kochen, M., Experimental study of hypothesis formation by a computer, in Communication Theory (C. Cherry, etl.), Rutterworths, T,ondon, 1962. 101. Polya, J., H o w to Solve I t . 1’rincet)on Univ. Press, Princeton, New Jersey, 1945. 102. Rruner, J. S., Goodnow, J. J., and Austin, G. A.. A Study of Thinking, Wiley, New York, 1956. 103. Newall, A., Shaw, J . C . , and Simon, H. A., The logic theory machine, I R E Trans. Irlform. Theory, 3 (1956). 104. Newall, A . , Intelligent, learning in a general problem solver, in Self-Orgunising Systems (M. C . Yovits and S. Cameron, eds.), Pergamon Prcss, New York, 1960. 105. Minsky, M., Steps t,owards art,ificial intelligence, Proc. I . R . E . 49, 8-30 (1961). 106. Minsky, M., and Selfridge, 0. D., Random nets, in Communication Theory (C. Cherry, ed.), Butterworths, London, 1962.
107. Travis, L., Observing how humans make mistakes to discover how computers may do likewise. System Dcvel. Corp. Rept. S P 776 (1962). 108. Marzocco, F., System Devel. Corp. Summary Repts. (1962). 109. Minsky, M. et al., Symposium on artificial intelligence, in Information ProcessijLg (R. Popplewell, ed.), North-Holland Publ. Co., Amsterdam, 1962. 110. Taylor, M’. K., A pattern recognising adaptive controller, Proc. 2nd I F A C Conf. Automatic Control. To be published. 1 1 1 . Miller, G. A., Galanter, E., and Pribram, K., Plans, and the Structure of Behaviour. Henry Holt, New York, 1960. 112. Markov, A. A., The Theor!]of Algorithms. Moscow Academy of Sciences, 1954. 113. Hunt, E. B., Concept Learning. Wiley, New York, 1962. 114. Selfridge, 0. D., Pandemonium, a paradigm for learning, in T h e Mechanisation of Thought Processes. H.M.S.O., London, 1959. 115. Newell, A., Shaw, J. C., and Simon, H. A., Elements of a theory of human problem solving, Psychol. Rev. 65 (1958). 116. Newell, A., A report on the general problem solving programmc, in Proc. 1st Intern. Conf. Inform. Theory, UNESCO, Paris, 1959. 117. Newell, A , , Some problems of basic organisation, in Self-Organising Systems -1962 (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 118. Newell, A,, Learning and problem solving, in Information Processing (R. Popplewell, ed.), North-Holland Publ. Co., Amsterdam, 1962. 119. Mittelstaedt, H., Control systems of orientation in insects, Ann. Rev. Entomol. 7 (1962). 120. MacKay, D. M., Models of space perception, in Aspects of the Theory of Artificial Intelligence (C. A. Muses, ed.), Plenum Press, New York, 1962. 121. Fiegenbaum, T., and Simon, H. A., Elementary perceiving and memorising machine, in Information Processing (R. Popplewell, ed.), North-Holland Publ. Co., Amsterdam, 1962. 122. Fiegenbaum, T., The simulation of verbal learning, Proc. I R E Western Joint Comp. Conf., 1961. 123. Amarel, S., The automatic formation of a computer programme that rep(M. C. Yovits, G. T. resents a theory, in Self-Organising Systems-1962 Jacobi, and G. D. Goldstein, eds.), Spartan Press, D.C., Washington, 1962. 124. Amarel, S., An approach to automatic theory formation, in Principles of Self-Organisation (H. von Foerster and G. Zopf, eds.), Pergamon Press, New York, 1962. 125. Solomonoff, R . , Research in inductive inference, Zator Corp. Rept. ZTB, Contract AF 638 376 (1961). 126. Ullman, L. El., A cybernetic model that learns a sensory connection, in Medical Electronics and Biological Engineering, Vol. 1, No. 1, 1963. 127. Banerji, R . C., An information processing programme, General Systems Yearbook Vol. v., 1962. 128. Vossler, C., and Uhr, L., A pattern recognition programme that generates, evaluates, or adjusts its own operations, Proc. I R E Western Joint Comp. Conf., 1961. 129. Agalides, G. E., The cybernetics of the brain, IEEE Cybernetics Meeting, Detroit, 1963. To be published. 130. Churchman, H. W., Enquiring systems, System Devel. Corp. Rept. SP 877 (1962).
131. Petri, C. A., Fundamentals of a theory of asynchronous information flow, in Information Processing (R. Popplewell, cd.), North-Holland Publ. Co., Amsterdam, 1962. 132. Gunther, G., Cybernetic ontology, in Self-Organising Systems-1962 (M. C. Yovits, G. T. Jacobi, and G. D. Goldstein, eds.), Spartan Press, Washington, D.C., 1962. 133. Fogel, L. A., Towards inductive inference automata, in Information Processing (R. Popplewell, ed.), North-Holland Publ. Co., Amsterdam, 1962. 134. George, F. H., Pragmatic machines, Proc. DAGK Conf. Cybernetics Karlesruhe, 1963. 135. George, F. H., The Brwivr. as a Computer. Pergamon Press, New York,'1961. 136. Gorn, S.,The treatmcnt of ambiguity and paradox in rncchanical languages, Proc. A m . S y m p . Pure Math. To be published. 137. Hovland, C. I., A communication analysis of concept learning, Psychol. Rev., 59,1961. 138. Hovland, C. I., and Hunt, E. B., Computer simulations of concept attainment, Behavioural Sci. 5, (1961). 139. Flavell, J. H., The Developmental Psychology of Jean Piaget. Van Nostrand, Princeton, New Jersey, 1963. 140. Piaget, -T., The Construction of Reality in the Child. Basic Books, New York. 1954. 141. Bartlett, P.,tlememberin,g. Cambridge Univ. Press, London and New York, 1933. 142. Craik, K. J. W., The Nature of Explanation. Cambridge Univ. Press, London and Ncw York, 1943. 143. Tinbcrgen, N., A Study of Instinct. Oxford Univ. Press, London and Now York, 1951. 144. Lorenz, K. Z., King Solomon's Ring. Crowell, New York, 1952. 145. Hull, C. L., A Behaviour System. Appleton-Century Crofts, New York, 1952. 146. Hobb, D. O., The Organisation of Behaviour. Wiley, New York, 1949. 147. Broadbent, D. E., Perception and Communication. Pergamon Press, New York, 1957. 148. Brown, J., Information, redundancy, and decay of the memory trace, in Mechanisation of Thought Processes. H.M.S.O., London, 1959. 149. Gregory, R . L., Models and localization of function in the central nervous system, in The Mechanisation of Thought Processes. H.M.S.O., London, 1959. 150. Mackworth, N. H., fiesearches in the measurement of human performance, Med. Res. Council Spec. Rept. No. 268, (1950). 151. Milner, P. M., The cell assembly, Psychol. Rew. 64 (1957). 152. Barbizet, J., and Albarde, P., Memoires humaines et memoires artificielles, Concours Med. 6 (1961). 153. Welford, A. T. Aghg and H u m a n Skill. Oxford Univ. Press, London and New York, 1958. 154. Ferster, C. B., and Skinner, B. F., Schedules of Reinforcement. AppletonCentury-Crofts, New York, 1957. 155. Skinner, B. F., Teaching machines, Scientijic American (1961). 156. Harlow, H. F., Learning set and error factor theory, in Psychology: A S t u d y of a Science, Study 1, Vol. 2, (S. Koch, ed.), McGraw-Hill, New York, 1959. 157. Bruner, J. S., Studies in Cognition. Prentice-Hall, Englewood Cliffs, New Jersey, 1959.
158. Hyden, H., in The Cell (J.Brachet and A. E. Mirsky, eds.), Vol. 4, Academic Press, New York, 1960. 159. Galambos, I. Commcnts in Proc. Leiden Symp. Inform. Processing, 1962. 160. Anokhin, P. K., Comments on integration in the central nervous system, in Proc. Leiden S y m p . Inform. Processing, 1962. To be published. 161. Magoun, H. W., Non-specific brain mechanisms, in Biological and Biochemical Causes of Behaviour (H. F. Harlow and C. N. Woolsey, eds.), Univ. of Wisconsin Press, Madison, Wisconsin, 1958. 162. Jasper, H. H., Reticular-cortical systems and theories of the integrative action of the brain, in Biological and Biochemical Causes of Behaviour (H. F. Harlow and C. N . Woolsey, eds.), Univ. Wisconsin Press, Madison, Wisconsin, 1958. 163. McCulloch, W. S., and Kilmuir, J., 3rd Bionics Conf., Dayton, Ohio, 1963. To be published. 164. Braines, I., and Sechvinsky, V., Matrix structures and the simulation of learning, I R E S y m p . Inform. Theory, Brussels, 1962. To be published. 165. Braines, I., Napalkov, A., and Sechvinsky, V., Problems of Cybernetics, D.S.I.R. translation, London, 1961. 166. Napalkov, A., The organisation of reflex systems, Proc. 3rd Conf. Intern. Assoc. Cybernetics, Namur, 1961. To be published. 167. Bishop, J., Environmental feedback in brain functions, in Self-Organising Systems (M. C. Yovits and S. Cameron, eds.),Pergamon Press, New York, 1960. 168, Pavlov, I. P., Conditioned Rejlexes. Oxford Univ. Press, London and New York, 1927. 169. Barlow, B., and Donaldson, P., Sensory mechanisms: the reduction of redundancy and intelligence, in The Mechanisation of Thought Processes. H.M.S.O., London, 1959. 170. Agalides, J. E., Communication and information theory aspects of the nervous system, Tech. Status Rept., General Dynaniics Corp., 1963. 1 7 1 . Lettvin, J. Y., Matturana, H. R., and McCulloch, W. S., What the frog’s eye tells the frog’s brain, I R E Trans. 47 (1959). 172. Hubel, H. D., and Wiesel, T. N., Receptive fields of single neurones in the cat striate cortex, J . Physiol. (London) 148 (1959). 173. Reichardt, W. von, and Ginitie, G. M., Zur theorie der lateralen inhibition, Kybernetik 1, 4 (1962). 174. Braighknberg, V., Some models of the cerebral cortex, Proc. 10th Intern. Congr. Electron., Rome, 1963. To be published. 175. Schade, .J. P., The structural organisation of the human cerebral cortex, Acta Anat. 47 (1961). 176. Scholl, D. A., The Organisation ofthe Cerebral Cortex. Wiley, New York, 1956. 177. Gerrard, R . W., S y m p . on Computing in the Nervous System, Leiden, 1962. 178 Young, J. Z., Some essentials of neural memory systems, Proc. 10th Intern. Congr. Electron., Rome, 1963. To be published. 179. Pask, G., Conception of a shape and the evolution of a design, in Conference on design Methods (J. C . Jones and D. G. Thornley, eds.), Pergamon Press, New York, 1963. 180. Lumsdaine, A. A., and Glaser, R. (eds.), Teaching Machines and Programmed Instruction, Natl. Ed. Assoc. Am., Washington, D.C., 1960. 181. Coulson, .J. E. (cd.), Programmed Learning and Computer-Based Iizstruction. Wiley, New York, 1962.
182. Gardner, M., Logic Machines and Diagrams, McCraw-Hill, Ncw York, 1962. 183. Edwards, W., Prohabilistic information processing in command and control systems, Tech. Rcpt. AD 3789-12-T, Eng. l’sychol. Lab., Univ. of Michigan, Ann Arbor, Michigan, 1963. 184. Johnson, D. L., and Kohlcr, A. L., Man-computer interface study, Tech. Kept., Dept. Elec. Eng., Univ. of Washington, Seattle, WEtshington, 1962. 185. Pask, G., A model of learning wit)hin systems stabilised by an adaptive teaching machine, Tech. Note No. I, USAF Contract A F 61(052)-402(1963). 186. Pask, G., The logic and behaviour of self-organising systems as illustrated by the interaction hctwccn men and adaptive machines, I.S.I.T., Brussels, 1962. 187. F’ask, (+., Self-organising systoms involvcd in hnman learning and performancr, I‘roc. 3rd Bioriics Co)if. Dnyton, Ohio. To be published. 188. Lewis, B. N., The rationalo of adaptivc teaching machines, in Mechanisation in the Classroom, (M. Goltlsmith, cd.), Souvenir Press, London, 1963. 189. Lewis, B. N., and Paslt, C . , ‘I’he theory arid practicc of adaptive teaching systems, in !l’eoc.liiri:/ Machines and l’rogramned Itistruction: Datu and Directioris (R. Glaser, ed.), To be published. 190. l’ask, G., and Lttwis, B. N., An adaptive automaton for teaching small groups, Perceptual arid Motor Skills 14 (1962). 191. Lewis, B. N., Communication in problcrn-solving groups, in Conference on Design illethods ( J . C. Jones and D. G. Thornley, ed.), Pergamon Press, New York, 1963. 192. Pask, G., Interaction bct,wocn a group of suhject,a and an adaptive automaton to prochicc a self-organising systcm for decision-making, in Self-Oryanising Syslems-19U2 (M. C. Yovits, C . T. Jttcobi, and (2. D. Goldstein, cds.), Spartan Press, Washington, D.C., 1962. 193. Curry, H. B., Foundations of Malhematicul Logic, McGraw-Hill, New York, 1963. 194. Church. A., Introduction fo Matherrintical Logic, Princeton Univ. Press, Princeton, New Jersey, 1956. 195. Harrah, D., Conamunication: A Logical Illodel, M.I.T. Monograph. Wiley, New York, 1963. 196. Bar-Hillel, Y., Semantic information and its mcasrircs, in Circular Causal arid Feedback Alecha,riisms i n Bioloqicccl a r i d Social Systerns (H. von Foorster, etl.), Josiah Macy Foundation, Princeton Univ. Press, 1955. 197. Cowan, J. D., Toward a proper logic for parallel computation in the presence of noise, Bioriics 1, WADD (1960). 198. Vinograd, C., and Cowan, .J. D., Reliahlc computation in the presence of noisc. To be publishccl. 199. Fisher, K. S., Design of E’xperirnents. Oliver & Boyd, London, 1949 200. Fisher, 1%. S., Statistical Methods for Research Workers. Oliver & Boyd, London, 1949. 201. Brillonn, L., Science atid Inj’ornzation Theory, 2nd ed. Acadeniic Press, New York, 1962. 202. Russell, B., arid Whitehead, A. N., Principia Mathematica, Cambridge Univ. Prcss, London and Ncw York, 1927. 203. Luce, R. D., and Raiffa, H., Games c4iid Decisions. IViley, New Yorli, 1956. 204. Howard, H. A . , Uyr~urtlic I’royruniming and Markov Processes, M.T.T. Monograph. Wiley, New York, 1960.
Automatic Optical Design

ORESTES N. STAVROUDIS
National Bureau of Standards, Washington, D.C.
1. Introduction
2. Ray Tracing
   2.1 Description
   2.2 Requirements
   2.3 Rotationally Symmetric Systems
   2.4 Ray Tracing with the High-speed Computer
3. Classical Methods of Lens Design
   3.1 Groundwork and Terminology
   3.2 Beginning the Design
   3.3 Thin Lenses
   3.4 Lens Bending
   3.5 The Thick Lens
   3.6 Third-Order Design
   3.7 The Final Process
4. The Computer Applied to Lens Design
   4.1 Background
   4.2 Spot Diagrams
   4.3 James G. Baker
   4.4 Gordon Black
   4.5 Donald P. Feder
   4.6 Procedures of André Girard and C. G. Wynne
   4.7 Joseph Meiron
   4.8 Robert E. Hopkins
5. Conclusion
Acknowledgments
References
1. Introduction
An optical system is a device which transmits light. Light enters an aperture at one end and exits from an aperture at the opposite end; occasionally there may be more than one entrance aperture and more than one exit aperture. Usually the function of an optical system is to
alter the light emanating from some pattern (called the object) so that a duplicate of the object (called the image) is formed by the light exiting from the system. Such an image may be observed directly by the eye, or it may be observed indirectly after it has fallen on a screen, or it may be recorded by being made to fall on a photographic film. An optical system consists of pieces of transmitting or reflecting elements arranged in such a way that the image-forming process takes place. The individual reflecting elements are called mirrors; the individual refracting elements are called lenses. This terminology is abused by applying the term lens to some optical systems. Often this abuse is mitigated by calling lenses lens elements and by using the term compound lens to mean optical system. In accordance with accepted terminology we will use these terms interchangeably in this paper, allowing the words to mean what we want them to mean at the time we use them. This ambiguity appears to apply to many other fields, including politics, as well as to optics. No apology is necessary, nor is one intended. The quality of a lens depends on the fidelity with which an object pattern is reproduced. No perfect lens exists; indeed no perfect lens can exist. Each optical system is designed for a limited range of functions, and therefore any estimate of its quality must be made in the context of its intended application. The job of a lens designer is to arrange the lens elements and mirrors to form a lens of sufficient quality to satisfy requirements specified in advance. The work is almost entirely numerical and consists in finding values of those quantities which specify an optical system, called design parameters, which when used in certain formulas indicate that the lens design is satisfactory. A lens is indicated in Fig. 1, where the various design parameters are illustrated.
FIG. 1. Illustrating the design parameters of an optical system. Shown is the profile of an optical system consisting of two lens elements and a diaphragm or stop. The design parameters are: A, stop aperture; B, stop position; C, clear aperture of a lens element; D, outside diameter of a lens element; T, thickness of a lens element; S, separation between lens elements. Not indicated are the curvature of the surfaces of the lens elements and the index of refraction of the glass in a lens element.
The most important of these in the design process are the separation, the distance between the individual lenses along the optical axis; the thickness or central thickness, the distance along the optical axis between the two surfaces of a lens; the stop position, the location of the diaphragm; the stop opening, the diameter of the aperture of the diaphragm; the index of refraction of the glass comprising the lens element; and, in the case of lens elements with spherical surfaces, either the radius or its reciprocal, the curvature, of each surface. If a lens element has one or more aspheric surfaces, their shapes must of course be specified by other means. Less a part of the design process, but vital in preparing a design for manufacture, are the following: edge thickness, the thickness of the lens element at its edge; clear aperture, the diameter of that part of the lens element that is allowed to transmit light; and the diameter of the lens element.
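For illustration, the design parameters listed above might be recorded as follows. The field names and sample values are invented for the example and do not correspond to any particular lens-design program.

```
from dataclasses import dataclass
from typing import Optional

@dataclass
class Surface:
    curvature: float            # reciprocal of the radius (always finite or zero)
    thickness: float            # axial distance to the next surface, or separation
    index: float = 1.0          # refractive index of the medium that follows
    clear_aperture: Optional[float] = None   # diameter allowed to transmit light

@dataclass
class Stop:
    position: int               # index of the surface at which the stop sits
    aperture: float             # diameter of the diaphragm opening

# A two-element lens is then a list of Surface records plus a Stop, e.g.:
design = [Surface(0.02, 4.0, 1.517), Surface(-0.01, 2.0),
          Surface(0.015, 3.5, 1.620), Surface(-0.025, 50.0)]
stop = Stop(position=2, aperture=10.0)
```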
ORESTES N. STAVROUDIS
approach to the application of computers to lens design, so far a much more successful one, one usually finds the classical methods of lens design and the intuition of the designer playing the dominant role. The difficulty of the design problem must not be underestimated. Ten years ago the question that haunted many workers in the field of automatic design was whether a true optimum solution could be obtained. Could the absolute minimum of some numerical criterion be obtained, or would the program hang up at a local minimum? This question has not yet been answered. Today most workers would be elated if they could produce a design which was demonstrably at a local minimum. The ultimate test of the validity of any lens design program is whether the performance of the finished optical system meets the prescribed specifications. Thus, any system of judging the quality of an optical system must correlate with a set of factors obtained from physical tests made on the finished lens. However, the problem of lens design, as it is now stated, is entirely within the domain of geometrical optics. Accordingly it is possible (albeit unlikely) that a lens design may fulfill all the requirements imposed by the designer and yet the lens itself perform in a manner other than anticipated, due to defects intrinsic in the assumptions of geometrical optics. In the past, excepting some astronomical telescopes, this has been no great problem. In the vast majority of lenses, diffraction effects, unaccounted for in geometric optics, have been dominated by the residual geometric aberrations, which were frequently several orders of magnitude larger. In recent years, however, due at least in part to the use of high-speed computers, diffraction effects have become more noticeable as the residual geometric aberrations were brought to lower values. The problem of controlling the quality of the "diffraction-limited" lens must now be given attention. Another aspect of the problem of evaluating optical systems is the method used in testing lenses; whether, for example, it is best to use the classical resolving power approach in assigning a numerical value to the image quality of a lens, or to apply the newer frequency response techniques to obtain a contrast transfer function. Can diffraction effects and the spreading of the image in a photographic emulsion be incorporated in the design problem? Indeed, is it possible for an estimate of the cost of manufacture to be included as a datum in the design criteria? Such questions are well beyond the scope of this discussion. Suffice it to say that factors of this sort must be considered in specifying design criteria which are to be at all realistic.
2. Ray Tracing

2.1 Description
The ultimate component of any lens design program is a means for tracing rays. Conceptually, ray tracing is very simple. One starts at an object point which may be either at a finite distance from the first surface or at infinity. A ray from this point is specified by selecting a direction, and the point of intersection at the first surface is calculated. This is the transfer operation. Next, at the point of intersection with the first surface, the angle of incidence i is determined. Using Snell's law, N sin i = N' sin i', the angle of refraction i' is found. Here N and N' are the indices of refraction of the media on either side of the refracting surface. This is called the refraction operation. The transfer and refraction operations are repeated alternately in each medium and at each interface until the ray so calculated emerges in image space. Ray tracing is illustrated in Fig. 2.
FIG. 2. Illustrating rays traced through a lens. Three rays from an infinite object point 30° off the optic axis enter the lens at the extreme left. The upper and lower rays are called marginal since they just clear the edges of apertures in the lens. The marginal rays represent the boundary of the bundle of rays transmitted by this lens from this object point. The central ray, called the principal or chief ray, is defined as the center of this bundle. The rays shown here are all meridian rays; they lie in a plane determined by the object point and the optic axis, which here coincides with the plane of the paper. Skew rays are those rays which do not lie on this plane.
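One transfer-refraction step for a single spherical surface might be sketched in vector form as follows. The formulation is illustrative only, and not the scheme of any particular author discussed below; it assumes the surface vertex at the origin, the axis along z, curvature c, indices n and n' on either side, and a ray that actually meets the surface.

```
import numpy as np

def transfer(p, d, c):
    """Intersect the ray p + t*d (|d| = 1) with the surface of curvature c."""
    if c == 0.0:                                   # plane surface at z = 0
        t = -p[2] / d[2]
    else:
        center = np.array([0.0, 0.0, 1.0 / c])
        b = np.dot(d, p - center)
        disc = b * b - (np.dot(p - center, p - center) - 1.0 / c**2)
        t = -b - np.sign(c) * np.sqrt(disc)        # vertex-side intersection
    return p + t * d

def refract(p, d, c, n, n_prime):
    """Apply Snell's law n sin i = n' sin i' at the point p on the surface."""
    normal = np.array([0.0, 0.0, -1.0]) if c == 0.0 else c * p - np.array([0.0, 0.0, 1.0])
    normal = normal / np.linalg.norm(normal)       # unit normal opposing the ray
    cos_i = -np.dot(d, normal)
    mu = n / n_prime
    cos_r = np.sqrt(1.0 - mu**2 * (1.0 - cos_i**2))
    return mu * d + (mu * cos_i - cos_r) * normal  # new unit direction

# One step of the alternating process: transfer to the surface, then refract.
point = transfer(np.array([0.0, 5.0, -100.0]), np.array([0.0, 0.0, 1.0]), c=0.02)
direction = refract(point, np.array([0.0, 0.0, 1.0]), c=0.02, n=1.0, n_prime=1.5)
```

Repeating the two functions surface by surface carries the ray through the system into image space.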
2.2 Requirements
Although the idea of ray tracing is exceedingly simple, the computational processes involved are annoyingly complicated. In designing a calculating scheme, particularly for use with a high-speed computer, certain requirements must be met. Those given over thirty years ago by T. Smith [1] (quoted by Weinstein [2]) are certainly valid today. "In the first place, all points of reference must be at a finite distance
from the portions of the surfaces operative in producing refraction; thus, no reference is possible to the center of curvature or to the point of intersection of a ray with the axis, since either may be at infinity. Again, the radius of curvature may not be used, for this may become infinite; on the other hand its reciprocal, the curvature, may be employed, since it is always finite or zero. More generally, no lengths measured along the axis may be used if high accuracy is desired, because these are so variable in magnitude. Transverse distances, on the other hand, vary within limits fixed by the apertures of the various lenses, and their use will tend to give uniform reliability at all surfaces. Lastly, if the formulas involve fractions, the denominators must in all cases be essentially constant in sign and large in magnitude."

2.3 Rotationally Symmetric Systems
The majority of lenses are rotationally symmetric; they are invariant, structurally, with respect to rotation about a line which is termed the optic axis. Exceptions are anamorphotic lenses, such as wide screen camera and projection lenses, and folded systems, such as binoculars and astronomical reflecting telescopes. In rotationally symmetric systems, rays fall naturally into two categories. These are meridian rays, rays lying in a plane determined by the optic axis and the object point, and skew rays, rays which are not meridional. Meridian rays are much easier to compute. It has been said, perhaps apocryphally, that in the 1920's many optical firms found it more economical to design lenses using only meridian rays, basing any further design alterations on the performance of a prototype lens fabricated from the design.

2.4 Ray Tracing with the High-speed Computer
With the advent of digital computers it was soon recognized that separate sets of formulas for meridian and skew rays were not only unnecessary but undesirable. Because of the high calculating speed, the difference in time required to trace a skew ray and a meridian ray is too small to warrant the use of the space required to store a second ray-tracing program. Therefore, except for some very special cases, the meridian ray as a computational entity has been discarded. A linear approximation to the ray tracing equations, indispensable during the early stages of lens design, is called a paraxial ray (see Section 3.6.2). At some installations, paraxial ray tracing subroutines have been eliminated. Instead paraxial data are obtained by tracing a ray very close to the axis.
3. Classical Methods of Lens Design
The classical approach to lens design has been described as more artistic than technical. Like artists, lens designers tend to be scornful of attempts to reduce their craft to mere words. Just as the creator of a painting can be identified by his brushwork, so each lens designer has his own characteristics and peculiarities.

3.1 Groundwork and Terminology
The lens designer begins his work with a set of specifications, a list of properties required of the finished product. These properties are invariably stated as numerical values for the focal length, the focal ratio or f number, and either format size or angular field. In addition, there are usually requirements in image quality, distortion, and uniformity of illumination over the field. In addition to these explicit specifications there are requirements which are implicit, though no less imperative. No glass thickness or separation can be negative; in fact, glass thicknesses less than 0.6 mm are difficult if not impossible to fabricate. The shapes of the individual lens elements must be such that they can be manufactured without too much difficulty or expense by a reasonably competent optician. The indices of refraction and dispersions must correspond to those of real glasses listed in the catalogs of manufacturers and preferably those that are readily available and inexpensive.
3.2 Beginning the Design

With the stated specifications in mind, the lens designer draws on his experience and intuition to select a lens type, a configuration of lens elements that forms a familiar pattern which in his view will best lead to a satisfactory design. Such a choice of lens type includes a tentative selection of glasses. Kingslake [3, 4] has compiled pictorial descriptions of the basic lens types. Cox [5] has published a compendium of sketches of camera lenses for amateurs.
3.3 Thin Lenses

The next step is to use some form of the thin lens equations to arrive at a preliminary design with numerical values assigned to the various design parameters to assure the proper focal length and a generally favorable distribution of power among the elements. The power is defined as the reciprocal of the focal length. At this stage of design
chromatic aberration is first considered. The thin lens equations used in this stage of design are simplifications of the finite ray tracing equations obtained by assuming the thickness of each lens element to be negligible [6]. At this stage the designer is not striving for an optimum solution but only for an initial system which meets the given requirements approximately.

3.4 Lens Bending
One of the important concepts useful in lens calculations is that of lens bending [7]. With each element is associated a power, the sum of all the powers of the lens elements being the total power of the optical system. Lens bending consists of adjusting the various lens parameters in such a way as to keep the power constant. In the context of thin lenses, bending is exact. At later stages of design, the concept is frequently used [8] although the application is inexact.
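As a small illustration of these two ideas, the sketch below uses the elementary thin-lens relation, power in air equal to (n - 1)(c1 - c2), to show that adding the same amount to both curvatures, which is what a bending does, leaves the power unchanged. The function names and the numerical values are invented for the example.

def thin_lens_power(n, c1, c2):
    # Power of a thin element in air: (n - 1)(c1 - c2),
    # where c1 and c2 are the curvatures of its two surfaces.
    return (n - 1.0) * (c1 - c2)

def bend(c1, c2, delta):
    # A bending adds the same amount to both curvatures, so c1 - c2,
    # and hence the thin-lens power, is unchanged.
    return c1 + delta, c2 + delta

n, c1, c2 = 1.5, 0.030, -0.010
print(thin_lens_power(n, c1, c2))                 # 0.020
print(thin_lens_power(n, *bend(c1, c2, 0.005)))   # still 0.020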
3.5 The Thick Lens

At this stage the assumption that the thickness of the lens elements is negligible must be discarded. The problem now facing the designer is to maintain as well as possible the desirable properties achieved in the thin lens solution while making a transition to the thick lens. Several procedures for making the change have been recommended. Those of Conrady [9] and Berek [10] are familiar to most lens designers. Herzberger's method [11] is considered by some to be superior. See also Kerber [12].

3.6 Third-Order Design
Now the designer comes to grips with the real lens. He has a layout, a picture showing the constructional details of the system, along with
tentative numerical values for the lens parameters: the indices of refraction, the dispersions, the curvatures, the thicknesses, and the separations. These values must now be adjusted to bring the design in line with the prescribed specifications. He now must discard the relatively simple and simple-minded concepts he has been using in favor of more complicated and sophisticated methods.

3.6.1 Fundamental Concepts
The basic ideas go back to Hamilton [13, 14] who first defined the optical characteristic function and derived its properties. His premise was Fermat's principle, that the optical path length of a ray through
a lens is an extremal and that the ray path can therefore be determined by variational techniques. The optical path length of a ray is defined as follows. From the object point to the image plane, the ray passes through the various glass elements and air spaces of the lens system. Consider the segments of the ray bounded by the glass surfaces. The optical path length is the sum of the products of the geometric lengths of these segments and the index of refraction of the medium containing the segment. Since the index of refraction of a medium is proportional to the reciprocal of the speed of light in that medium, it can be seen that the optical path length has the dimensions of time and that the variational problem is exactly that of the brachistochrone. Let x, y, z and x', y', z' be the coordinates of points on the object and image side of a lens, respectively. Then, provided that the two points are not conjugates, i.e., that one is not the geometric image of the other, there is only one ray connecting them. Thus, it is possible to define a function of six variables V(x, y, z; x', y', z') whose value is the optical path length of a ray through the lens, the initial point being (x, y, z) and the end point being (x', y', z'). If X, Y, Z and X', Y', Z' are the direction cosines of the ray in object and image space, respectively, Hamilton showed that
V_x = X,   V_x' = -X',
V_y = Y,   V_y' = -Y',
V_z = Z,   V_z' = -Z'.

The subscripts denote partial differentiation. Here it is assumed that the refractive index of object and image space is unity. A modern account of Hamilton's work can be found in Synge [15]. Hamilton went on to apply his ideas to mechanics [16, 17, 18]. In this form his work attracted a great deal of attention, while the original optical application remained virtually unknown. Bruns [19, 20], unaware of Hamilton's work, applied the general theory of characteristic functions to optics, in effect rediscovering Hamilton's original approach. He called the optical characteristic function the Eikonal. Meanwhile, Petzval [21] and Seidel [22, 23] expanded in a power series what amounted to ray tracing equations. The first term of the expansion leads to the approximate formulas known as Gaussian optics. The second term, of degree three in aperture and field variables, yields five independent coefficients, calculable in terms of the lens parameters, which correlate with observable aberrations of lens systems. These are spherical aberration, coma, astigmatism, field curvature, and distortion, and are known as the Seidel aberrations. Although they are
not the only factors in lens performance, they are fundamental in the classical approach to lens design. After the turn of the century, Schwarzschild [24, 25] showed that the Seidel aberrations could be obtained directly from the characteristic function or Eikonal. It should be mentioned that this system of aberrations is not the only one in use. Hopkins [26] describes a system based on a measure of the departure of the emerging wavefront from a sphere centered at the ideal image point. Another method, due to Buchdahl [27], considers aberrations of higher order than the third.

3.6.2 Paraxial Rays
A linear approximation of the ray tracing equations results in a formula valid in a neighborhood of the optical axis (in the case of rotationally symmetric systems) in which sines and tangents are replaced by angles. Such a formula is known as a paraxial approximation or a paraxial ray tracing equation, and the fictive rays traced with such equations are known as paraxial rays [28]. Using paraxial ray tracing equations, the Seidel aberrations can be calculated as well as the Gaussian or first-order properties of the lens. An additional important datum, the Petzval sum, together with the Lagrangian invariant, can be computed as well. Feder [29] and Allen and Stark [30] have published formulas designed for the machine calculation of these quantities. The vital ingredient of the Seidel aberration theory is that the system of third-order aberrations of a lens system is formed by the sum of the aberrations calculated for the individual elements. This is not the case for the higher-order aberrations determined by the power series expansion techniques described above. This troublesome detail has been the principal reason why an image error theory for higher-order aberrations has been so long in developing.
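A paraxial trace of this kind takes only a few lines. The sketch below uses the standard linearized refraction and transfer relations (a convention assumed here, not the specific formulations of Feder [29] or Allen and Stark [30]) and checks the result against the thin-lens focal length.

def paraxial_trace(surfaces, y, u, n=1.0):
    # surfaces: list of (curvature c, axial distance t to the next surface,
    # index n_after).  y is the ray height, u the small angle with the axis.
    for c, t, n_after in surfaces:
        u = (n * u - y * c * (n_after - n)) / n_after   # paraxial refraction
        y = y + t * u                                   # paraxial transfer
        n = n_after
    return y, u

# An axis-parallel paraxial ray through a thin lens (zero thickness);
# the effective focal length -y_in/u_out equals 1/((n - 1)(c1 - c2)).
y_out, u_out = paraxial_trace([(0.02, 0.0, 1.5), (-0.02, 0.0, 1.0)], 1.0, 0.0)
print(-1.0 / u_out)    # 50.0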
3.6.3 General Procedure in Third-Order Design

This stage of design commences with the calculation of the third-order aberrations. The designer then proceeds to reduce the aberrations by judiciously varying the lens parameters, the curvatures, the thicknesses, the separations, and, less frequently, the indices of refraction and the dispersions. In this work two paraxial rays are usually used, the paraxial principal ray which passes through the center of the diaphragm plane and originates at the edge of the object field and the
paraxial marginal ray which originates at the center of the field and just grazes the edge of the limiting aperture. This process is iterative. The designer estimates from the size of the aberrations which parameters will be changed and by how much. Then having changed the parameters he recalculates the third-order aberrations, which then provide him with the data for his next set of changes. It is vital that throughout these calculations the focal length, the field angle, and the aperture be maintained at their proper values. It should be borne in mind that there is no point in bringing these aberrations to zero, since they are often used to control the higher-order aberrations during the next stage in the optical design process.
3.7 The Final Process

Once the third-order aberrations have been brought to sufficiently small values, a process which may take from several days to several weeks, depending on the complexity of the design and the designer's skill and luck, the arduous part of the task begins. Now finite rays are traced through the system using the exact ray tracing formulas. The results are usually plotted on graph paper to provide a visual display of the total aberrations. Again careful adjustments are made to the lens parameters to reduce these aberrations. The process is then repeated until the designer feels that his goal has been reached. This can go on for months or even years. In this stage of the design there are very few guidelines to follow. One of these is that large angles of incidence must be avoided. Another very common procedure is to introduce third-order aberrations of opposite sign to the total aberration to be corrected. However, in the final analysis the lens designer must rely on his experience, the background of information facetiously referred to as his bag of tricks, that he has spent his professional lifetime acquiring. It is conceivable that a point may be reached where no further improvements are possible, necessitating a drastic change in the overall lens structure. The whole design may even be discarded and a fresh approach made. The designer may introduce one or more aspheric surfaces in an attempt to control some troublesome aspect of the design. Once a satisfactory design has been achieved the work of the designer is not finished. The manufacturer obtains the glass from which the various components are to be made. In practice the glasses may often have indices of refraction and dispersions which differ slightly from those used in the design calculations, necessitating compensating adjustments in the other parameters. If a large number of lenses are to
be made, especially if they are to be manufactured over a long period of time, minor changes may be made almost continuously, consistent with modern concepts of quality control.

4. The Computer Applied to Lens Design
4.1 Background
The development of the high-speed computer proceeded hand-in-hand with its application to problems in optical design. In fact there is evidence that procedures for handling optical problems on computers were in existence before computers were ready for them. The earliest report of the application of punched card accounting equipment to optical design problems was made by Grosch [31] in 1945. Four years later he presented a similar paper outlining a ray tracing procedure for the IBM Selective Sequence Electronic Calculator [32]. In 1951, Feder's ray tracing program was coded for the Standards Eastern Automatic Computer (SEAC) and was used to check the performance of the machine [33]. Berning and Finkelstein [34] reported on the use of the IBM 604 in ray tracing and other optical calculations. In Great Britain, Black [35, 36] reported on the application of the Manchester University computer to ray tracing. Some of the more recent publications on the subject are by Herzberger [37], Laikin [38], and Ford [39]. For an exhaustive bibliography up to 1956, see Weinstein [40].

4.2 Spot Diagrams
Lens design can be thought of as a feedback process. The lens at any stage of design is analysed by ray tracing, providing the designer with the information he needs to improve it. One of the major problems here is the interpretation of the ray tracing in a realistic manner. One such method is the spot diagram. A spot diagram is formed by the intersection of a number of rays with an image plane, the rays originating from a single object point and distributed uniformly over the aperture of the lens. In doing a spot diagram analysis of a lens, several object points distributed over the field are used, resulting in a set of spot diagrams each describing the image of an object point. One of the objects of lens design is to make each spot diagram as small as possible. The spot diagram is the analog of the star image used formerly to test the image-forming properties of lenses. It is similar to the photogram, the plate obtained from the application of the Hartmann test [41] to a
lens. In the Hartmann test, light from an object point is passed through a diaphragm consisting of a set of regularly spaced pinhole apertures placed before the lens. After passing through the lens the light falls on a photographic plate placed either behind or in front of the focal plane. From measurements made on these plates, called photograms, data on the image-forming quality of the lens under test can be obtained. Here, the diaphragm plays the role of the aperture with its uniform distribution of rays, the lens is analogous to the computer, and the photogram corresponds to the spot diagram. The earliest report on the use of spot diagrams in lens design was by Hawkins and Linfoot [42] in 1945, in which the properties of a Schmidt telescope they had designed were described. Later Herzberger [43] reported making and using spot diagrams as early as 1949. In his method a relatively small number of rays were fitted by least squares to a pair of polynomials which were then used to compute a large number of rays. Stavroudis and Feder [44] used a similar technique except that interpolation techniques rather than least squares were used. However, as the speed of computers increased it soon became apparent that it was more convenient and less expensive to trace all of the rays required for a spot diagram rather than to introduce an intermediate step of polynomial fitting. Nevertheless, some approaches to automatic lens design use performance functions based on fits of ray tracing data to polynomials. Although computers very soon were able to compute spot diagrams in a matter of minutes, the task of plotting them often took weeks. Several years ago, a program using SEAC was developed at the National Bureau of Standards for displaying spot diagrams on the face of a cathode ray tube. An important innovation in this program was a provision for changing both the location of the focal plane and the f number of the lens under study, which permits the operator to observe the effects of the change immediately [45]. A number of systems for evaluating a lens by means of the spot diagram in terms of familiar lens testing criteria have been proposed. The earliest of these, due to Murcott and Gottfried [46], was an ingenious device by means of which a chart used to test resolving power was projected through each point of a spot diagram. The union of these images results in a blurred picture of the resolving power chart, the amount of blurring depending on the spread of the points of the spot diagram. An estimate of the resolving power can then be made by reading the image of the resolving power chart in the usual manner. Further work along this line has been reported by Keim and Kapany [47].
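In outline, a spot diagram calculation looks like the following sketch. The skew ray trace itself is only assumed here (trace_to_image is a hypothetical function returning the (x, y) intersection of a pupil ray with a chosen image plane); the sketch shows the uniform sampling of the aperture and one simple measure of the size of the resulting spot.

import math

def spot_diagram(trace_to_image, pupil_radius, samples=20):
    # Sample the entrance pupil on a square grid, keep the points inside
    # the aperture, and trace each ray to the image plane.
    spots = []
    for ix in range(samples):
        for iy in range(samples):
            px = (2.0 * ix / (samples - 1) - 1.0) * pupil_radius
            py = (2.0 * iy / (samples - 1) - 1.0) * pupil_radius
            if px * px + py * py <= pupil_radius ** 2:
                spots.append(trace_to_image(px, py))   # hypothetical ray trace
    return spots

def rms_spot_radius(spots):
    # Root-mean-square distance of the spot points from their centroid,
    # one simple figure for how small a spot diagram is.
    cx = sum(p[0] for p in spots) / len(spots)
    cy = sum(p[1] for p in spots) / len(spots)
    return math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in spots) / len(spots))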
Hopkins, Oxley, and Eyer [48] use a numerical method for predicting resolving power which is based on a measure of the distribution of the points in a spot diagram. Resolving power is a measure of the ability of a lens to image two points which are very close to one another as two points. The relationship between resolving power and the size of the image of a single point is such that the utility of spot diagrams in predicting resolving power is clear. Let d_s be the diameter in microns of the smallest circle containing a fixed proportion of the total number of points in a spot diagram. Then the resolving power in lines per millimeter is given by
R = K / √(d_s² - d_g²),
where d_g is a number representing the mean grain size for a particular emulsion and K is a number determined empirically. Another method for estimating resolving power is described by Stavroudis and Feder [49]. The plane of the image is subdivided into a number of parallel strips either parallel to or perpendicular to the meridian plane. The number of points of a spot diagram lying in each strip is plotted against the distance of the center of the strip from, say, the location of the Gaussian image. A smooth curve through these points provides a rough approximation to the energy distribution profile of the image of a straight line. Estimates of resolving power can then be made based on the half-widths of the peaks of the curves so obtained. Lucy [50] and Lamberts, Higgins, and Wolfe [51] have done further work along this line, the latter having obtained some correlation between predictions made in this manner and experimental measurements. See also LaBauve and Clarke [52]. Predictions of resolving power made in this way are based on the assumptions of geometrical optics and therefore take no account of diffraction effects. As a consequence they tend to be in error for high values of resolving power. Linfoot [53] discusses this problem in considerable detail.
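A sketch of the strip-counting estimate just described: the spot points are binned into parallel strips and the counts give a rough profile of the line image, from which a half-width can be read. The strip width and the half-width rule are choices made for this illustration, not details taken from reference [49].

def strip_profile(spots, strip_width, axis=0):
    # Count the spot-diagram points falling in parallel strips of equal
    # width; the counts approximate the energy profile of a line image.
    coords = sorted(p[axis] for p in spots)
    lo = coords[0]
    nbins = int((coords[-1] - lo) / strip_width) + 1
    counts = [0] * nbins
    for x in coords:
        counts[int((x - lo) / strip_width)] += 1
    return counts

def half_width(counts, strip_width):
    # Width of the profile at half its peak value, in the same units as
    # strip_width; a crude stand-in for the line-spread half-width.
    half = max(counts) / 2.0
    above = [i for i, c in enumerate(counts) if c >= half]
    return (above[-1] - above[0] + 1) * strip_width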
4.3 James G. Baker

The most intensive study of the application of high-speed computers to optical design was undertaken by the Perkin-Elmer Corporation under the direction of James G. Baker during the years 1951-1955, with financial support from the U.S. Air Force. The results of these experiments were published in thirteen lengthy volumes [54] which were unfortunately classified until 1959. Today much of this work has little other than historic value, the machines used having long since passed into obsolescence.
One of the first problems studied was how a lens designer, using traditional methods, could best use a high-speed computer. One of the approaches tried out by Baker was multiple design. Like the chess player who takes on a large number of opponents simultaneously, Baker attempted to design twenty-five optical systems at the same time. His plan was to study the results of the calculations for one system while computations on the other twenty-four were in progress. When he had completed the analysis of the one lens and had prescribed the changes that the machine would next compute, another design problem would be ready for his attention [55]. Using this sort of approach, an exhaustive study of all possible thin lens solutions of a lens configuration known as the Cooke triplet design was made. Subsequently these solutions were studied extensively from the point of view of third-order aberrations and later on in terms of finite ray tracing. Tse Yung Sung of Baker's group developed a program for automatic design [56]. The merit function used was based on broad general criteria laid down by Wachendorf [57, 58], and makes use of wavefront aberrations given in the form of measures of the departure of the emerging wavefront from a sphere. The program begins by tracing rays from the entrance pupil of the lens, taking into account the location of the object plane. The intersection points of the principal rays with the Gaussian image plane are determined. With these points as centers, spheres are constructed with radii such that their surfaces are near the plane of the exit pupil. Other rays are then traced through the system, the optical path length of each being calculated. With this information, the wavefront tangent to the proper sphere can be constructed. A necessary and sufficient condition for all rays from a single object point to converge to the Gaussian image point is that the wavefront be a sphere centered at that point. The measure of the departure of the calculated wavefront from the sphere is a measure of the aberrations of the lens. The quantities calculated are the path difference errors, the distance along the ray between the wavefront and the sphere, and the deviation of the principal rays from the ideal image points. Path difference errors are computed for a number of object points and for three colors. Distortion errors are determined from the deviations of the principal rays and are also calculated for three wavelengths. Additional quantities controlling the physical properties of the lens are calculated. These assure that no lens gets too thin at its center or edge. If each of these quantities, the path difference errors, the distortion
errors, and the edge and central thickness errors, is designated by K_i, then the merit function is defined by

φ = Σ K_i².
The problem of designing a lens is then that of finding a minimum for φ. The program devised by Sung and Baker was for use on the Harvard Mark IV computer. Two versions of the method of steepest descent were used. In addition, minimization by varying one parameter at a time and by varying groups of parameters in a manner akin to lens bending was also used. This program required six minutes to calculate one value of φ. The running time for adequate convergence for one type of triplet was about four days. During the life of this program, many lenses were designed, using, at least in part, high-speed digital computers. Some of these are reported in detail in the Perkin-Elmer publications cited above.

4.4 Gordon Black
Another of the early efforts to develop a completely automatic program for optical design was by Gordon Black at the Computing Machine Laboratory of the University of Manchester under the auspices of the British Scientific Instrument Research Association [59, 60, 61]. Typical of fully automatic methods, Black's program makes use of a single merit function constructed by forming a weighted sum of squares of aberrations obtained by ray tracing. The ray tracing procedures are essentially trigonometric with distinct programs for meridian and skew rays. The computation of the aberrations follows H. H. Hopkins' [62] scheme of wavefront errors and is similar to that of Sung, cited above. Paraxial rays are used to compute the primary or third-order aberrations, and finite ray traces are used to compute the total aberrations. Many aspects of this program resemble Baker's. The automatic design portion of this program seeks a minimum in the merit function by altering the design parameters of the lens. Four distinct types of alteration procedures are used. Variable-by-variable minimization is used to find a minimum with respect to only one of the many parameters, making use, essentially, of Newton's method. This is applied sequentially to each parameter of the lens. Block operations or block relaxation is a procedure whereby a group of parameters are changed together. A special case of a block operation is lens bending. One infers that block operations include changing groups of parameters not belonging to the same lens element, and thus can be thought of as a
generalized lens bending. A group operation consists of an interpolation or an extrapolation from several lenses with known merit functions to a new design with a lower value. A random operation consists of random alterations to the parameters of a tentative design, making use of the random number generator of the Manchester computer. Its purpose was to determine whether a change of parameters results in an improvement when the other three methods have failed. If random operations fail to produce a design with a lower merit function then the design is presumed to be complete. If on the other hand a lens with a lower merit function is found, then a group operation is performed followed by variable-by-variable minimization and block relaxation. Conspicuous by its absence is a minimization routine making use of steepest descent. Black viewed this method as too slow and too indirect for his purposes. On the whole, this program was very imaginative and, in this author's opinion, very promising in spite of some initial blunders. It is regrettable that the program was dropped before really fast computers became available.
4.5 Donald P. Feder
One of the principal proponents of the completely automatic approach to optical design is Donald P. Feder. His premise is that anything a lens designer can do a machine can do better. Moreover, he is confident that automatic methods are potentially capable of producing a truly optimum design [63]. In Feder's earlier work, the merit function approach is used in the usual way [64]. Aberration functions f_i are defined in terms of design parameters X_k. The merit function

φ = Σ p_i f_i²
is formed, where the p_i are factors chosen by the designer to emphasize the contribution of those aberrations judged to be important from the standpoint of the lens specifications. The problem is to find a minimum for φ. In what follows it will be assumed that p_i is incorporated in f_i. To minimize φ, a vector G is computed:

G = ½ grad φ,

where

grad = (∂/∂X_1, ∂/∂X_2, ..., ∂/∂X_N),
N being the total number of design parameters in the system. By differentiating the expression for φ, we obtain

G_k = Σ f_i (∂f_i/∂X_k).

If A = (∂f_i/∂X_k), then G = Af, where f is the appropriate column vector. Then a necessary (though not sufficient) condition for φ to be at a minimum is that G = 0. Let

L = (∂²φ/∂X_i ∂X_k).
If L is positive definite for a set of values of X_k for which G = 0, then φ is at a minimum. The method of steepest descent is used. Starting at some initial value of the X_k, either arbitrary or as the result of some initial design procedures, the total differential is calculated:

dφ = Σ (∂φ/∂X_k) dX_k = 2G · ds.

Since dφ is largest when ds is parallel to G, ds is made proportional to G, ds = -hG, resulting in

dφ = -2hG².
The merit function φ can now be taken to be a function of h, the step size. If h is sufficiently small, φ(h) < φ(0). Under fairly general conditions this procedure converges to a minimum [65, 66]. The optimum gradient method is used to determine h. G is considered as a function of h and a value of h is found by trial and error for which G(0) · G(h) = 0. At this point the gradient at h is perpendicular to that at 0 and φ(h) has a stationary value. Since φ(h) < φ(0), the value arrived at cannot be a maximum. The construction of the merit function φ and its derivatives forms a vital part of the technique [67]. The aberration functions f_i fall into several distinct categories. These can be described as (1) image error, (2) chromatic aberration, (3) distortion and field curvature, (4) focal length, (5) restrictions on focal length, and (6) boundary conditions. Essentially, the basic expression for image error is a measure of the departure of a wavefront from a sphere. In brief, Hamilton's characteristic function is used to derive an expression for the wavefront defect. This can then be determined for a given ray, using the principal ray as a comparison. The number of the image error functions present in φ is therefore dependent on the number of rays traced. Chromatic aberration is treated somewhat similarly, making use of Conrady's D - d method [68].
Distortion and field curvatures are controlled in the traditional way by comparing the difference between the principal ray and the position of the ideal image. Each principal ray contributes a term to the merit function. An additional term in φ assures that the proper focal length is maintained. Other terms restrict the variation of the indices of refraction. The terms designated as boundary conditions prevent the separations and the central and edge thicknesses from going below preassigned minimum values. More recently, according to personal communication, Feder has abandoned the straight method of steepest descent and has turned to the "conjugate gradient" method and to damped least squares [63].
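A schematic version of this kind of steepest-descent procedure is sketched below. The gradient G is estimated by finite differences, and the step length h is chosen by a simple scan rather than by the trial-and-error search for G(0) · G(h) = 0 described above; the aberration functions themselves are left to a user-supplied routine. This is an illustration of the general idea under those stated simplifications, not a reconstruction of Feder's program.

def merit(f, X):
    # phi = sum of the squares of the aberration functions f_i(X)
    return sum(fi * fi for fi in f(X))

def gradient(f, X, eps=1e-6):
    # G_k = sum_i f_i (df_i/dX_k), estimated by finite differences,
    # so that d(phi) = 2 G . ds as in the text above
    f0 = f(X)
    G = []
    for k in range(len(X)):
        Xk = list(X)
        Xk[k] += eps
        fk = f(Xk)
        G.append(sum(fi * (fj - fi) / eps for fi, fj in zip(f0, fk)))
    return G

def descend(f, X, h0=1e-3, iters=50):
    # ds = -h G; h is picked by scanning a few step lengths and keeping
    # the one that lowers the merit function most (a crude stand-in for
    # the optimum-gradient search).
    X = list(X)
    for _ in range(iters):
        G = gradient(f, X)
        best, best_val = X, merit(f, X)
        for h in (h0 * 2.0 ** k for k in range(14)):
            trial = [x - h * g for x, g in zip(X, G)]
            val = merit(f, trial)
            if val < best_val:
                best, best_val = trial, val
        if best is X:
            break
        X = best
    return X

# toy usage: drive two residuals toward zero in two parameters
X = descend(lambda xs: [xs[0] - 1.0, xs[0] + xs[1]], [5.0, 5.0])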
4.6 Procedures of Andre Girard and C. G. Wynne

Girard and Wynne, working independently, Girard in France and Wynne in Great Britain, arrived at similar procedures for applying automatic computers to optical design [69, 70]. Their approach to the problem is more conservative than that of Feder in that their work is based on the premises of classical aberration theory rather than on the assumption of an empirically determined merit function. Nevertheless, their results bear a remarkable similarity to those of Feder. Suppose we have an optical system which is to be improved. Let the system be specified by n design parameters x_j (j = 1, 2, ..., n), and its performance be characterized by m aberrations A_i (i = 1, 2, ..., m). Improving the design consists in altering the n parameters x_j, resulting in a new system which is characterized by aberrations which are smaller (in absolute value) than those for the original design. Starting from the initial design a sequence of new systems is obtained by altering each parameter. The change in the ith aberration due to a change in the jth parameter defines a change matrix (a_ij). Assuming a linear relationship between the parameters and the aberrations, a linear matrix equation obtains,
(a_ij)(Δx_j) + (A_i) = (A_i'),

where Δx_j is the change in the jth parameter and where A_i' is the new ith aberration. Since the goal of any design procedure is to reduce the aberrations to zero, the governing equation is

(a_ij)(Δx_j) + (A_i) = 0.
If m = n, the solution of this equation requires the inversion of (a_ij). However, more often than not, this matrix is so ill conditioned that this approach is impracticable.
However, with n > m, it is possible to apply a least squares technique to the equation. Solving the normal equations for the above equation leads to

(a_ij)ᵗ(a_ij)(Δx_j) + (a_ij)ᵗ(A_i) = 0.
The trouble with the least squares approach is that the solutions are too large to be consistent with the assumptions of linearity. On the other hand, if the Δx_j are simply scaled down, some of the values are forced to values too small for effective operation. So, rather than limit the magnitude of the Δx_j, the size of Σ Δx_j² is restricted instead. This is accomplished by adjoining to the original matrix equation a set of equations stating that each Δx_j, multiplied by some factor P, is zero. It is to this system,

(a_ij)(Δx_j) + (A_i) = 0,
P Δx_j = 0     (j = 1, 2, ..., n),
that a least squares solution is sought. Suppose (Δ₁x_j) constitutes a least squares solution to the system of equations, and suppose in addition that the residuals of the first equations are the aberrations of the revised system. The residuals of the remaining equations of the system are simply P Δ₁x_j. The sum of the squares of the residuals of the system is therefore

₁φ + P² Σ Δ₁x_j²,

where ₁φ = Σ ₁A_i². The quantity Σ Δ₁x_j² is called, for obvious reasons, the step size. For a step size small enough to be within the range of linearity of the system, the sum of squares of the residuals must be smaller than that for the initial system, φ = Σ A_i², for which the Δx_j = 0. Thus

₁φ + P² Σ Δ₁x_j² < φ,
and therefore ₁φ < φ. This process has been designated by Wynne as a successive linear approximation with maximum steps (SLAMS).¹ Nunn and Wynne [71] describe a method for use with SLAMS to prevent the parameters representing separations and edge and center thicknesses from becoming smaller than their minimum allowable values. When this happens, the computer is programmed to use values of all the lens parameters determined in the preceding iteration. The troublesome parameter is held fixed during the next several applications of SLAMS, allowing of course all other parameters in the system to
vary. This procedure evidently perturbs the system sufficiently so that when the constraint is removed and SLAMS is applied to the full set of parameters, the separation or thickness parameter that had been troublesome assumes more reasonable values.

¹ See also [78], [80], and the discussion on p. 251.
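The augmented system above is what is now usually called a damped least squares problem, and a single iteration can be sketched as follows. The use of numpy, the finite-difference estimate of the change matrix, and the function names are choices of this illustration, not a description of the SLAMS programs themselves.

import numpy as np

def slams_step(aberrations, x, P, eps=1e-6):
    # One damped-least-squares step: solve, in the least squares sense,
    #   (a_ij)(dx_j) + (A_i) = 0   together with   P * dx_j = 0,
    # which limits the step size sum(dx_j^2) as described in the text.
    A_vec = np.asarray(aberrations(x), dtype=float)
    m, n = A_vec.size, len(x)
    a = np.empty((m, n))
    for j in range(n):                       # change matrix by finite differences
        xp = np.array(x, dtype=float)
        xp[j] += eps
        a[:, j] = (np.asarray(aberrations(xp), dtype=float) - A_vec) / eps
    lhs = np.vstack([a, P * np.eye(n)])      # augmented system of equations
    rhs = np.concatenate([-A_vec, np.zeros(n)])
    dx = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    return np.array(x, dtype=float) + dx

Repeating the step, re-evaluating the aberrations each time, gives the iteration; a larger damping factor P forces smaller, safer steps at the cost of slower progress.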
4.7 Joseph Meiron

Joseph Meiron and his associates, making use of the WEIZAC computer located at the Weizmann Institute of Science, Rehovoth, Israel, have developed several systems of programs for optical design. The earliest procedure reported is for reducing the residual aberrations of a lens in its final stages of design and makes use of the method of steepest descent [72]. The merit function used is based on the distribution of traced rays on the image plane. The rays from each object point are traced through the system and the differences between the principal ray and the other rays, measured on the image plane, are calculated. The merit function is basically the sum of the squares of these deviations calculated for each object point. Three object points are taken: one at the edge of the field, an axial point, and a third intermediate to these two. Other conditions assuring that physical limitations are not exceeded are included in the merit function as well. Meiron assumes that the design is so near completion that only changes in curvature need be considered and that these changes are small and are made in the form of lens bendings. Since the focal length is invariant under a lens bending, the only physical conditions which he felt were important were limitations on the size of the curvatures and control of the back focal length. Considering the merit function φ to depend upon the design parameters u_1, u_2, ..., u_n, a change in φ, Δφ, due to changes in the u_i, Δu_i, is given by
Δφ = Σ (∂φ/∂u_i) Δu_i,

provided that the Δu_i are sufficiently small. To assure this the Δu_i are restricted by

Σ (p_i Δu_i)² = ε²,

where ε is a preassigned constant and where the p_i are weighting factors which serve to equalize the intervals of variability of the u_i. It is shown that the values of the Δu_i which cause the greatest decrease in φ are given by

Δu_i = -k (∂φ/∂u_i) / p_i².
Here k is a positive constant. The merit function φ can now be thought of as a function of a single variable k along its path of steepest descent. An additional condition, assuring that the back focal length l remain constant, is added. The system is now divided into m components, each component being an element or group of elements to be treated together. The differential coefficients are evaluated as follows. A small change δc_i is added to all curvatures of the ith component, constituting a bending of this component, and the resulting value of φ is computed. This is done for -δc_i as well. From the values of φ so obtained, ∂φ/∂c_i is obtained. A further reduction of φ may be obtained by adjusting the position of the focal plane, in effect finding the plane of best focus. Let the intersection height of a ray on the image plane be H_j and let U_j be the angle between the ray and the axis. Then the merit function obtained by shifting the image plane an amount d is

φ' = Σ [(H_j - H_0) - d(tan U_j - tan U_0)]²,
where the subscript 0 denotes the principal ray. Differentiating this expression and setting it equal to zero yields

d = Σ (H_j - H_0)(tan U_j - tan U_0) / Σ (tan U_j - tan U_0)².
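The focal-plane shift d can be computed directly from traced ray data by evaluating the expression just derived; in the sketch below the lists H and U are assumed to carry the principal ray in position 0.

import math

def best_focus_shift(H, U):
    # H[j], U[j]: intersection height and ray angle on the image plane;
    # index 0 is the principal ray.  Returns the shift d of the image
    # plane that minimizes sum[(H_j - H_0) - d(tan U_j - tan U_0)]^2.
    dh = [h - H[0] for h in H[1:]]
    dt = [math.tan(u) - math.tan(U[0]) for u in U[1:]]
    return sum(a * b for a, b in zip(dh, dt)) / sum(b * b for b in dt)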
In a later paper [73] Meiron describes a least squares method of optical design which is of sufficient generality to be applied to the earlier stages of lens design as well as to reducing residual aberrations. Here the design parameters u_1, u_2, ..., u_n represent thicknesses, separations, refractive indices, and dispersions as well as curvatures. The quantities f_1, f_2, ..., f_m represent either aberrations as usually defined in geometrical optics or the difference between the height of a ray and that of the appropriate principal ray, as described above. Also included among the f_i are the focal length, back focal length, and other applicable physical restrictions. Considering the f_i as functions of the u_j,

δf_i = Σ A_ij δu_j,

where A_ij = ∂f_i/∂u_j. The δu_j are chosen to minimize
Σ_i (Σ_j A_ij δu_j - δf_i)²,
the normal equation of which is
Aᵗ A δu = Aᵗ δf,
where A = (A_ij), Aᵗ is the transpose of A, and δu and δf are appropriate column vectors. The merit function is defined as φ = Σ a_i (δf_i)², where the a_i are a set of weighting factors. Note that the procedure for reducing the merit function does not operate directly on φ but on the functions from which it is formed. The procedure is used in both third-order design, where the f_i are the Seidel aberrations, and during the final stages of design. In practice, the design data are read into the computer, the various required quantities are calculated, and the normal equation is solved, leading to a new system. Let φ* represent a value of the merit function which would correspond to a satisfactory design, and let φ₀ and φ₁ represent the values of the initial and altered merit functions, respectively. If φ₁ < φ* the design is completed and the machine halts. If φ₀ > φ₁ > φ*, the new system replaces the old and the cycle is repeated. If, on the other hand, φ₁ > φ₀ and φ₁ > φ*, then the step size is scaled down before the cycle is repeated. In addition to the procedures described above, Meiron and Volinez [74] described a method for variable-by-variable minimization by means of parabolic interpolation.
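The parabolic-interpolation idea can be sketched in a few lines: the merit function is sampled at three values of one parameter and the abscissa of the fitted parabola's vertex is taken as the next estimate. The sampling step and the fallback when the three samples do not bracket a minimum are choices made for this illustration, not details of Meiron and Volinez's program [74].

def parabolic_min(phi, u, step):
    # Fit a parabola through phi(u - step), phi(u), phi(u + step) and
    # return the abscissa of its vertex; fall back to the best sample
    # if the three points do not bracket a minimum.
    f_minus, f0, f_plus = phi(u - step), phi(u), phi(u + step)
    denom = f_minus - 2.0 * f0 + f_plus
    if denom <= 0.0:
        samples = [(f_minus, u - step), (f0, u), (f_plus, u + step)]
        return min(samples)[1]
    return u + 0.5 * step * (f_minus - f_plus) / denom

# parabolic_min(lambda u: (u - 3.0) ** 2, 0.0, 1.0) returns 3.0 exactly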
4.8 Robert E. Hopkins
Under the direction of Robert E. Hopkins, the staff of the Institute of Optics of the University of Rochester, Rochester, New York, has been engaged in research on the application of computers to lens design since 1953 [75]. Their approach has been conservative; they have concentrated on applying procedures used by successful lens designers rather than developing new techniques. They now have a set of individual programs which can be put together to suit the requirements of most designers. The emphasis has been on the elimination of routine computing tasks. Human decisions are necessary at critical points in this design process. Machine programs for the IBM 650 computer are available at nominal cost, as is instruction in their use. In the lens precalculation program, a tentative thin lens solution together with the required field angle and aperture are read into the computer. The surfaces can be specified by numerical values for curvatures or by the angle or the height of either the axial or the principal paraxial ray after refraction. The separations are defined by a numerical value or by specifying the height of either the axial or the principal paraxial ray on the following surface. The program provides a tentative thick lens design complete with the clear apertures for each surface.
The most important program is the automatic first- and third-order correction (AUTHOR), the purpose of which is to reduce the Seidel aberrations to preassigned target values [76]. The parameters are altered one at a time. With each alteration a new set of aberration coefficients is computed. Then, by making use of finite difference methods, the partial derivative of each aberration coefficient with respect to each design parameter is calculated. New values of the design parameters are obtained by what amounts to Newton's method. This process is iterated until the target values are reached. An earlier version of this program [77, 78] used approximate formulas for the partial derivatives of the aberration coefficients with respect to the curvatures only. By applying lens bending, the aberrations were brought to the desired values. The equations used were
Σ_{j=1}^{m} a_ij Δc_j = A_i     (i = 1, 2, ..., n),
where a_ij represents the partial derivative of the ith aberration with respect to the jth curvature, Δc_j is the change in the jth curvature, and A_i is the difference between the current value of the ith aberration and its target value. If m > n, m - n conditions between the curvatures were adjoined to the system of equations. The augmented system of linear equations was then solved, yielding new values for the curvatures. The process was iterated until the target values were achieved. The general ray tracing program permits the tracing of either a single ray or a fan of rays from an object point. In addition one can compute with this program the shape of the vignetted aperture and interpolation polynomials of seventh degree relating the entrance pupil coordinates with those in the image plane. By evaluating these polynomials, using as arguments the coordinates of a system of points distributed evenly over the entrance pupil, one obtains the coordinates of the points of a spot diagram. A radial energy distribution program enables one to determine the number of points of a spot diagram which lie in annular regions formed by a series of concentric circles centered on the principal ray. This provides an estimate of the energy distribution in the image of a point. These data can be used to estimate resolving power.
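For a spot diagram, a radial energy distribution of this kind reduces to counting points in circles of increasing radius about the principal ray; the short sketch below returns the cumulative fraction of points within each circle. The ring spacing and the function name are assumptions of the example, not features of the Rochester program.

import math

def radial_energy(spots, principal, ring_width, nrings):
    # Fraction of spot-diagram points lying within circles of increasing
    # radius centered on the principal-ray intersection: a geometric
    # stand-in for the encircled energy in the image of a point.
    radii = [math.hypot(x - principal[0], y - principal[1]) for x, y in spots]
    total = len(radii)
    return [sum(r <= (k + 1) * ring_width for r in radii) / total
            for k in range(nrings)]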
5. Conclusion

We have reviewed most of the experiments in the field of applying computers to optical design. They fall into two distinct categories. One attempts to apply computers to the traditional lens design
techniques; the other, bolder approach seeks a fully automatic method. Oddly enough, the differences are principally in the scope of application, reflecting the individual tastes and philosophies of the various personalities involved. The common denominator, common to practitioners of both schools, is the method of least squares, which stems, at least in part, from an important paper by Rosen and Eldert [79], in which the basic ideas were laid down. The equations given by Rosen and Eldert recur with only minor variations in paper after paper. Mention must also be made of the method of damped least squares, a technique which has been applied by at least Baker and Feder. This method was described by Levenberg [80] in 1944. What then is the state of the art? What does a practicing lens designer do today when he designs a lens? According to Robert Hopkins [81], almost all optical design in the United States is done on small computers using traditional methods. A few designers use automatic correction of third- and fifth-order aberrations. In Hopkins' opinion, no fully automatic lens-correcting programs are available for use in routine lens design. We may well ask of what value to the optical designer is the computer. Again according to Hopkins, machines have made good designers into better ones. By reducing the time involved in computing a design and by making available procedures that until now have been much too complicated to perform, the computer has opened up new vistas for the designer. Rather than replacing the human designer the machine has demanded more of him. Although Hopkins is not a believer in the feasibility of fully automatic design procedures, at least in the immediate future, he is confident that the larger, faster computers will play an important role within the next few years. With the computer, the designer will be able to present alternate solutions to any given problem. Without it he is capable of providing only one solution. In addition, machines have the capability of storing information on a large number of existing designs, permitting a more efficient review of existing lens systems. A high-speed computer will allow the designer to complete a system in a much shorter time than is now possible. Moreover, designs of much greater complexity can be considered than are now practicable. In short, Hopkins anticipates great advances in both lens design and in lens quality as the very high-speed, large storage computers are applied to conventional techniques in optical design. In the review of literature on optical design using computers, no reference has been found in which the application of the methods and
techniques of Operations Research has been made. Although this author is not thoroughly familiar with this subject, he can't help feeling that some of the problems treated in Operations Research bear more than a superficial resemblance to those of optical design and that cross-fertilization would be fruitful for at least one field of research. More fundamental to the improvement of the methods of optical design is the need to know more about geometrical optics and the nature of lenses. What is lacking is a precise and detailed picture of the nature of optical image formation. Until such knowledge is gained, lens design must depend almost entirely on an empirical approach.

ACKNOWLEDGMENTS
I acknowledge with gratitude the criticisms and comments of Mr. Donald P. Feder and Mrs. Maxine Rockoff which have been invaluable in preparing this paper.
REFERENCES

1. Smith, T., On tracing rays through an optical system (second paper), Proc. Phys. Soc. (London) 30, 221-233 (1918).
2. Weinstein, W., Literature survey on ray tracing, in Optical Design with Digital Computers. Proceedings of a Symposium, pp. 15-18. Technical Optics Section, Imperial College of Science and Technology, London, 1956.
3. Kingslake, R., A classification of photographic lens types, J. Opt. Soc. Am. 36, 251-255 (1946).
4. Kingslake, R., Lenses for aerial photography, J. Opt. Soc. Am. 32, 129-134 (1942).
5. Cox, A., Optics: The Technique of Definition, 11th ed., pp. 141-223. The Focal Press, London and New York, 1956.
6. Conrady, A. E., Applied Optics and Optical Design, Part 1, pp. 61-68. Dover, New York, 1957.
7. Ibid., pp. 64-65.
8. Ibid., pp. 455-456.
9. Ibid., pp. 68-70.
10. Berek, M., Grundlagen der Praktischen Optik. Synthese Optischer Systeme, p. 128. Gruyter, Berlin and Leipzig, 1930.
11. Herzberger, M., Replacing a thin lens by a thick lens, J. Opt. Soc. Am. 34, 114-115 (1944).
12. Kerber, A., Ein Porträtobjectiv aus drei getrennten Linsen, Z. Instrumentenk. 36, 269-278 (1916).
13. Hamilton, W. R., Theory of systems of rays, Trans. Roy. Irish Acad. 15, 69-174 (1828).
14. Hamilton, W. R., The Mathematical Papers of Sir William Rowan Hamilton (A. W. Conway and J. L. Synge, eds.), Vol. I: Geometrical Optics, pp. 1-88. Cambridge Univ. Press, London, 1931.
15. Synge, J. L., Geometrical Optics. An Introduction to Hamilton's Method, Cambridge Tracts in Mathematics and Mathematical Physics (G. H. Hardy and E. Cunningham, eds.), No. 37. Cambridge Univ. Press, London, 1937.
16. Hamilton, W. R., On a general method in dynamics, ..., Phil. Trans. Roy. Soc. London, Part II, pp. 247-308 (1834).
17. Hamilton, W. R., On the application to dynamics of a general mathematical method previously applied to optics, Brit. Assoc. Rep. 1834, pp. 513-518.
18. Hamilton, W. R., The Mathematical Papers of Sir William Rowan Hamilton (A. W. Conway and J. L. Synge, eds.), Vol. II: Dynamics, pp. 103-161, 212-216. Cambridge Univ. Press, London, 1940.
19. Synge, J. L., Hamilton's method in geometrical optics, J. Opt. Soc. Am. 27, 75-82 (1937).
20. Bruns, H., Das Eikonal, Abh. Sächs. Ges. Wiss. Leipzig, Math.-phys. Kl. 21, 325-436 (1895).
21. Petzval, J., Bericht über die Ergebnisse einiger dioptrischen Untersuchungen. C. A. Hartleben, Pesth, 1843.
22. Seidel, L., Zur Dioptrik. Über die Entwicklung der Glieder dritter Ordnung ..., Astr. Nachr. 43, 289-322 (1856).
23. Seidel, L., Zur Dioptrik, Astr. Nachr. 37, 105-120 (1853).
24. Schwarzschild, K., Astronomische Beobachtungen mit elementaren Hilfsmitteln, in Neue Beiträge zur Frage des mathematischen und physikalischen Unterrichts an den höheren Schulen (C. F. Klein and C. V. E. Riecke, eds.). Teubner, Leipzig and Berlin, 1904.
25. Schwarzschild, K., Untersuchungen zur geometrischen Optik. I., Abh. der Königl. Ges. Wiss. Göttingen, Math.-Phys. Kl., Neue Folge, Band IV, No. 3. Weidmannsche Buchhandlung, Berlin, 1905.
26. Hopkins, H. H., Wave Theory of Aberrations. Oxford Univ. Press (Clarendon), London, 1950.
27. Buchdahl, H. A., Optical Aberration Coefficients. Oxford Univ. Press, London, 1954.
28. Conrady, A. E., op. cit., pp. 37-40.
29. Feder, D. P., Optical calculations with automatic computing machinery, J. Opt. Soc. Am. 41, 630-635 (1951).
30. Allen, W. A., and Stark, R. H., Ray tracing using the IBM card programmed electronic calculator, J. Opt. Soc. Am. 41, 636-640 (1951).
31. Grosch, H. R. J., Ray tracing with punched card equipment (abstract), J. Opt. Soc. Am. 35, 803 (1945).
32. Grosch, H. R. J., Ray tracing with the selective sequence electronic calculator (abstract), J. Opt. Soc. Am. 39, 1059 (1949).
33. Solution of skew ray problem, NBS Tech. News Bull. 34, 125 (1950).
34. Berning, J., and Finkelstein, N., Some applications of the IBM 604 calculator to routine optical calculations (abstract), J. Opt. Soc. Am. 43, 329 (1953).
35. Black, G., Ray tracing on the Manchester University electronic computing machines, Proc. Phys. Soc. (London) B67, 569-574 (1954).
36. Black, G., Ultra high speed skew ray tracing, Nature 176, 27 (1955).
37. Herzberger, M., Automatic ray tracing, J. Opt. Soc. Am. 47, 736-739 (1957).
38. Laikin, M., Automatic ray tracing using the IBM 704, J. Opt. Soc. Am. 48, 666-667 (1958).
39. Ford, P. W., Use of digital computers for the calculation of aberration coefficients, J. Opt. Soc. Am. 49, 876-877 (1959).
40. Weinstein, W., Bibliography on ray tracing equations, in Optical Design with Digital Computers. Proceedings of a Symposium. Technical Optics Section, Imperial College of Science and Technology, London, 1956.
41. Hartmann, J., Objektivuntersuchungen, Z. Instrumentenk. 24, 1-21, 33-47 (1904).
42. Hawkins, D. G., and Linfoot, E. H., An improved type of Schmidt telescope, Mon. Not. Roy. Astron. Soc. 105, 334-344 (1945).
43. Herzberger, M., Light distribution in the optical image, J. Opt. Soc. Am. 37, 485-493 (1947).
44. Stavroudis, O. N., and Feder, D. P., Automatic computation of spot diagrams, J. Opt. Soc. Am. 44, 163-170 (1954).
45. Lenstar: Aids lens design, NBS Tech. News Bull. 43, 191 (1959).
46. Murcott, N., and Gottfried, H. S., Use of spot diagrams to synthesize the image of resolving power test charts, J. Opt. Soc. Am. 45, 582 (1955).
47. Keim, R. E., and Kapany, N. S., Image synthesis and lens response using spot diagrams, J. Opt. Soc. Am. 48, 351-353 (1958).
48. Hopkins, R. E., Oxley, S., and Eyer, J., The problem of evaluating a white light image, J. Opt. Soc. Am. 44, 692-698 (1954).
49. Stavroudis, O. N., and Feder, D. P., op. cit. [44].
50. Lucy, F. A., Image quality criteria derived from skew traces, J. Opt. Soc. Am. 46, 699-706 (1956).
51. Lamberts, R. L., Higgins, G. C., and Wolfe, R. N., Measurement and analysis of the distribution of energy in optical images, J. Opt. Soc. Am. 48, 487-490 (1958).
52. LaBauve, R. J., and Clarke, R. A., Potentialities for image evaluation of geometric ray trace focal plots, J. Opt. Soc. Am. 46, 677-680 (1956).
53. Linfoot, E. H., Convoluted spot diagrams and the quality evaluation of photographic images, Optica Acta 9, 81-100 (1962).
54. Baker, J. G. (Project Director), The Utilization of Automatic Calculating Machinery in the Field of Optical Design (13 volumes). The Perkin-Elmer Corporation, Norwalk, Connecticut, 1951-1954.
55. Ibid., Initial stages of the design of 25 optical systems, Tech. Rept. No. 3, pp. 73-86.
56. Ibid., Procedures in automatic optical design, Tech. Rept. No. 11, pp. 73-201.
57. Ibid., Tech. Rept. No. 6, pp. 8-16.
58. Wachendorf, F., Die Bestimmung eines optimalen Linsensystems, Optik 12, 329-340 (1955).
59. Black, G., Use of electronic digital computers in optical design, Nature 175, 164-165 (1955).
60. Black, G., On the automatic design of optical systems, Proc. Phys. Soc. (London) B68, 729-736 (1955).
61. Black, G., Automatic lens design, in Optical Design with Digital Computers. Proceedings of a Symposium. Technical Optics Section, Imperial College of Science and Technology, London, 1956.
62. Hopkins, H. H., op. cit. [26].
63. Feder, D. P., Automatic lens design with a high speed computer, J. Opt. Soc. Am. 52, 177-183 (1962).
64. Feder, D. P., Automatic lens design methods, J. Opt. Soc. Am. 47, 902-912 (1957).
254
AUTOMATIC OPTICAL DESIGN
65. Curry, H. B., The method of steepest descent for non linear minimization problems, Quart. A p p l . Math. 2, 258-261 (1944). 66. Crockett, J. B., and Chernoff, H., Gradient methods of maximization, Pacific J . Math. 5, 33-50 (1950). 67. Feder, D. P., Calculation of an optical merit function, J . Opt. SOC. Am. 47, 913-925 (1957). 68. Fcder, D. P., Conrady's chromatic condition, J . Res. NBS 52, 43-49 (1954). 69. Girard, A., Calcul automatique en optique geometrie, Rev. Opt. (Theor. I n s t r u ~ ~37, . ) 225-241, 397-424 (1958). 70. Wynne, C. G., Lens designing by electronic digital computer, I, Proc. Phys. S O C . (London) 73, 777-787 (1959). 71. Nunn, M., and Wynne, C. G., Lens designing by electronic digital computer, 11, Proc. Phys. SOC. (London) 74, 316-329 (1959). 72. Meiron, J., and Loebenstein, H. M., Automatic correction of residual aberrations, J. Opt. SOC. Am. 47, 1104-1109 (1957). 73. Meiron, J., Automatic lens dcsign by the least squares method, J . Opt. SOC. Am. 49, 293-298 (1959). 74. Meiron, J., and Volinez, G., Parabolic approximation method for automatic lens design, J. Opt. SOC. Am. 50, 207-211 (1960). 75. Hopkins, R. E., and Spencer, G., Creative thinking and comput.ing machines in optical design, J . Opt. SOC. Am. 52, 172-176 (1962). 76. Hennessy, W. P., and Spencer, G. H., Automatic correction of first, and third-order aberrations, J . Opt. SOC.Am. 50, 494 (1960). 77. Hopkins, R. E., McCarthy, C. A., and Walters, R.. Automatic correction of third-order abcrrations, J . Opt. SOC. Am. 45, 363-365 (1965). 78. McCarthy, C. A., A note on the automatic correction of third-order aberrations, J . Opt. s o c . Am. 45, 1087-1088 (1955). 79. Rosen, S., and Eldert, C . , Least squares method for optical correction, J. Opt. SOC. Am. 44, 250-252 (1964). 80. Levenberg, K., A method for the solution of certain nonlinear problems in least squares. Quart. A p p l. Aifath,. 2, 164-168 (1944). 81. Hopkins, R. E., Re-evaluation of the problem of optical design, J . Opt. S O C . Am. 52, 1218-1222 (1962).
255
This Page Intentionally Left Blank
Computing Problems and Methods in X-Ray Crystallography

CHARLES L. COULTER
National Institutes of Health, Bethesda, Maryland
1. Introduction
   1.1 Historical Background
   1.2 Crystallographic Background
2. General Computational Methods
   2.1 Structure Factor Calculation
   2.2 Fourier Series Calculation
   2.3 Differential Synthesis Refinement
   2.4 Least-Squares Refinement
   2.5 Patterson Methods
   2.6 Phase Problem
3. Available Programs
References
1. Introduction
The development of computers has led to the alteration and extension of many fields of science. The study of molecular structures by analysis of X-ray diffraction patterns is a noteworthy example of this alteration and extension; a large portion of the research in this field over the last ten to fifteen years would not have been possible without the availability of computers. Thus a general discussion of the computational techniques that have been developed and the problems remaining in crystallography should be of interest and possibly of use to noncrystallographers. This article is not intended to be a comprehensive review of crystallography; such a treatment would require detailed discussion of the chemistry and physics of these systems quite apart from computing methods. Rather it will be a reasonable sampling of machine techniques, which can be discussed with a minimum of background in the field, along with an indication of new ideas which are currently being developed.
The study of the chemical structures of crystalline substances through the X-ray diffraction patterns of the crystals has been the principal area of computational research. I shall emphasize this area of X-ray crystallography and exclude the other areas of the discipline, notably the study of metals and crystal physics, within which computers have been used mainly as convenient tools. To solve most crystal structures the crystallographer postulates an arrangement of atoms in an infinite crystal lattice and compares the calculated diffraction pattern which his postulated structure would give to the diffraction pattern he has observed from the crystal of the unknown. When the observed and calculated patterns agree fairly well, and the postulated structure is chemically reasonable, the crystallographer has a trial solution for the structure. The parameters defining this trial structure can now be refined against the observed diffraction data to confirm the solution and to extract any other physical information about this particular crystal structure which is contained in the data. Trial structures are evaluated through use of Fourier series summations, for which computers are usually a necessity, and the refinement is done using Fourier series or least-squares methods. The computing aspects of these steps will be discussed within the general framework of the physical and chemical situations encountered in practice.

1.1 Historical Background
Serious efforts to use digital computers in crystallography began before 1950 [1-4]. Lipson and Cochran [5] give an excellent critical summary of the work on analog machines and punched card methods up to 1963. Pepinsky's ingenious X-RAC and S-FAC machines [6] were indicative of the refined status of the design of analog machines. Optical methods had also been used extensively in solving crystal structures [5]. This follows from the close similarity of light diffraction and X-ray diffraction [7], and optical methods are still important in themselves and in the development and use of Fourier transform methods [8]. Once punched card accounting equipment became widely available, rapid, accurate methods for Fourier series summation using punched cards were developed [4, 5]. This led quite naturally to the use of digital computers as soon as they became available. By 1956 routine procedures for the determination and refinement of crystal structures had been developed and programmed [2, 3, 9]. Work over the last few years has been on the optimization and modification of standard computing techniques and the development of new methods for more difficult problems. The accuracy of the data and problems in
adequately describing the thermal motion of atoms are the present limiting factors. Research on accurate automatic and semiautomatic data collection devices has expanded rapidly in recent years [10]. The accelerated pace is due to the need for greater accuracy, of course, but it is also a result of the interest in larger structures, such as proteins [11]. Large numbers of intensities must be measured in protein diffraction patterns, and the measurements must be done rapidly and reasonably accurately. Automation is necessary for this speed and to save the operator the problem of handling twenty or thirty thousand numbers. Computer-controlled diffractometers are possible, and Cole, Okaya, and Chambers, for example, have designed a diffractometer linked to an IBM 1620 computer [12]. Usually it is more economical to build a small computer into the diffractometer to control the automatic operation, and to make the device compatible with a computer via card, paper tape, or magnetic tape input and output devices. The engineering and computing aspects of diffractometer design, construction, and use are quite sophisticated examples of instrumentation. The diffraction geometry is such that the reflections can be scanned by moving the counter along circular arcs. Multiple-circle crystal mounts and counter holders can be combined to allow one to reach nearly any point on the sphere of reflection, and thus to collect the intensity data.

1.2 Crystallographic Background

The regular geometrical form of a crystal is a consequence of the regular arrangement of the molecules of which it is built up. This regularity of packing allows a crystal to be described in terms of the contents of the unit cell, the basic repeat unit for the three-dimensional crystal. The unit cell, in turn, usually contains atoms related by symmetry, and the knowledge of these symmetry elements further reduces the information required to completely describe the system. Specification of the unit cell, the space group, and the positions of the atoms not related by symmetry generally suffices to define a crystal structure. There are cases where lattice defects or disorder make such a description impossible or insufficient without qualifications, but these represent problems of physics rather than computation, and are the exceptions. Geometrically, a crystal is a three-dimensional diffraction grating for radiation of wavelength near one angstrom. X-rays from copper or molybdenum targets are convenient sources of radiation in this wavelength region and are usually used in crystal diffraction experiments. If a monochromatic X-ray beam is directed at a row of equally spaced
atoms, the electrons of each atom will be sources of scattered waves. These waves will reinforce when the path difference for rays scattered by two adjacent atoms in the row corresponds to an integral number of wavelengths. If α₀ is the angle the incident beam makes with the row, α the angle the diffracted beam makes with the row, and a the atom spacing along the row, the condition for reinforcement will be a(cos α − cos α₀) = hλ, where h is an integer. For reinforcement of the scattered radiation in a three-dimensional grating, three such conditions must be met simultaneously:

a(\cos\alpha - \cos\alpha_0) = h\lambda,   (1.1)
b(\cos\beta - \cos\beta_0) = k\lambda,   (1.2)
c(\cos\gamma - \cos\gamma_0) = l\lambda.   (1.3)

These are the Laue equations. The integers (h, k, l) are called the indices of the diffracted beam, or alternately the indices of a reflection, since they define a plane which reflects the incident beam into the diffracted beam. The data which the crystallographer requires are the intensities of the diffracted beams for most of the (h, k, l) reflections giving significant scattering. These data range from two or three hundred to twenty or thirty thousand intensities for single crystals of varying degrees of complexity. Symmetry elements in the unit cell often cause the intensities of certain orders of h, k, l reflections to be absent, and the space group is deduced from these systematic absences, from the equivalence of the intensities for certain reflections, and from the variation of the average intensity of the beams with increasing order of diffraction. Locating the positions of the atoms in the unit cell is usually much more difficult. Indeed, from a computational point of view, the methods for determining approximate atomic positions in the cell and the refinement of these positions and the atomic shapes comprise most of crystallography.

1.2.1 Structure Factors and Electron Density
It is customary to represent the diffracted ray for each hkl reflection by a complex number F = A + iB = |F|e^{iα} called the structure factor. The absolute value |F|, called the structure amplitude, may be thought of as representing the amplitude of the diffracted wave, the argument (phase angle) α as representing the phase of this wave. The intensities I of the diffracted rays are then proportional to the squares of the structure amplitudes.
The proportionality constants are determined by the geometry of the camera or counter system that is used to record the intensities and by the orders of the reflections, and are thus known [13]. The structure amplitude is defined as the ratio of the amplitude of the radiation scattered in the order hkl by the contents of one unit cell to that scattered by a single electron under the same conditions [5]. The structure factors are determined by

F(hkl) = \int_0^1 \int_0^1 \int_0^1 V\,\rho(xyz) \exp[2\pi i(hx + ky + lz)]\,dx\,dy\,dz.   (1.4)
In this equation, ρ(xyz) is the electron density at the point (x, y, z) in the cell and V is the unit cell volume; ρV dxdydz is thus the amount of scattering matter in the volume element V dxdydz. Equation (1.4) also shows that the structure factors and the electron density are Fourier transforms of one another. The electron density should be zero except near atoms, and the integral (1.4) is actually calculated as a summation over the atomic positions. The pertinent equations are given below:

I(hkl) \propto F(hkl)F^*(hkl) = |F|^2_{hkl},   (1.5)
|F|^2_{hkl} = A^2_{hkl} + B^2_{hkl},   (1.6)
A_{hkl} = \sum_i f_i \cos 2\pi(hx_i + ky_i + lz_i),   (1.7)
B_{hkl} = \sum_i f_i \sin 2\pi(hx_i + ky_i + lz_i),   (1.8)
\alpha(hkl) = \tan^{-1}[B(hkl)/A(hkl)].   (1.9)

Here, F*(hkl) is the complex conjugate of F(hkl), fᵢ is called the scattering factor for the ith atom, and the summations are over the i atoms in the unit cell. Equation (1.6) is the equation of a circle of radius |F| on an (A, B) coordinate diagram; the vector F = A + iB intersects this circle at a point determined by the phase angle α. Estimates of the structure amplitudes for most of the possible hkl values are obtained experimentally, and it is the crystallographer's task to get an estimate of the phase angles α(hkl) for these structure amplitudes. Once the amplitudes and phases of most of the reflections are roughly known, the electron density at any point (x, y, z) in the unit cell can be evaluated from
\rho(xyz) = \frac{1}{V} \sum_{h=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} F(hkl) \exp[-2\pi i(hx + ky + lz)].   (1.10)
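To make Eqs. (1.6)-(1.10) concrete, a short sketch follows. It is written in Python rather than in the FORTRAN of the programs discussed later in this article, and the atom list, scattering factors, index range, and grid size are invented for illustration; real scattering factors depend on sin θ/λ and the cell volume is taken as unity.

import numpy as np

def structure_factor(hkl, atoms):
    """A, B, |F|^2 and phase for one reflection, following Eqs. (1.6)-(1.9).
    atoms: list of (f, x, y, z), with f a constant scattering factor and
    x, y, z fractional coordinates (illustrative assumptions)."""
    h, k, l = hkl
    A = sum(f * np.cos(2 * np.pi * (h * x + k * y + l * z)) for f, x, y, z in atoms)
    B = sum(f * np.sin(2 * np.pi * (h * x + k * y + l * z)) for f, x, y, z in atoms)
    return A, B, A * A + B * B, np.arctan2(B, A)

def electron_density(F, grid=8):
    """Sum the Fourier series of Eq. (1.10) on a grid x grid x grid mesh (V = 1).
    F: dict mapping (h, k, l) -> complex structure factor."""
    xs = np.arange(grid) / grid
    X, Y, Z = np.meshgrid(xs, xs, xs, indexing="ij")
    rho = np.zeros((grid, grid, grid))
    for (h, k, l), Fhkl in F.items():
        rho += np.real(Fhkl * np.exp(-2j * np.pi * (h * X + k * Y + l * Z)))
    return rho

# Two hypothetical atoms in the cell.
atoms = [(6.0, 0.10, 0.20, 0.30), (8.0, 0.40, 0.15, 0.25)]
F = {}
for h in range(-3, 4):
    for k in range(-3, 4):
        for l in range(-3, 4):
            A, B, F2, alpha = structure_factor((h, k, l), atoms)
            F[(h, k, l)] = A + 1j * B
rho = electron_density(F)
print(rho.max())

The density map peaks near the two input atom positions, which is the sense in which the structure factors and the electron density are transforms of one another.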
One other type of Fourier synthesis must be mentioned, since it has had such a profound impact on crystallography. This is the |F|²
synthesis first used by Patterson [14], and now usually called the Patterson synthesis. Patterson defined a function P(uvw) such that

P(uvw) = V \int_0^1 \int_0^1 \int_0^1 \rho(xyz)\,\rho(x+u,\,y+v,\,z+w)\,dx\,dy\,dz.   (1.11)

This equation simplifies to

P(uvw) = \frac{1}{V} \sum_h \sum_k \sum_l |F(hkl)|^2 \cos 2\pi(hu + kv + lw).   (1.12)
The |F|² values are experimentally available, so this series can be summed without phase angle information. The result is a map which corresponds to the weighted vector set of a set of point atoms. A peak in the Patterson function thus implies the presence of two atoms in the structure separated by the distance of the peak from the Patterson origin, and oriented in the same direction. The height of this peak is determined by the products of the atomic numbers of the two atoms involved. In a structure of N atoms, there are thus N² Patterson peaks, N of which occur at the origin since they correspond to the interactions of the atoms with themselves. The remaining N(N − 1) peaks are distributed over the unit cell. For most crystal structures these peaks are not all resolved, and the overlapping of the peaks complicates the derivation of the original set of atoms which gave rise to this particular vector set. The Patterson synthesis contains all the information needed to solve a crystal structure, but it is often well hidden. Buerger [15] has discussed most of the methods now in use for unraveling this information.
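Because Eq. (1.12) needs only the measured |F|² values, a Patterson map can be computed directly from the intensities. The sketch below (Python; the reflection list and map grid are invented for the example and are not taken from any program cited here) simply sums Eq. (1.12) on a grid of (u, v, w) points.

import numpy as np

def patterson_map(F2, grid=16, cell_volume=1.0):
    """Sum Eq. (1.12): P(uvw) = (1/V) sum |F(hkl)|^2 cos 2*pi*(hu + kv + lw).
    F2: dict mapping (h, k, l) -> observed |F|^2."""
    ticks = np.arange(grid) / grid
    u, v, w = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    P = np.zeros((grid, grid, grid))
    for (h, k, l), f2 in F2.items():
        P += f2 * np.cos(2 * np.pi * (h * u + k * v + l * w))
    return P / cell_volume

# Hypothetical squared amplitudes for a handful of reflections.
F2 = {(1, 0, 0): 120.0, (0, 1, 0): 80.0, (1, 1, 0): 40.0, (2, 1, 1): 25.0}
P = patterson_map(F2)
# Peaks away from the origin mark interatomic vectors, weighted by the
# product of the atomic numbers of the two atoms involved.
print(np.unravel_index(P.argmax(), P.shape))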
1.2.2 Solving Structures

The derivation of a satisfactory first approximation to a structure is the most challenging endeavor required of the structural crystallographer. The Patterson function has proved to be a very valuable starting point for deriving trial solutions; indeed only a very few structures have been solved to date without some use being made of the Patterson function. One of the most useful properties of this function is the occurrence of peaks between symmetry-related atoms. The value of these peaks was first pointed out by Harker [16], and such peaks are often called Harker peaks, and the lines and planes upon which they occur Harker lines and Harker planes. Thus if a crystal has a 2₁ symmetry axis lying along b (here and in the following we are using the space group notation as given in the "International Tables for X-ray Crystallography" [27]), every atom at position x, y, z will have a symmetry mate at −x, ½ + y, −z. The Patterson vector between these two atoms will be at 2x, ½, 2z. Therefore every atom in this structure will give rise to a Harker peak in the Patterson function at u = 2xᵢ, v = ½, and w = 2zᵢ, where xᵢ and zᵢ are the x and z coordinates of the ith atom in the cell. The section v = ½ in the three-dimensional Patterson function is thus a Harker plane. Similarly, if the space group has a mirror plane perpendicular to the b axis, every atom at x, y, z will have a mate at x, −y, z, giving a Harker peak in the Patterson synthesis at 0, 2y, 0. Obviously, well-resolved Harker peaks provide immediate information about the positions of atoms in the cell. It also follows that the more atoms there are in a unit cell, the more difficult it will be to find resolved Harker peaks. Frequently the crystallographer finds one or two atoms in his structure using the Harker peaks and then uses this information and the geometry of this particular molecule to unravel the general peaks in the Patterson function, thus deriving position coordinates for the rest of the atoms. An example of this approach for a reasonably complex structure has been provided in the solution of the structure of calcium thymidylate by Trueblood, Horn, and Luzzati [17]. The asymmetric unit of this crystal contains twenty-eight atoms excluding the weakly scattering hydrogens. The coordinates of the heavier calcium and phosphorus atoms could be determined from the Harker peaks and the Ca-P peak. The oxygens which tetrahedrally surround the phosphorus were also located by inspection of the Patterson peaks. The remaining twenty-two carbon, nitrogen, and oxygen atoms were located through the Ca-X and P-X peaks, where X represents one of the C, N, or O atoms. Ca-X and P-X peaks should be higher than X-X peaks, since the calcium and phosphorus atoms are heavier scatterers of X-rays. By shifting the origin of the Patterson function to the two calcium and two phosphorus positions in the cell (the unique position and the symmetry-related position for the space group) and searching for peak coincidences between the original and the shifted Patterson syntheses, with peak heights at or above the expected P-X height, all the covalently bound atoms in the structure could be positioned. About ten possible locations for the six waters of crystallization were also suggested. This procedure for analysis of the Patterson function was suggested by Beevers and Robertson [18]. It is quite a powerful technique in structures where one has one or more locatable heavy atoms and a number of light atoms. There was one significant source of ambiguity in the analysis of the calcium thymidylate vector map. The packing of the molecules in the cell is such that the calcium and phosphorus atoms have very
nearly the same y parameters. This means that the distribution of these two atoms has the approximate symmetry of P2₁/m, whereas the actual space group of the crystal is P2₁. The calculations based upon the calcium and phosphorus positions thus have a false mirror plane perpendicular to b. This leads to an ambiguity in the choice of the sign of y for the other atoms. One sign could be chosen arbitrarily, and the signs of the y coordinates of the other covalently bound atoms could be assigned on the basis of chemical and stereochemical considerations. It was much more difficult to decide upon the proper choice of a sign for y for the water molecules, particularly since there were more plausible positions than there were water molecules. This difficulty led to some interesting problems, and it was not completely solved until the structure was refined. The importance of resolving such difficulties was brought home clearly in this structure, since an incorrect sign on the y coordinate of one water molecule had a significant effect on the bond distances and angles in the covalently bound portion of the structure. This is an instructive example both for the method of solution and for the problem with pseudo-symmetry. The presence of false symmetry elements in the early stages of the analysis is a common situation; solution of the structure frequently depends upon recognizing and resolving these unexpected ambiguities. An alternate approach to solving a structure containing a heavy atom in a general position is to locate the heavy atom using the Harker peaks in the Patterson synthesis and to proceed via Fourier synthesis methods to derive coordinates for the remaining atoms. The Fourier synthesis calculated using observed amplitudes and phases based upon the heavy atom position alone will contain a high peak at the heavy atom position and low peaks at the positions of many or all of the lighter atoms which were not included in the phasing. In addition such a poorly phased Fourier map will contain a number of spurious peaks. If the peaks which correspond to atomic positions can be distinguished from the spurious peaks, the structure can be solved by iterative Fourier synthesis methods. As more and more correct atoms are included, the phasing of the Fourier function improves and the spurious peaks tend to disappear. The classic example of the power of this Fourier procedure is in the solution of the structure of vitamin B₁₂ by Professor Dorothy Hodgkin and co-workers [19]. Vitamin B₁₂ contains one cobalt atom and over one hundred atoms of lesser scattering power in the asymmetric unit. At the time the crystal structure analysis was begun the chemical structure of the vitamin was not known in detail, and the X-ray diffraction analysis proceeded in parallel with the chemical analyses. The results of the two approaches complemented each other, and the
solution of the structure was thus doubly verified. The cobalt atom could be located from the Harker peaks in the Patterson function, and the structure was derived using Fourier methods, with the initial Fourier map being phased on the cobalt alone. The differentiation of the true peaks from the spurious peaks in the early Fourier syntheses was the crucial step in clearing up the map, and allowing complete solution of the structure; this required a detailed understanding of structural chemistry, a good geometrical intuition, a lot of hard work, and a little luck. Fortunately all these conditions were met, and the structure was solved. The structure of vitamin B₁₂ is the most difficult crystal structure ever solved by routine crystallographic methods. These methods, which are so powerful on small structures, are clearly approaching the limit of applicability in structures of one hundred or more atoms. In inorganic crystals the atoms are not always covalently bound into molecules, and the chemical information needed to solve these structures is that on coordination numbers and the structures of complex ions. The loss of the molecular rigidity which is present in nearly all organic structures, at least in a distance sense, suggests additional complications for many inorganic crystal analyses. Indeed, an inorganic structure with a given number of parameters is often more difficult to solve than the comparably sized organic structure. A recent example of the general techniques is the structure of the δ phase of Mo-Ni [20]. This crystal gives a diffraction pattern which appears to be very nearly tetragonal. In the initial attempts to solve the structure the symmetry was assumed to be tetragonal, and the slight deviations from this symmetry were attributed to the unequal occupancy of the tetragonally arrayed sites. In alloys such as this the two metals frequently occupy the same site in different unit cells, thus leading to an occupancy variable for some or all of the positions. This explanation had to be abandoned when it proved impossible to solve the structure under these assumptions. The clue suggesting that the problem was a fundamental one was the failure of the proposed tetragonal structures to explain some of the high peaks in the three-dimensional Patterson function. Clearly small variations of these structures would not explain these peaks either. Once the true orthorhombic symmetry of the crystal was recognized, the structure was solved by considering the Patterson map and some analogous layered metal structures. In retrospect, the pseudo-tetragonality of the data could be explained, but the deviation of the structure from tetragonal symmetry was very marked. Solving crystal structures is still somewhat of an art, since nearly every structure poses different problems. The above examples are fairly
general, however, and should give at least an impression of how the first approximation to a crystal structure is derived.

1.2.3 Refinement Procedures
Once one has estimates of the phase angles for most of the structure factors, either through knowledge of approximate atomic positions or through direct phase determination methods (which yield atomic position information via Fourier synthesis), the parameters defining the atoms must be refined versus the experimental data to fully verify the structure. Crystallographic refinement methods are all gradient techniques, so the initial atomic positions assumed must overlap the true positions for refinement to be possible. Atoms have diameters of about 1 Å, so one should be within, say, 0.5 Å of the true position to expect refinement. In practice, initial atom positions are usually 0.1 Å to 0.2 Å from the true positions. The standard deviations for atomic positions in fully refined organic structures are ordinarily under 0.01 Å. The crystallographer normally has about ten times as many observations as unknowns, and this permits the accurate positioning of the atoms in the unit cell. A great deal of computing effort has been directed towards the extraction of the maximum amount of physically significant information from these diffraction data, since they obviously define parameters in addition to those fixing the positions of the electron density maxima. A number of ways for refining crystal structures have been suggested and used, but in recent years the methods of differential Fourier synthesis and least squares have been the most prominent. Lipson and Cochran [5] discuss these and a number of the other refinement methods. The usual Fourier synthesis (1.10) is computed using observed structure amplitudes |F| and calculated phases. When the initial atomic positions are slightly incorrect, the maxima in the electron density of this "observed" Fourier sum will occur away from the initial atomic positions in the direction of the true atom locations. Thus Fourier syntheses can be used directly to refine the positions of atoms. The clearest way to see this refinement is by means of a difference Fourier synthesis. This is the Fourier map that one obtains by subtracting a calculated Fourier synthesis, with both the amplitudes and the phases of the Fourier coefficients based upon the approximate atomic positions, from an observed Fourier synthesis, with coefficients formed using observed amplitudes and calculated phases. For the computation the coefficients are subtracted from one another so as to require only one three-dimensional Fourier summation. When the
position and temperature parameters for an atom are correct, the difference Fourier will be relatively smooth in the region around the atomic site. Incorrect positioning is indicated when the assumed atomic position lies on a gradient in the Fourier synthesis. Incorrect temperature parameters lead to residual electron density or holes at the assumed atom positions for isotropically vibrating atoms; in the cases of anisotropically vibrating atoms, one expects subsidiary maxima in the direction of the anisotropy. Figures 1 and 2 are from a paper by L. H. Jensen, published in Acta Crystallographica [21], on the refinement
FIG. 1. Difference synthesis showing electronic anisotropy. Differences are from electron densities corresponding to B values of 4.5 for C1-C and N, 5.5 for C, and 6.0 for C and O. Contours at 0.1 e.Å⁻³, zero contour omitted, negative contours broken. (a) Composite map of sections close to the plane of the molecule. (b) Sections essentially perpendicular to the plane of the molecule.
of a long-chain organic compound, and they are excellent illustrations of these points. Figure 1 shows the pertinent portions of a difference Fourier synthesis calculated at an early stage of the analysis, with isotropic temperature factors assigned to each of the atoms. The sections near the plane of the molecule [Fig. 1(a)] indicate small errors in many of the temperature parameters, since not enough density has been subtracted away at most of the atomic sites, but the temperature motion in this plane is quite isotropic. One of the atoms (C-4) is also noticeably away from its correct position. The sections normal to the
FIG. 2. Difference synthesis from which final parameter changes were derived. Contours at 0.05 e.Å⁻³, zero contour omitted, negative contours broken. (a) Composite map of sections in or parallel to the (401) plane and close to the plane of the molecule. (b) Sections parallel to the (201) plane and essentially perpendicular to the plane of the molecule.
plane of the molecule are illustrated in Fig. 1(b), and here a marked anisotropy is evident. This difference Fourier synthesis was used to derive new position coordinates and to assign anisotropic temperature parameters to the atoms, and a new difference synthesis was calculated. After a number of iterations, the map used to construct Fig. 2 was calculated. The contrast between Figs. 1 and 2 is striking, especially since two contours in Fig. 2 correspond to one in Fig. 1. Clearly, the refinement has been effectively completed at the Fig. 2 stage, and the anisotropy evident in Fig. 1 has been properly accounted for. While it is possible to calculate the slopes and curvatures at the atom positions in a Fourier map, it is easier in large problems to do this analytically. Booth [22] suggested an analytical method for Fourier refinement which is called the differential Fourier synthesis technique, and this is the usual Fourier refinement method. It involves the analytical differentiation of the Fourier series expression to allow evaluation of the slopes and curvatures in the electron density at the assumed atomic positions, thus allowing one to predict new positions for the next iteration, in better agreement with the data. Cruickshank [23] has shown this method to be nearly equivalent to the least-squares approach. The least-squares algebra is simpler to derive and apply, so I shall restrict detailed discussion to least-squares methods. Hughes [24] first suggested using least-squares techniques to refine crystal structures. Because of the obvious suitability of least-squares computations for digital computers and the great versatility of the method, least-squares refinement has become the usual technique in crystallography. Considerable effort has gone into the development and optimization of crystallographic least-squares procedures. This method can be formulated in the following way [25]. The function to be minimized is

\sum_h w_h (|F_o|_h - |F_c|_h)^2.   (1.13)

Here h is the (h, k, l) index of the observation, w_h is the weight assigned to the observation, |F_o|_h is the observed amplitude, and |F_c|_h is the calculated amplitude. Since the structure amplitude is nonlinear, it is expanded in a Taylor series in the usual way. The observed amplitudes are set equal to the true amplitudes to give the observational equations

|F_o|_h - |F_c|_h = \sum_{i=1}^{n} \frac{\partial |F_c|_h}{\partial X_i}\,\Delta X_i,   (1.14)
where X_i is one of the n parameters to be varied. These observational equations reduce to the least-squares normal equations of (1.15):

\sum_{j=1}^{n} \Big[ \sum_h w_h \frac{\partial |F_c|_h}{\partial X_i} \frac{\partial |F_c|_h}{\partial X_j} \Big] \Delta X_j = \sum_h w_h \frac{\partial |F_c|_h}{\partial X_i} \big(|F_o|_h - |F_c|_h\big), \qquad i = 1, \ldots, n.   (1.15)
Equation (1.15) corresponds to the familiar Ax = v format. The parameters to be varied are the independent position parameters for each atom and temperature parameters to allow for the vibration of atoms. Since the observed intensities are for a time-averaged structure, the vibration parameters are often called shape or form parameters. The fᵢ in Eqs. (1.7) and (1.8) becomes fᵢ⁰ exp(−Tᵢ), where fᵢ⁰ is the scattering factor for the atom at rest and Tᵢ is as in Eq. (1.16) for atoms vibrating ellipsoidally (as is usually assumed):
T_i = h^2 B_i^{11} + k^2 B_i^{22} + l^2 B_i^{33} + hk B_i^{12} + hl B_i^{13} + kl B_i^{23}.   (1.16)
For spherical atoms, Tᵢ is set equal to Bᵢ sin²θ/λ², where (sin θ/λ) is readily calculated from (h, k, l) and the cell constants following Bragg's law [5], and Bᵢ is now the single temperature parameter. The above provides a skeletal sketch of the data and basic formulas of crystallography which should be sufficient for our purposes. Further information on all these points can be found, e.g., in [5], [7], and [26].
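One cycle of the least-squares machinery of Eqs. (1.13)-(1.15) can be sketched as follows. The sketch is in Python and is not taken from any of the crystallographic programs cited in this article; the model function, the two position parameters, and the observations are invented for illustration, and numerical rather than analytical derivatives are used.

import numpy as np

def normal_equation_shifts(params, calc_amplitudes, F_obs, weights, eps=1e-6):
    """One linear least-squares cycle following Eqs. (1.13)-(1.15).
    params:          current parameter vector (n,)
    calc_amplitudes: function params -> |F_c| for every observation (m,)
    F_obs, weights:  observed amplitudes and weights (m,)"""
    F_calc = calc_amplitudes(params)
    resid = F_obs - F_calc
    # Numerical derivatives d|F_c|/dX_i (real programs use analytical derivatives).
    D = np.empty((len(F_obs), len(params)))
    for i in range(len(params)):
        p = params.copy()
        p[i] += eps
        D[:, i] = (calc_amplitudes(p) - F_calc) / eps
    A = D.T @ (weights[:, None] * D)   # left-hand matrix of Eq. (1.15)
    v = D.T @ (weights * resid)        # right-hand vector of Eq. (1.15)
    return np.linalg.solve(A, v)       # parameter shifts, delta X

def calc_amplitudes(x):
    # A toy two-atom model: |F_c(h)| for h = 1..5 with scattering factors 6 and 8.
    h = np.arange(1, 6)
    return np.abs(6.0 * np.cos(2 * np.pi * h * x[0]) + 8.0 * np.cos(2 * np.pi * h * x[1]))

trial = np.array([0.10, 0.40])
F_obs = calc_amplitudes(np.array([0.12, 0.37]))
weights = np.ones_like(F_obs)
shifts = normal_equation_shifts(trial, calc_amplitudes, F_obs, weights)
print(trial + shifts)   # parameters after one refinement cycle

The same skeleton serves for the approximate-matrix methods discussed in Section 2.4, which differ only in which elements of the left-hand matrix are retained.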
2. General Computational Methods

2.1 Structure Factor Calculation

Structure factor calculations require the evaluation of trigonometric functions of the reflection indices and the atomic position coordinates [Eqs. (1.7) and (1.8)]. If the atoms are assumed to be isotropic (one temperature parameter per atom) one can take advantage of the symmetry relations between equivalent atoms in the unit cell and condense the trigonometric expressions for A and B considerably. This has been done in the tabulation of formulas for each space group given in Volume I of the "International Tables for X-Ray Crystallography" [27]. Here are found condensed expressions for A, B, and the Fourier series equation for one or more possible origins of each of the space groups. Atoms are seldom vibrating spherically, however, and crystallographic intensity data are usually accurate enough and extensive enough to give information on the degree of anisotropy of the atoms and the
directions of this anisotropy. To this end, atoms are assumed to be vibrating ellipsoidally in the later stages of the refinement, and, in the general case, six temperature parameters are assigned to each atom, as discussed in Section 1.2.3. In the initial stages of refinement, atoms usually are taken to be spherical, since the atomic positions are quite rough. The additional temperature parameters have little meaning unless the atomic positions are very nearly correct, and actually tend to slow convergence towards the correct positions if they are used too soon [28]. Considerable care must be exercised in using simplified trigonometric formulas when anisotropic atom shapes are involved, since the temperature parameters must be transformed to those for the symmetry-related atoms along with the position parameters [29, 30]. The safest general computational procedure is to treat all crystals as triclinic (no symmetry or only a center of symmetry) and to derive the temperature factors for the nonunique atoms following Levy's rules [31]. The main portion of a structure factor calculation which is subject to optimization, after the inclusion of a fast sine subroutine, is in these transformations. For example, it is faster to transform the (h, k, l) indices rather than the actual position parameters; since the equations involve hx, ky, and lz, this can be done, and many programs are designed this way [9, 32]. The space group information needed to do these transformations is read as code words to be operated upon. In early programs these were often packed binary words [3], but this was a frequent source of error for the user. Recent trends are towards reading the actual alphameric positions (e.g., X, Y, Z; ½ + X, −Y, −Z; etc.) and interpreting these in the machine or reading the space group number and finding the proper code word in a library built into the program. Calculation of the contribution of one atom to many reflections at once has been done also [2], but this yields a less versatile program. Structure factor programs are ordinarily part of a larger program for Fourier synthesis or least-squares calculations, and are tailored to suit the over-all logic of the program system. The description of a crystal structure in terms of ellipsoidally shaped atoms leads one to consider the effects of anisotropic vibrations on the atomic positions. Angular oscillations of molecules lead to electron density maxima in the time-averaged structure which are closer to the axis of oscillation than they would be for the molecule at rest. The correct interatomic distances are thus not necessarily the same as the distances between the electron density maxima. The coordinates of these maxima are the position parameters which emerge from an X-ray diffraction analysis of a crystal. Accurate location of atomic positions in the time-averaged structure does not imply
comparable accuracy in interatomic distances. For cases where a molecule can be treated as a rigid body, Cruickshank [33, 34] has developed a method for correcting the position parameters for rotational oscillation effects. Discussion of this method requires a closer look at the temperature factors mentioned in Section 1.2.3. This analysis follows Cruickshank's analysis closely [33, 35]. The scattering factor for an atom in thermal motion is the product of the scattering factor for the atom at rest multiplied by the transform of the "smearing" function. For an atom vibrating isotropically, the smearing function t(x) is a Gaussian
t(x) = (2\pi\bar{u}^2)^{-3/2} \exp(-x^2/2\bar{u}^2),   (2.1)
where ū² is the mean-square displacement in any direction. The transform of t(x) is

q(s) = \exp(-2\pi^2 \bar{u}^2 s^2),   (2.2)

where s = 2 sin θ/λ is the reciprocal radius (1/d_{hkl} from Bragg's law [5]). The common form for this transform is

q(s) = \exp[-B(\sin\theta/\lambda)^2],   (2.3)
where B = 8π²ū² is the isotropic temperature factor introduced in Section 1.2.3. Anisotropic vibrations are characterized by a symmetric tensor U, with six independent components. The six B coefficients in the anisotropic temperature factor expression [Eq. (1.16)] define the mean-square displacement of the atom in a given direction. Following Cruickshank [33], we assume that symmetric tensors U^r have been derived for each atom r such that
\sum_{i=1}^{3} \sum_{j=1}^{3} U^r_{ij} l_i l_j   (2.4)

is the mean-square amplitude of vibration of atom r in the direction specified by the unit vector l = (l₁, l₂, l₃). Now, on the assumption that the molecular vibration can be treated as a rigid body vibration, the U^r values can be decomposed into translational vibrations and librational vibrations about the center of mass. If T is the tensor giving the mean-square amplitude of the translational vibrations, the translational contribution to the motion of any atom will be simply

\sum_{i=1}^{3} \sum_{j=1}^{3} T_{ij} l_i l_j.   (2.5)

Similarly for the librations we may assume that
\sum_{i=1}^{3} \sum_{j=1}^{3} \omega_{ij} t_i t_j   (2.6)

is the mean-square amplitude of libration about an axis defined by a unit vector t = (t₁, t₂, t₃) through the center of mass. Cruickshank [33] outlines in detail the method for calculating the T and ω tensors from the U^r tensors for the atoms. The T and ω values with respect to orthogonal crystal axes or inertial axes can be used to obtain a set of Uᵢⱼ^calc; comparison of the Uᵢⱼ^calc and Uᵢⱼ^obs allows the error to be estimated and tests the validity of the rigid body assumption. To the extent that the atoms are vibrating independently, the Uᵢⱼ^calc and Uᵢⱼ^obs will disagree. If the molecule in question has an obvious set of inertial axes, the T and ω tensors are likely to be nearly diagonal along these axes, so the tensors are often transformed to the inertial axes. Alternately, the T and ω matrices can be diagonalized to determine the actual axes of maximum translational and librational motion, and the eigenvectors defining the axes of maximum libration compared with the direction cosines of the molecular axes. The packing of the molecules in the crystal, rather than the geometry of an isolated molecule, often defines the axis of maximum libration in cases where one direction is obviously favored. The reason for this favored libration axis should be explainable on the basis of the structure, and for obviously rigid molecules such as benzene [36] and the cyclophanes [37] with significant librational motion about one particular axis this can be done very satisfactorily. The librational oscillations for rigid molecules are used to correct the position parameters [34] by moving the atoms radially out from the center of mass. In molecules which are obviously not rigid bodies these corrections cannot be applied with any confidence. The only way to insure accurate bond distance information in such systems when atomic vibrations are markedly anisotropic is to collect the diffraction data at low temperature, where the vibration is less [38]. For rigid or nearly rigid molecules, Cruickshank's [9] program for analyzing these motions is the most complete, and has been generally followed by others.
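The tensor manipulations of Eqs. (2.4)-(2.6) are easily mechanized. The short sketch below (Python; the tensor values are invented for the example, and the full Cruickshank procedure for deriving T and ω from the atomic U tensors is not reproduced) evaluates the mean-square amplitude of an atom along a chosen direction and finds the axis of largest libration by diagonalizing ω.

import numpy as np

def mean_square_amplitude(U, direction):
    """Mean-square amplitude along a unit vector: sum_ij U_ij l_i l_j, as in Eq. (2.4)."""
    l = np.asarray(direction, dtype=float)
    l = l / np.linalg.norm(l)
    return l @ U @ l

# Hypothetical anisotropic displacement tensor for one atom (square angstrom units).
U = np.array([[0.040, 0.005, 0.000],
              [0.005, 0.030, 0.002],
              [0.000, 0.002, 0.020]])
print(mean_square_amplitude(U, [1, 0, 0]))

# Hypothetical libration tensor omega; its largest eigenvalue and eigenvector give
# the mean-square libration amplitude and the axis of maximum libration.
omega = np.array([[0.0020, 0.0003, 0.0000],
                  [0.0003, 0.0050, 0.0001],
                  [0.0000, 0.0001, 0.0010]])
eigvals, eigvecs = np.linalg.eigh(omega)
print(eigvals[-1], eigvecs[:, -1])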
2.2 Fourier Series Calculation

Once a set of estimated phases is in hand, either through a structure factor calculation or by other means, a Fourier series is calculated to allow examination of the variation of the electron density throughout the unit cell. Erroneous phases generally give electron density distributions which are clearly wrong, usually because they suggest atomic positions that are chemically implausible. In addition, Fourier series methods can be used to refine structures or to monitor least-squares refinements. Attempts to evaluate the Fourier series without actually summing it have not been successful, and the main efforts have been
directed towards doing this large scale computing operation efficiently. In a normal Fourier series calculation it is necessary to evaluate the contribution of one or two thousand coefficients to each of about fifty-four thousand points at which the electron density is sampled. By taking advantage of the symmetry of the sine and cosine functions, and by grouping terms carefully to allow full use of table look-up procedures, these summations can be done quite rapidly. Before electronic computers became available a number of methods for calculating two-dimensional Fourier syntheses had been developed. The strip methods of Beevers and Lipson and of Patterson and Tunell were the most useful; these are discussed by Buerger [39]. All of these procedures treated the two-dimensional summation as two one-dimensional summations. Beevers and Lipson suggested this method and proposed specific ways of grouping terms to give an efficient calculation. Bennett and Kendrew [1] and others have concluded that their computational layout seems the best general method of calculation. An example of a Fourier program is the easiest way to illustrate the procedures involved. Some time ago the author was asked to write a Fourier synthesis program for the IBM 7090 suitable for use with the data from crystals of the protein myoglobin [40]. The unit cell was divided into 80ths along the b and c directions and 160ths along the a direction to sample the electron density about every 0.4 Å. The space group of myoglobin is P2₁, so half the cell must be examined, the other half being related by symmetry. The Fourier map was brought out in 40 Y sections, each of 160 by 80 points. This section-by-section technique is slower than the calculation of the entire three-dimensional map at once, but it allows one to interrupt the calculation at will and gives a more versatile program. There were 17,000 Fourier coefficients. The Fourier series expression for P2₁ is given in Eq. (2.7):
\rho(XYZ) = \frac{2}{V} \sum_{h=-\infty}^{\infty} \sum_{l=0}^{\infty} \Big[ \cos 2\pi(hX + lZ) \sum_{k=2n} \big( A_{hkl} \cos 2\pi kY + B_{hkl} \sin 2\pi kY \big) + \sin 2\pi(hX + lZ) \sum_{k=2n+1} \big( B_{hkl} \cos 2\pi kY - A_{hkl} \sin 2\pi kY \big) \Big].   (2.7)

The coefficients were sorted on the k index. Tables of sin 2π(hX + lZ) and cos 2π(hX + lZ) for all values of the arguments were calculated and saved on tape for subsequent restarts. For a particular Y section, the k even reflections were used to generate a (2h + 1) by (l + 1) array of Σₖ(A_{hkl} cos 2πkY + B_{hkl} sin 2πkY), and for the k odd reflections a similar array of Σₖ(−A_{hkl} sin 2πkY + B_{hkl} cos 2πkY) was calculated. The a direction required a 2h array to allow for the negative h indices; these were converted to positive integers to allow use of the h and l indices as subscripts in a table look-up. The addition of one to all indices was done to avoid zero subscripts. One octant of the 160 by 80 point section was calculated, and the other seven octants were generated using the sine function symmetry. This required evaluation of the h and l index contributions on an even-even, even-odd, odd-even, odd-odd basis, since the contribution to the various octants depended upon the parity. This was quite easily done using an increment of two in the loop. The FORTRAN program written by Coulter and Watson [41] on this basis calculates a 160 × 80 section with 17,000 Fourier coefficients in 8 min on the IBM 7090. Programs of similar efficiency, suitable for all space groups, have also been programmed, and are distributed by the authors [42]. A good example of an efficient general Fourier program is Sly and Shoemaker's MIFR1 for the IBM 704 [43] and the corresponding 7090 program [42]. Excellent Fourier programs for the IBM 1620 and LGP-30 are also available, to mention two smaller machines. Obviously, fast Fourier syntheses depend upon efficient use of tables, which in turn implies a strong dependence on memory size and speed of access to auxiliary stores. Some work has been done on weighting Fourier synthesis coefficients to give clearer maps [44, 45]. Care must be used here to be sure of the reason for weighting the coefficients. By weighting the coefficients that are very nearly correct more heavily than those that are still in error, a sharper Fourier map can be obtained but this is at the expense of damping the tendency for the atoms to move towards the correct positions. Also, once a set of Fourier coefficients is weighted, the end result is no longer an electron density map. In general, weighting Fouriers has not proved useful enough to have become a standard technique.
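A compact way to see the two-stage factorization described above is to collapse the k index first and then sum over h and l. The sketch below is in Python with dictionaries in place of the packed A and B tables and FORTRAN loops of the original program; the coefficient list, the overall scale factor, and the grid are invented for illustration.

import numpy as np

def p21_section(coeffs, Y, nx=160, nz=80):
    """One Y section of the two-stage summation sketched in Eq. (2.7).
    coeffs: dict (h, k, l) -> (A, B) with k, l >= 0; the overall scale is omitted."""
    # Stage 1: collapse the k index for this Y, separately for even and odd k.
    even, odd = {}, {}
    for (h, k, l), (A, B) in coeffs.items():
        c, s = np.cos(2 * np.pi * k * Y), np.sin(2 * np.pi * k * Y)
        if k % 2 == 0:
            even[(h, l)] = even.get((h, l), 0.0) + A * c + B * s
        else:
            odd[(h, l)] = odd.get((h, l), 0.0) + B * c - A * s
    # Stage 2: sum over h and l on the (X, Z) grid of this section.
    X, Z = np.meshgrid(np.arange(nx) / nx, np.arange(nz) / nz, indexing="ij")
    rho = np.zeros((nx, nz))
    for (h, l), ce in even.items():
        rho += ce * np.cos(2 * np.pi * (h * X + l * Z))
    for (h, l), co in odd.items():
        rho += co * np.sin(2 * np.pi * (h * X + l * Z))
    return rho

# A few hypothetical coefficients (A, B) for reflections with k, l >= 0.
coeffs = {(1, 0, 0): (10.0, 0.0), (-1, 2, 1): (4.0, 2.0), (2, 1, 1): (3.0, -1.0)}
section = p21_section(coeffs, Y=0.125, nx=32, nz=16)
print(section.shape)

The economy of the original program comes from the same structure: the k sums are formed once per section, and only the much smaller (h, l) arrays enter the inner loops over the grid points.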
2.3 Differential Synthesis Refinement

Refinement by means of repeated Fourier syntheses was the first refinement method used, and Booth [22] codified the procedure to make it semiautomatic. His differential Fourier synthesis refinement technique has been the main Fourier refinement method used. It was discussed briefly in Section 1.2.3. Fourier refinement has the advantage of being directly concerned with the electron density, whereas methods such as least squares work with the structure amplitudes directly. In difficult refinements chemical knowledge can be used along with
mathematical knowledge to diagnose the problems. In the early stages of refinement, Eichorn [28] has shown that differential synthesis is just as efficient and somewhat safer than the usual least-squares techniques. The disadvantages of Fourier refinement, compared with least-squares refinement, are the lack of control over the weighting, the corrections needed to compensate for the early termination of the infinite series, and the rather cumbersome algebra of the error analysis. All of these problems except that of the weights have been solved, and differential synthesis is often used to refine crystal structures. Least-squares methods are computationally simpler and more versatile, however, and Sparks and Cruickshank [46] have shown that the two methods do yield the same answers, as suggested by Cochran [47], Sparks [48], and Cruickshank [23] earlier. Thus, least-squares refinement has gradually replaced differential Fourier synthesis refinement.
2.4 Least-Squares Refinement

The algebra of the least-squares method as it is used in crystallography was developed in Section 1.2.3. Hughes [24] was the first to use least squares to refine a crystal structure. He set all the off-diagonal terms in the matrix to zero and refined the structure of an organic compound, melamine, using two-dimensional X-ray diffraction data. Two-dimensional refinements are feasible when the atoms in the projected structure are well resolved, but become difficult when the atoms overlap each other in the projection. The results of two-dimensional refinements are usually less accurate than full data refinements, and two-dimensional work is now done more to confirm or deduce chemical structures in a general way rather than to derive precise structural information. The following discussion applies to general three-dimensional least-squares refinement and may not always hold in special situations or in projection. The development of least-squares methods in crystallography has been closely parallel to machine development. Hughes' two-dimensional diagonal refinement could be done by hand, and early computers were programmed for three-dimensional diagonal and block diagonal least-squares refinements [3]. Full matrix programs, which take account of the cross terms between all parameters, require more storage, and were not extensively used until Busing and Levy programmed and distributed a full matrix least-squares program for the IBM 704 [32, 42]. Extensive successful use of this and other full matrix programs has verified the validity of least-squares procedures in crystallography. In particular, these experiences suggest that the higher-order terms in
the Taylor series [Eq. (1.14)] can be safely neglected in full matrix least-squares refinements. Eichorn [28] suggests that this nonlinearity problem is important when approximate matrices are used in place of the full matrix of the normal equations. Nonlinear least squares is quite involved, however, and rather than include another term in the Taylor series when approximate linear refinements become difficult, crystallographers go to a full matrix linear least-squares program. Hughes' original weighting scheme [24] or slight variants of it are still the most commonly used ones. The correct weight for an observation is the reciprocal of the variance [49]. The variance of an X-ray diffraction intensity is mainly a function of the magnitude of the intensity, so individual weights are usually replaced by functional weights based upon the magnitudes. There is an inconsistency here in that Eqs. (1.13) to (1.15) deal with structure amplitudes rather than their squares, and the squares of the amplitudes, the intensities, are what are observed. The algebra of Section 1.2.3 can be developed equally well in terms of |F|² to permit technically correct weighting [50], but with functional weighting schemes this does not seem to matter. Some recent programs allow either |F| or |F|² refinement. Hodgson and Rollett [51] point out that different weighting schemes can give significantly different results. Fortunately, this seems to be the exception rather than the rule for position parameters. Temperature parameters are more sensitive to the weights used, but even here the variations seldom affect the over-all structural picture [49].

2.4.1 Matrix Approximations
Sparks [25] has published a rigorous mathematical analysis of the validity of the various approximate matrices used in place of the full matrix of the normal equations by crystallographers. Full matrix least-squares procedures are certainly safest for general use. They are very time consuming, however, and are usually inconvenient for problems of over 100 parameters because of machine size limitations and time considerations. Crystallographic experiences have suggested that one can get more convergence per unit time using approximations to the matrix of the normal equations. Sparks [25] discusses the most frequently used of these approximations, their validity, and their usefulness. The price of this more rapid convergence is a less stable mathematical system, which must be monitored carefully to insure convergence. The most common approximate least-squares methods are the block diagonal technique, in which the cross terms among the three position parameters for each atom and those among the six temperature
parameters for each atom are calculated [3] (3 × 3 and 6 × 6 blocks down the diagonal) and the 9 × 9 block diagonal technique, in which all cross terms among the parameters for a given atom are calculated [52]. Both these methods are widely used, and both are considerably better than the diagonal approximation. The block diagonal methods and the diagonal method usually require an acceleration device to speed up convergence and sometimes to prevent divergence. This takes the form of a shift factor less than one, which is used to damp the predicted shifts. Sparks [25] has discussed the theory behind these factors, and Hodgson and Rollett [51] propose a specific acceleration device applicable to a block diagonal method. In practice, the shift factors are still determined empirically by most workers. The main problem that has arisen with use of these approximate methods is an occasional case of very slow convergence. Unless the iterations are continued until the shifts are negligible compared with the standard deviations, one is likely to stop before convergence [25, 51].
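The difference between a full-matrix cycle and the block diagonal approximation can be sketched as follows. The example is in Python; the normal matrix is invented, and the 3 and 6 parameter blocks per atom together with an empirical shift factor follow the description above rather than any particular published program.

import numpy as np

def full_matrix_shifts(A, v):
    """Solve the complete normal equations A * dx = v."""
    return np.linalg.solve(A, v)

def block_diagonal_shifts(A, v, blocks, shift_factor=0.8):
    """Keep only the diagonal blocks of A (e.g. a 3x3 position block and a 6x6
    temperature block per atom), solve each block separately, and damp the shifts."""
    dx = np.zeros_like(v)
    start = 0
    for size in blocks:
        sl = slice(start, start + size)
        dx[sl] = np.linalg.solve(A[sl, sl], v[sl])
        start += size
    return shift_factor * dx

# A hypothetical 9-parameter problem (one atom: 3 positions + 6 temperature factors).
rng = np.random.default_rng(0)
M = rng.normal(size=(30, 9))          # 30 observations, 9 parameters
w = np.ones(30)
A = M.T @ (w[:, None] * M)            # full normal matrix
v = M.T @ (w * rng.normal(size=30))   # right-hand side
print(full_matrix_shifts(A, v))
print(block_diagonal_shifts(A, v, blocks=[3, 6]))

The block solution ignores the cross terms between blocks, which is why the damping factor and careful monitoring of convergence are needed in practice.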
2.4.2 Additional Restraints

As crystallographers gained confidence in the applicability of standard least-squares methods to their systems, they attacked more difficult and larger problems. The striking early success of least squares in crystallography was due mainly to the fact that the systems were about ten times overdetermined. As the degree of overdetermination decreases the method becomes less straightforward, and additional restraints are often convenient and sometimes necessary to insure efficient convergence. Since crystals are usually made up of molecules about which a great deal of independent information is available, some additional restraints are easily developed. Rollett [53] has discussed some of the possible chemical and mechanical restraints. The most obvious condition is to require the intramolecular distances to remain within certain set limits. This constraint is often informally applied to determine empirical shift factors in complicated structures. For example, in the least-squares refinement of myoglobin [41], the mean C-C, C-N, and C-O bond lengths in the peptide bonds were examined as a function of the shift factor, and a factor which kept these distances reasonable was applied. In this case there was actually an optimum region with good average bond distances and standard deviations below those for the distances of the starting parameters. Above this optimum the average distances and the spread in distances both worsened, and for lower shift factors the mean distances remained satisfactory, but again the standard deviations increased. Ordinarily there are not enough
independent distances to warrant this detailed a statistical analysis, but in these smaller systems a rather loose least-squares fitting of the known distances can be done. In at least one case where distance constraints were formally incorporated in a least-squares program, they led to convergence for a structure which had not converged rapidly when refined in the usual way [54].
2.4.3 Large Systems

New methods are clearly necessary for refinement of very large systems such as proteins, or for solution of smaller problems with very little data. In these cases, the most likely course is to redefine the systems in terms of fewer parameters. Thus for proteins the fifteen parameters needed to position the five atoms in a planar peptide group can be replaced by the six parameters needed to orient a peptide plane of known dimensions in space [41]. This simplification can also be applied to small molecules of known dimensions [55], and makes exhaustive search techniques more practical as methods for solving unknown structures [56]. The extension to systems with some axes of free rotation also follows fairly readily. The large number of papers concerned with applications of this technique presented at the Rome meeting of the International Union of Crystallography is indicative of the current interest in these methods. Arnott and Coulter [57] have developed such a method for use in refining the crystal structure of deoxyribonucleic acid (DNA), and this is representative of the general technique. DNA is a biological polymer of cardinal importance in life processes. It is made up of two helical strands entwined together, and running antiparallel to one another. Each strand is made up of a "backbone" of phosphate groups linked to molecules of the sugar deoxyribose, and a purine or pyrimidine base is bound to each of the sugars. The sugar-phosphate backbones of the two strands are on the outside of the molecule, and each purine or pyrimidine base in one strand is hydrogen bonded to a pyrimidine or purine base in the other strand, thus holding the two strands together. The lithium salt of DNA forms highly crystalline fibers under suitable conditions, and the X-ray diffraction patterns of these fibers have been extensively studied by M. H. F. Wilkins' group for some years [58]. X-ray diffraction data to a resolution of 1.6 Å have quite recently been obtained for the crystalline lithium salt of the B form [58] of DNA, and these extensive data introduced the possibility of refining the structure by a least-squares method in parallel with the more usual refinement methods for fibrous structures. Roughly 300 intensities are available,
and these should be sufficient to define about thirty parameters fairly accurately. Lithium (B) DNA crystallizes in space group P2₁2₁2₁, and there are two doubly stranded tenfold helical molecules per cell. The helix axes lie along the c axis of the crystal, coincident with one of the 2₁ axes. The second strand is related to the first by a twofold axis along b, perpendicular to the helix axis. Since the helix axis is along a 2₁ screw axis, there must be an even number of nucleotides in length c, and this number is easily deduced as ten. The angular turn per nucleotide and the translation per nucleotide are thus fixed at 36° and c/10. The simplest description of this system consists of assigning six parameters to each of the three chemically rigid groups which, when linked and helically repeated, make up DNA. The structures of the phosphate, the sugar, and the base pairs can be predicted from work on these groups in smaller molecules [59]. The base pairs in the crystal are disordered, since the sugar-phosphate backbones are the repeat units in the chosen unit cell, and a representative base pair is used. We fix the chemical structure of each group by assigning position parameters u_ij, v_ij, and w_ij to the ith atom of the jth rigid group in a known, arbitrary coordinate system. Following Frazer, Duncan, and Collar [60], three angular parameters θ_j, ψ_j, and ξ_j and three translational parameters a_j, b_j, and c_j are assigned to orient the groups properly in the cell. The U_ij are transformed to Y_ij = (x_ij, y_ij, z_ij) by rotating by θ_j about the w axis and by ψ_j and ξ_j successively about the carried positions of the v and u axes, and translating these new axes by t_j = (a_j, b_j, c_j). Thus

Y_ij = (ABC)_j U_ij + t_j,     (2.8)

where (ABC)_j is the rotation matrix corresponding to the above rotations; Eq. (2.9),

C_j = ( cos θ_j   −sin θ_j   0 )
      ( sin θ_j    cos θ_j   0 ),     (2.9)
      (    0          0      1 )

is the matrix for rotation about w, and the A_j and B_j matrices are similar.
Having fixed the first phosphate, sugar, and representative base pair in their initial positions in the cell, the helical repetition can be expressed as another rotation matrix H_n and translation vector T_n, where

H_n = ( cos (n−1)φ   −sin (n−1)φ   0 )
      ( sin (n−1)φ    cos (n−1)φ   0 ),     (2.10)
      (      0             0       1 )

T_n = (0, 0, (n−1)τ),     n = 1, 2, ..., 10.     (2.11)
Here φ is the turn angle per nucleotide (36°) and τ is the translation per nucleotide (c/10). The Cartesian coordinates for the atoms in one strand of the Li(B) DNA helix can thus be represented by Eq. (2.12):

Y_ijn = H_n (R_j U_ij + t_j) + T_n,     n = 1, 2, ..., 10,     (2.12)

where R_j = (ABC)_j. The second strand is related to the first by a dyad along b, which implies that for every atom at x, y, z there is an atom in the other strand at −x, y, −z; the second molecule in the unit cell is related to the first one by means of a translation which introduces one additional parameter [58]. This parameter can be incorporated in the structure factor expression quite simply. Thus the specification of the positions of the three groups in one strand of the helix suffices to define the entire structure. Having expressed the DNA structure in terms of the eighteen parameters defining the positions of the three rigid groups and one translation parameter, we can now set up a normal least-squares refinement of these parameters. The derivatives are calculated using the method of Eq. (2.13).
This procedure has been programmed for the IBM 7090, and is being used to refine the structure of Li(B) DNA [57].
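A minimal sketch of the coordinate bookkeeping of Eqs. (2.8)-(2.12) is given below: each rigid group is oriented and translated, and one strand is then generated by helical repetition. The axis convention assumed for the A_j, B_j, C_j rotations and the sample data layout are illustrative assumptions, not the conventions of the Arnott-Coulter program.

```python
import numpy as np

def about_w(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def about_v(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def about_u(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rotation_abc(theta, psi, xi):
    """Composite rotation (ABC)_j of Eq. (2.8): theta about w, then psi and xi about
    the carried v and u axes, which for column vectors composes as R_w R_v R_u.
    The axis ordering and sign conventions are assumptions."""
    return about_w(theta) @ about_v(psi) @ about_u(xi)

def strand_coordinates(groups, phi_deg=36.0, tau=1.0, residues=10):
    """Cartesian coordinates of one strand by helical repetition, Eqs. (2.10)-(2.12).
    groups is a list of (U, (theta, psi, xi), t): U is an (atoms x 3) array of local
    coordinates and t = (a, b, c) the translation of the group."""
    phi = np.radians(phi_deg)
    coords = []
    for n in range(1, residues + 1):
        H = about_w((n - 1) * phi)                    # Eq. (2.10)
        T = np.array([0.0, 0.0, (n - 1) * tau])       # Eq. (2.11)
        for U, angles, t in groups:
            placed = (rotation_abc(*angles) @ np.asarray(U).T).T + np.asarray(t)  # Eq. (2.8)
            coords.append(placed @ H.T + T)           # Eq. (2.12)
    return np.vstack(coords)

# Hypothetical use with one three-atom group; tau would be c/10 for Li(B) DNA.
# xyz = strand_coordinates([(np.eye(3), (0.3, -0.1, 0.8), (2.0, 1.5, 0.0))], tau=3.4)
```

In a refinement the derivatives of such coordinates with respect to the group parameters would be chained into the structure-factor derivatives, as indicated by Eq. (2.13).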
2.5 Patterson Methods

Buerger [15] has published a book on the methods in use to solve crystal structures from the Patterson function. Superposition programs are the most widely used. These represent attempts to derive the original point set from the vector set by transposing the Patterson function origin and examining peak coincidences between the old and the new orientations. Superposition [61] and image seeking [15] programs are computationally straightforward, although some care must be used in storing the maps. Rossmann and Blow [62] have developed a very interesting Patterson technique for use when the asymmetric unit of the crystal contains identical subunits, a common situation with proteins. The Patterson function is rotated about its origin until one gets maximum peak coincidence between the rotated and the original maps within a specified volume around the origin. The volume is chosen so as to contain most of the vectors due to the individual subunits and very few of the cross vectors between subunits. The rotation is done on the Patterson coefficients rather than on the summed function, by making use of an interference function which tends to be small except at points of high peak coincidence. The maximum in this
function indicates the axis of rotation and the amount of rotation needed to bring the two subunits into rotational coincidence. Operations in reciprocal space, on the actual structure amplitudes, rather than in Patterson or Fourier space, are computationally more convenient and often give new insight into the methods under study. Not all problems, however, are suitable for such treatment.
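Returning to the superposition programs mentioned at the beginning of this section, a minimal sketch of the idea is to shift a sampled Patterson map by an assumed interatomic vector and keep the pointwise minimum, so that only density common to both copies survives. The grid, the shift, and the choice of the minimum function are illustrative assumptions rather than the organization of any particular published program.

```python
import numpy as np

def minimum_superposition(patterson, shift):
    """Shift a sampled Patterson map by an assumed interatomic vector (in grid steps,
    with the crystal repeat treated as periodic) and keep the pointwise minimum, so
    that only density common to both copies, i.e. candidate atomic sites, survives."""
    shifted = np.roll(patterson, shift, axis=(0, 1, 2))
    return np.minimum(patterson, shifted)

# Illustrative use on a small grid standing in for a sampled Patterson function.
# p = np.random.rand(32, 32, 32)
# sites = minimum_superposition(p, (4, 0, 7))
```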
2.6 Phase Problem²

² See Karle, J., The determination of phase angles, in Advances in Structure Research by Diffraction Methods, Vol. I (R. Brill, ed.). Interscience, New York, 1964.

Efforts to derive directly an approximate set of phases corresponding to a set of structure amplitudes for a given crystal structure began not long after structural crystallography itself. Lipson and Cochran [5] and Buerger [39] summarize the work in this field, and the monograph of Hauptman and Karle [63] and the book by Woolfson [64] provide more detailed analyses of particular approaches to this problem. Equation (2.14), derived by Sayre [65], is representative of the types of calculations required. For a structure composed of identical resolved atoms, Sayre showed that Eq. (2.14) should hold:

F(hkl) = (f/gV) Σ_{h'k'l'} F(h'k'l') F(h−h', k−k', l−l').     (2.14)
Here f is the atomic scattering factor of an atom for this hkl, g is the scattering factor that the atom would have if the electron density were squared, and V is the volume of the unit cell. This equation relates a structure factor to all others or to all in one zone. The sum on the right-hand side or a summation very similar to this one must be evaluated in nearly all phase determination methods. Some programs for direct crystal structure solution are available (e.g., [66]), but no one method has yet been found to be generally applicable. The probability method of Hauptman and Karle has been the most successful direct approach to solving centrosymmetric crystal structures. General use of this procedure has been limited, principally because of the necessity for large-scale computer use and programming at an early stage of the analysis. An alternate somewhat simpler procedure has now been developed which seems very reliable and avoids the earlier problems. A recent rather difficult structure solved with this approach was that of cyclo (hexaglycyl) hemihydrate [67]. Hauptman and Karle are currently trying to extend their methods to noncentrosymmetric structures. Direct methods are certainly a region of computational interest, but extensive discussion of them requires a sound, prior understanding of crystallography and other solution methods.
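To indicate the kind of computation Eq. (2.14) demands, the sketch below accumulates the convolution sum over a small table of structure factors. The data layout is an assumption for illustration, the leading factor f/gV is left to the caller, and production phase-determination programs organize this summation far more efficiently.

```python
def sayre_sum(F, hkl):
    """Convolution sum of Eq. (2.14) for one reflection hkl.  F maps (h, k, l) index
    triples to structure factors; terms whose partner reflection is missing from the
    table are skipped.  Multiplying the result by f/(gV) gives the right-hand side."""
    h, k, l = hkl
    total = 0.0
    for (hp, kp, lp), value in F.items():
        partner = F.get((h - hp, k - kp, l - lp))
        if partner is not None:
            total += value * partner
    return total

# Illustrative use with a made-up centrosymmetric table of signed amplitudes.
# F = {(1, 0, 0): 12.0, (-1, 0, 0): 12.0, (1, 1, 0): -3.5, (1, -1, 0): -3.5}
# print(sayre_sum(F, (2, 0, 0)))
```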
3. Available Programs
Over the years, crystallographers throughout the world have coded and used crystallographic programs in conjunction with nearly every computer which has ever existed. To help avoid needless duplication
of effort, the International Union of Crystallography recently compiled a world list of crystallographic computer programs [42]. This publication lists programs, written for the various commercially available machines, which have been used by the contributing people or laboratories. The layout and description are similar to those of the SHARE system, but the programs are obtained from the authors. Unfortunately, most of the calculations of structure analysis are complex enough to make thorough checking of program accuracy very difficult; thus crystallographers still tend to code the important, routinely used programs themselves rather than rely on distributed programs. This is not a good thing for the science, since the end product is a large number of poorly tested programs rather than a few versatile programs which have been exhaustively tested. There are some notable exceptions to this, a good example being the least-squares program of Busing and Levy for the IBM 704. The main prerequisite for a successful program package seems to be a very complete and well-written description accompanying a well-tested program written by a recognized authority. These conditions hold for all fields, of course, and are rarely met. Some comments should be made here on the timing of crystallographic computer operations. Times obviously depend upon the problem size and the machine size, and to a lesser extent on machine speed. Nearly all crystallographic problems can be conveniently done on a 32K memory machine. With smaller machines the program times increase sharply and the versatility of the program is usually lower. The programs described at the Glasgow Computing Conference in 1960 [43] included time estimates. As a very general estimate of current program times, it may be assumed that a problem with seven to ten atoms in the asymmetric unit of a monoclinic crystal would require ten minutes for a structure factor calculation and a Fourier synthesis, and about twenty minutes for one cycle of least-squares on a 32K machine of the IBM 7090 class. A minimum of three Fourier synthesis calculations and the equivalent of four cycles of full matrix least squares are required for the analysis of most crystal structures. On slower machines, and especially on smaller machines, these times would go up unless less general programs were used. A careful job of determining and refining a crystal
structure of intermediate difficulty would consume about five hours of IBM 7090 time, all told. There has not been enough experience with the newest generation of computers, such as STRETCH and ATLAS, to evaluate their performance on crystallographic problems. As the preceding paragraphs indicate, the more commonly available machines of the 7090 class are adequate for today's problems, both in speed and memory capacity. But it may well be that the availability of more advanced computers, especially when coupled with automatic data taking, will lead to substantial further advances. In conclusion, I should like to thank Dr. S. H. Peiser for his most helpful comments and criticism.

REFERENCES
1. Bennett, J. M., and Kendrew, J. C., The computation of Fourier syntheses with a digital electronic calculating machine. Acta Cryst. 5, 109-116 (1952).
2. Ahmed, F. R., and Cruickshank, D. W. J., Crystallographic calculations on the Mark II computer. Acta Cryst. 6, 765-769 (1953).
3. Sparks, R. A., Prosen, R. J., Kruse, F. H., and Trueblood, K. N., Crystallographic calculations on the high-speed digital computer SWAC. Acta Cryst. 9, 350-358 (1956).
4. Ordway, F., Crystallographic calculations by high speed digital computers, in Computing Methods and the Phase Problem in X-ray Crystal Analysis (R. Pepinsky, ed.), pp. 148-165. Penn. State Univ. Press, University Park, Pennsylvania, 1952.
5. Lipson, H., and Cochran, W., The Determination of Crystal Structures. G. Bell and Sons, London, 1953.
6. Pepinsky, R., X-RAC and S-FAC, electronic analogue computers for X-ray crystal analysis, in Computing Methods and the Phase Problem in X-ray Crystal Analysis (R. Pepinsky, ed.), pp. 167-390. Penn. State Univ. Press, University Park, Pennsylvania, 1952.
7. James, R. W., The Optical Principles of the Diffraction of X-rays. G. Bell and Sons, London, 1958.
8. Lipson, H., and Taylor, C. A., Fourier Transforms and X-ray Diffraction. G. Bell and Sons, London, 1958.
9. Cruickshank, D. W. J., Pilling, D. E., Bujosa, A., Lovell, F. M., and Truter, M. R., Crystallographic calculations on the Ferranti Pegasus and Mark I computers, Paper 6, in Computing Methods and the Phase Problem (R. Pepinsky, J. M. Robertson, and J. C. Speakman, eds.). Pergamon Press, New York, 1961.
10. Abrahams, S. C., Automation in X-ray crystallography. Chem. and Eng. News 41, No. 22, pp. 108-116 (1963).
11. Arndt, U. W., and Phillips, D. C., The linear diffractometer. Acta Cryst. 14, 807-818 (1961).
12. Cole, H., Okaya, Y., and Chambers, F. W., Computer controlled X-ray diffractometer. IBM Research Paper No. RC-890, 1963.
13. International Tables for X-ray Crystallography, Vol. II, Mathematical Tables (J. S. Kasper and K. Lonsdale, eds.). The Kynoch Press, Birmingham, 1959.
14. Patterson, A. L., A direct method for the determination of the components of interatomic distances in crystals. Z. Krist. 90, 517-542 (1935).
15. Buerger, M. J., Vector Space. Wiley, New York, 1959.
16. Harker, D., The application of the three-dimensional Patterson method and the crystal structures of proustite and pyrargyrite. J. Chem. Phys. 4, 381-390 (1936).
17. Trueblood, K. N., Horn, P., and Luzzati, V., The crystal structure of calcium thymidylate. Acta Cryst. 14, 965-982 (1961).
18. Beevers, C. A., and Robertson, J. H., Interpretation of the Patterson synthesis. Acta Cryst. 3, 164 (1950).
19. Hodgkin, D. C., Kamper, J., Lindsey, J., Mackay, M., Pickworth, J., Robertson, J. H., Shoemaker, C. B., White, J. G., Prosen, R. J., and Trueblood, K. N., The structure of vitamin B12. I. An outline of the crystallographic investigation of vitamin B12. Proc. Roy. Soc. (London) A242, 228-263 (1957).
20. Shoemaker, C. B., and Shoemaker, D. P., The crystal structure of the δ phase, Mo-Ni. Acta Cryst. 16, 997-1009 (1963).
21. Jensen, L. H., Refinement of the structure of N,N'-hexamethylene-bispropionamide. Acta Cryst. 15, 433-440 (1962).
22. Booth, A. D., A differential Fourier method for refining atomic parameters in crystal structure analysis. Trans. Faraday Soc. 42, 444-448 (1946).
23. Cruickshank, D. W. J., On the relation between Fourier and least-squares methods of structure determination. Acta Cryst. 5, 511-518 (1952).
24. Hughes, E. W., The crystal structure of melamine. J. Am. Chem. Soc. 63, 1737-1752 (1941).
25. Sparks, R. A., Comparison of various least-squares refinement techniques, Paper 17, in Computing Methods and the Phase Problem (R. Pepinsky, J. M. Robertson, and J. C. Speakman, eds.). Pergamon Press, New York, 1961.
26. Robertson, J. M., Organic Crystals and Molecules. Cornell Univ. Press, Ithaca, New York, 1953.
27. International Tables for X-ray Crystallography, Vol. I, Symmetry Groups (N. F. M. Henry and K. Lonsdale, eds.). The Kynoch Press, Birmingham, 1952.
28. Eichorn, E. L., Refinements of least-squares and differential synthesis algorithms. Acta Cryst. 15, 1215-1219 (1962).
29. Rollett, J. S., and Davies, D. R., The calculation of structure factors for centrosymmetrical monoclinic systems with anisotropic atom vibration. Acta Cryst. 8, 125-128 (1955).
30. Trueblood, K. N., Symmetry transformations of general anisotropic temperature factors. Acta Cryst. 9, 359-361 (1956).
31. Levy, H. A., Symmetry relations among coefficients of the anisotropic temperature factor. Acta Cryst. 9, 679 (1956).
32. Busing, W. R., and Levy, H. A., Least-squares refinement programs for the IBM 704, Paper 13, in Computing Methods and the Phase Problem (R. Pepinsky, J. M. Robertson, and J. C. Speakman, eds.). Pergamon Press, New York, 1961.
33. Cruickshank, D. W. J., The analysis of the anisotropic thermal motion of molecules in crystals. Acta Cryst. 9, 754-756 (1956).
34. Cruickshank, D. W. J., Errors in bond lengths due to rotational oscillations of molecules. Acta Cryst. 9, 757-758 (1956).
35. Cruickshank, D. W. J., The determination of the anisotropic thermal motion of atoms in crystals. Acta Cryst. 9, 747-753 (1956).
36. Cox, E. G., Cruickshank, D. W. J., and Smith, J. A. S., The crystal structure of benzene at −3°. Proc. Roy. Soc. (London) A247, 1-21 (1958).
37. Coulter, C. L., and Trueblood, K. N., The crystal structure of the diolefin of [2.2]paracyclophane. Acta Cryst. 16, 667-676 (1963).
38. Marsh, R. E., Structure refinement, 1963, Abstract G4, VI Intl. Congress, Intl. Union of Crystallography, Rome [Acta Cryst. 16, A2 (1963)].
39. Buerger, M. J., Crystal Structure Analysis. Wiley, New York, 1960.
40. Kendrew, J. C., The three-dimensional structure of a protein molecule. Sci. Am. 205, No. 6, 96-110 (1961).
41. Watson, H. C., Kendrew, J. C., Coulter, C. L., Branden, C. I., Phillips, D. C., and Blake, C., Progress with the 1.4 Å resolution myoglobin structure determination. Abstract 7.20, VI Intl. Congress, Intl. Union of Crystallography, Rome [Acta Cryst. 16, A81 (1963)].
42. Shoemaker, D. P. (ed.), "Intl. Union of Crystallography World List of Computer Programs," first edition, Sept. 1962. [See Acta Cryst. 15, 1190 (1962) for information on availability.]
43. Sly, W. G., and Shoemaker, D. P., MIFR1: a two- and three-dimensional crystallographic Fourier summation program for the IBM 704, Paper 11, in Computing Methods and the Phase Problem (R. Pepinsky, J. M. Robertson, and J. C. Speakman, eds.). Pergamon Press, New York, 1961.
44. Vand, V., and Pepinsky, R., Weighting of Fourier series for improvement of efficiency of convergence in crystal analysis: space group P1. Acta Cryst. 10, 563-567 (1957).
45. Sim, G. A., The distribution of phase angles for structures containing heavy atoms. Acta Cryst. 12, 813-815 (1959).
46. Cruickshank, D. W. J., and Sparks, R. A., Experimental and theoretical determinations of bond lengths in naphthalene, anthracene and other hydrocarbons. Proc. Roy. Soc. (London) A258, 270-285 (1960).
47. Cochran, W., The Fourier method of crystal structure analysis. Nature 161, 765 (1948).
48. Sparks, R. A., Ph.D. thesis, University of California, Los Angeles, California, 1958.
49. Whittaker, E. T., and Robinson, G., The Calculus of Observations, Chapter VIII, 4th ed. Blackie, London, 1944.
50. Shoemaker, D. P., Donohue, J., Schomaker, V., and Corey, R. B., The crystal structure of LS-threonine. J. Am. Chem. Soc. 72, 2328-2349 (1950).
51. Hodgson, L. I., and Rollett, J. S., An acceleration device and standard deviation estimates in least-squares refinements of crystal structures. Acta Cryst. 16, 329-335 (1963).
52. Rossmann, M. G., Jacobson, R. A., Hirshfeld, F. L., and Lipscomb, W. N., An account of some computing experiences. Acta Cryst. 12, 530-535 (1959).
53. Rollett, J. S., Least-squares refinement with chemical and mechanical constraints. Abstract S1.24, VI Intl. Congress, Intl. Union of Crystallography, Rome [Acta Cryst. 16, A175 (1963)].
54. Waser, J., Least-squares with subsidiary conditions. Acta Cryst. 16, 1091-1095 (1963).
55. Nordman, C. E., and Nakatsu, K., Interpretation of the Patterson function of crystals containing a known molecular fragment. J. Am. Chem. Soc. 85, 353-355 (1963).
56. Sparks, R. A., A survey and evaluation of exhaustive search techniques. Abstract S1.21, VI Intl. Congress, Intl. Union of Crystallography, Rome [Acta Cryst. 16, A174 (1963)].
57. Arnott, S., and Coulter, C. L., Rigid body least-squares refinement of nucleic acids. Abstract S1.27, VI Intl. Congress, Intl. Union of Crystallography, Rome [Acta Cryst. 16, A175 (1963)].
58. Langridge, R., Marvin, D. A., Seeds, W. E., Wilson, H. R., Hooper, C. W., Wilkins, M. H. F., and Hamilton, L. D., The molecular configuration of deoxyribonucleic acid. J. Mol. Biol. 2, 38-64 (1960).
59. Spencer, M., The stereochemistry of deoxyribonucleic acid. Acta Cryst. 12, 59-80 (1959).
60. Frazer, R. A., Duncan, W. J., and Collar, A. R., Elementary Matrices, pp. 246-256. Cambridge Univ. Press, London and New York, 1938.
61. Shoemaker, D. P., Barieau, R. E., Donohue, J., and Lu, C., The crystal structure of DL-serine. Acta Cryst. 6, 241-256 (1953).
62. Rossmann, M. G., and Blow, D. M., The detection of sub-units within the crystallographic asymmetric unit. Acta Cryst. 15, 24-31 (1962).
63. Hauptman, H., and Karle, J., Solution of the Phase Problem. I. The Centrosymmetric Crystal. American Crystallographic Assoc. Monograph No. 3, The Letter Shop, Wilmington, Delaware, 1953.
64. Woolfson, M. M., Direct Methods in Crystallography. Oxford Univ. Press, London and New York, 1961.
65. Sayre, D., The squaring method: a new method for phase determination. Acta Cryst. 5, 60-65 (1952).
66. Woolfson, M. M., and Sparks, R. A., Computer Determination of Crystal Structures. IBM Math. and Applns. Department, New York, 1961.
67. Karle, I. L., and Karle, J., An application of a new phase determination procedure to the structure of cyclo (hexaglycyl) hemihydrate. Acta Cryst. 16, 969-975 (1963).
Digital Computers in Nuclear Reactor Design

ELIZABETH CUTHILL
David Taylor Model Basin
Washington, D.C.
1. Introduction
2. Development and Classification of Nuclear Reactor Codes
3. Neutron Transport Equations
4. Solution of the Neutron Transport Problem
   4.1 Diffusion Theory
   4.2 Transport Theory
   4.3 Monte Carlo
5. Other Calculations
References
1. Introduction
The application of high-speed digital computers to nuclear reactor calculations dates from the beginning of the strikingly parallel development of both high-speed digital computers and nuclear chain reactors [1, 2]. Certain individuals, von Neumann, for example, played major roles in both developments. Digital computers have become a basic tool in nuclear reactor design. Their use has greatly accelerated progress in this field. Calculations performed range from those arising in fundamental studies of nuclear physics to applied studies of structural stresses, from calculations required to establish a preliminary design to calculations required for construction details, from reactor stability studies involving times of the order of fractions of a second to lifetime behavior studies involving times of the order of thousands of hours. To quote from a recent review by A. Henry [3], "A new power reactor can be built without using a computing machine (several were), but it will be either a very dangerous device or one very much over designed." In a paper presented at the Second Geneva Conference on the Peaceful Uses of Atomic Energy, E. Gelbard stated that in the design of water-moderated reactors, for example, the use of digital computers allowed the nuclear reactor designer to "examine a great many different
designs quickly, optimize parameters accurately, and appraise advanced design concepts with some confidence" [4]. Some idea of the broad scope of computer usage in the nuclear reactor field can be obtained from a perusal of the program and published proceedings of the Seminar on Codes for Reactor Computations sponsored by the International Atomic Energy Agency [5], a five-day conference at which 20 nations were represented. A glance at the transactions of any of the recent meetings of the American Nuclear Society will also show that a significant percentage of the papers presented consist of the descriptions of high-speed digital computer programs, the results of digital computations, and research on computational methods. A symposium on the Role of Computers in Nuclear Power Plants was held in February 1962 at which the use of digital computing systems for monitoring and evaluating operating data and in controlling nuclear power plants was discussed [6]. Nearly every paper presented at an American Nuclear Society meeting on nuclear performance of power reactor cores held in September 1963 contains some mention of results of calculations performed on high-speed digital computing systems [7]. An entire session at that meeting was devoted to a comparison of observed performance with calculated performance. In general, in view of the many approximations which must be made in such calculations, this agreement is surprisingly good, within experimental error in many cases. In order to discuss the use of computers in nuclear reactor design, a broad range of fields must be covered including the basic physical processes occurring in a reactor, the development of mathematical relationships between basic physical parameters associated with these processes and the behavior of actual reacting systems, the development of useful methods for the numerical determination of the behavior of such systems, and the preparation of computer programs making use of these methods. In this chapter a brief survey of some aspects of these problems will be covered. By 1952 the commercial availability of UNIVAC I had given considerable impetus to nuclear codes development and interchange of such codes among installations. This development and interchange was considerably stimulated by the widespread use by 1960 of the IBM 704 for nuclear reactor calculations. At the present time one of the outstanding features of this codes development work is the nearly universal use of FORTRAN as the basic programming language for nuclear reactor codes. Many different types of nuclear reactors have been designed to satisfy a variety of needs: power production, research, isotope
production, for example. Well over 500 digital computer codes have been written to assist in various aspects of this design work. Section 2 contains a survey of types of problems solved by nuclear codes, of nuclear codes catalogues, and of developments leading to the establishment of a nuclear codes center at Argonne National Laboratory. Sections 3 and 4 are concerned with some aspects of the basic problem of determining the neutron density distribution in a nuclear reactor. It is the neutron population of a nuclear chain reactor which supports the fission reaction. In turn it is the fission reaction which is the primary source of energy production in such a reactor. Therefore, in order to determine the properties and behavior of a nuclear reactor, detailed information concerning the neutron population is required. The neutron population can be described in terms of a neutron density function which satisfies the basic neutron transport equation. It is this basic equation and its relationship to various widely used approximations which will be considered in Section 3. Section 4 will be concerned with methods and computer programs which have been developed to obtain approximate solutions to the basic density distribution problem. Of the many computer codes which have been written and which are in use for reactor calculations, those involving solution of the neutron transport problem have in the past accounted for more usage of computer time than any others. This has been a stimulus to the development of improved methods and codes for the solution of this problem, and has stimulated many developments in numerical analysis [8-10]. Section 5 contains a survey of some of the programs which have been written for the solution of problems associated with other aspects of reactor design such as heat flow, stress analysis, fluid flow, economics, etc.

2. Development and Classification of Nuclear Reactor Codes
In order to orient the subsequent discussion on the cataloging of nuclear reactor codes, a brief statement of the physical background is given here [11-13]. The functioning of a nuclear reactor is based on the fission process. The fission reaction occurs when particular atomic nuclei (e.g., U²³³, U²³⁵, or Pu²³⁹), on capturing a neutron, split into smaller nuclei while releasing both energy and additional neutrons. Thus, neutrons induce the basic energy-liberating reaction and are themselves produced by it, making self-sustaining chain reactions possible. In addition to the fission reaction, other neutron interactions occur
with the nuclei present in a reactor. These are, in general, nonproductive, in the sense that they do not contribute a significant amount of energy, nor do they result in the release of additional neutrons. A reaction of this type is the radiative capture of neutrons resulting in the emission of γ-rays. Other important interactions are elastic and inelastic scattering of neutrons by the various nuclei of the system. Since the rate at which heat is produced in a nuclear reactor depends on the number of fission reactions occurring, and since the occurrence of fission reactions depends on the distribution of neutrons in the system, it is a goal of reactor analysis to trace the life history of the neutrons of a reacting system and of the neutrons they produce through successive generations. The ratio of the expected number of neutrons in successive generations is called the multiplication factor. Since the mean lifetime of neutrons is very short (less than 0.1 msec usually), the multiplication factor must be very nearly unity for a reactor to be a stable source of power. When the multiplication factor is unity, the reactor is said to be critical. A small fraction of the fission fragments remain in an excited state for times ranging from a fraction of a second to 1 min before releasing neutrons. The existence of these delayed neutrons helps in stabilizing reactors but complicates the problem of calculating the time-dependent behavior of the neutron population. There are many configurations of materials which can be used to produce chain-reacting systems. The determination of such configurations and of the neutron density distribution in such systems depends on the nuclear properties of the particular materials used and on their distribution. Primary components of a reactor include its core, reflector, and shield. The core contains the fuel (fissionable material), coolant (any of a variety of fluids), moderator (any light element that has a low probability of absorbing neutrons), control elements, structural materials, etc. The reflector surrounds the core. Its purpose is to scatter neutrons back into the core. The entire configuration is surrounded by a shield to absorb hazardous radiation. The neutron transport equation is used as the basis of the description of the distribution of neutrons in a nuclear reactor as a function of position, momentum, and time. This relation equates the rate at which neutrons are introduced into a neighborhood of any location in the reactor to the rate at which neutrons are lost from that neighborhood. The processes considered include the motion of neutrons as well as the fission reaction, the scattering reaction, and the various nonproductive reactions. The course of these processes depends on properties of the materials which make up the nuclear reactor and on their geometric distribution.
To solve directly the transport equation in seven independent variables for nearly any reactor configuration of physical interest is not a practical undertaking with present day computers and methods. In order to treat problems involving the various types of reactor configurations, a variety of approximations are made to the transport equation and the parameters which enter into it to take account of the physical processes which are occurring. The diffusion approximation to the transport equation is the one most widely used. Much of nuclear reactor analysis is concerned with developing and applying methods which yield results adequate for various engineering needs. A few of these will be discussed in Section 3. Usually it is assumed that changes in the materials with time (fuel depletion, fission product buildup, etc.) are sufficiently slow so that fuel depletion calculations (also called burnup calculations) can be performed in cycles. The solution of the neutron transport problem is made assuming that the materials are fixed, and then the change of materials is calculated on the basis of the neutron population given by the transport solution. Nuclear reactors are classified in a variety of ways. They are classified according to their purpose: research, power production, isotope production, etc. They are also called heterogeneous or homogeneous according to the geometric arrangement of fuel and moderator in the core. They are classified as fast, intermediate, or thermal according to the average kinetic energy of the neutrons primarily responsible for the fission reaction. In a fast reactor, for example, the bulk of the neutrons causing the fission reaction have energies in excess of 0.1 Mev (million electron volts), while in a thermal reactor, the bulk of the neutrons responsible for the fission reaction are approximately in thermal equilibrium with their surroundings, i.e., they have energies of the order of 0.025 to 0.1 ev, depending on the temperature of the reacting system. Neutrons produced by the fission reaction have high energies, usually of the order of 1 to 2 Mev. Therefore, in a thermal reactor, and also to some extent in an intermediate reactor, the presence of a moderator is needed to degrade the neutrons' energy without removing too many of them from the system. Materials consisting of elements of low mass number such as water and carbon are used for this purpose. They are effective in reducing the neutrons' energy through scattering collisions. Cost and safety considerations are paramount in the design of a nuclear reactor. Explosion hazards, radiation hazards, and accidents must be considered. In the over-all design of a reactor, problems must be solved in many areas including nuclear physics, heat transfer, radiation
shielding, mechanical and nuclear stability, chemistry, metallurgy, mechanical engineering, etc. Computer programs have been written to attack problems in most of these areas. The classification and cataloging of digital computer codes which have been developed for nuclear calculations is a considerable undertaking. Sets of programs have already been developed for several generations of digital computing machines. Many digital computer programs in current use are direct descendants of early programs. Many of these were poorly documented and are no longer in use. The early programs were written in machine code for nearly unique machines. Today FORTRAN is becoming a nearly universal programming language for nuclear reactor codes, at least in the United States. Improvements in this language, the relative ease of programming in this language, and the availability of FORTRAN compilers for nearly every commercially available high-speed digital computer have all contributed to this trend. The first bibliography of digital computer programs for nuclear reactor calculations was published in 1955 by A. Radkowsky and R. Brodsky [14]. This bibliography contains summaries of the codes then in use or in preparation. Many of these codes are still of interest, since many codes in current use are their direct descendants. Computers for which programs are listed in this bibliography include CPC, UNIVAC I, 650, 701, 702, SEAC, AVIDAC (GEORGE), NAREC, and NORC. The categories used as the basis of the classification of nuclear codes were Reactor and Nuclear Physics, Reactor Survey, Shielding, Reactor Kinetics, Reactor Burnup, Reactor Engineering, and Miscellaneous. Since later classification systems were based on this one, we shall describe each of these categories in more detail. It can be noted that problems associated with the design of the nuclear system were subdivided into a number of categories, while those associated with the design of other systems (heat removal, structural, mechanical, etc.) are all lumped into a single category, Reactor Engineering. This is a reflection of the past emphasis in the use of computers on problems associated with nuclear design. There has been increasing work in recent years on the development of computer programs to treat the engineering aspects of reactor design. To return to this first bibliography of reactor codes, the category of Reactor and Nuclear Physics includes codes required for calculating the effects of fine structure in reactor lattices and for the determination of parameters for reactor survey calculations. For example, programs are described for the Monte Carlo calculation of the resonance escape probability (i.e., the probability that a neutron will be reduced in energy from fission to thermal without being lost to the system) [15],
and for the calculation of a neutron slowing down distribution from which parameters for reactor survey calculations could be obtained [16]. Descendants of these programs are described in [17, 19], for example. Reactor Survey codes are based on the solution of the multigroup neutron diffusion equations in one, two, and three spatial dimensions. More than one-third of the programs are in this category. In general, a criticality factor and the neutron flux distribution for a given reactor configuration are calculated. Twenty programs are listed for solving the one-dimensional problem, three for the two-dimensional problem, and one for the three-dimensional problem. Typical one-dimensional programs are described in [20]. The programs for solving the two-dimensional problem are described in [21-23]. These were UNIVAC I programs for solving five-point difference approximations to the diffusion problem over several thousand network points. They were ambitious programs for UNIVAC I. More will be said about these codes and their successors in Section 4. The three-dimensional program was an experimental one written for SEAC [24]. The codes designed primarily for shielding calculations are for the calculation of the neutron and γ-ray distribution in and around nuclear reactor shields. In some cases these calculations were based on the numerical solution of approximations to the neutron transport equation [25], in others on Monte Carlo methods. Under the Reactor Kinetics category, programs are included which were developed to determine the short-term kinetic response of nuclear reactors to specified changes such as would occur during the operation of a reactor. The Reactor Burnup category includes programs for the calculation of the long-term fuel depletion and fission product buildup. In 1956, shortly after the appearance of the nuclear codes bibliography mentioned above [14], the Nuclear Codes Group was organized to facilitate the interchange, on a nationwide basis, of information on codes for nuclear reactor calculations. The Nuclear Codes Group had members from more than forty installations. Its semi-annual meetings were held concurrently with meetings of the American Nuclear Society. The group published a quarterly newsletter containing information on the status of nuclear codes in the various installations and summaries of nuclear codes. This newsletter was compiled, edited, and distributed by the AEC Computing and Applied Mathematics Center at New York University [26]. In 1959, Nather and Sangren, with assistance from members of the Nuclear Codes Group, collected and published a set of abstracts of nuclear codes then in preparation or in use [27]. Additional code
abstracts were published in 1960 [28]. Nearly 300 code abstracts were included. Computers now included the 704, 705, 709, 1103, NORC, DATATRON, LARC, MERCURY, and BESK. The categories for classification are those of Radkowsky and Brodsky [14], except that Reactor Survey and Shielding are omitted as categories and Group Diffusion, Monte Carlo, and Transport are introduced. The Transport category comprises codes which can be used to obtain approximate solutions to the neutron transport problem which account more accurately for the angular dependence of the neutron flux than diffusion theory. Reactor Survey and Shielding programs are now classified according to the basis of the calculation of neutron distribution, and can end up under any of the new categories. In 1961, a summary listing of nearly 400 of the most useful codes was published [29] indicating the originating company, the computing machine for which the code was written, and published report numbers where these were available. An addendum to this list was published in 1962 [30], bringing the total to about 500 codes. A cumulative summary is given in Table I of the number of codes in the various categories included in these lists.
TABLE I. NUMBER OF NUCLEAR REACTOR CODES LISTED IN VARIOUS CATEGORIES

                     1959 [27]   1960 [28]   1961 [29]   1962 [30]
Burnup                   17          22          32          37
Engineering              28          39          48          72
Group Diffusion          73          85          92         105
Kinetics                 23          25          28          34
Miscellaneous             5          16          22          22
Monte Carlo              20          26          30          33
Physics                  38          73          85         121
Transport                32          40          50          68
In 1959 the Nuclear Codes Group petitioned the American Nuclear Society for the organization of a Division of Reactor Mathematics and Computations. This division of the American Nuclear Society has over 900 members today. A divisional codes center was set up at the Argonne National Laboratory. This codes center is maintained by that laboratory with the cooperation of the computer manufacturers. The Argonne Code Center maintains files of code abstracts and serves as an information exchange center on reactor codes. To date the center has distributed six sets of code abstracts describing a total of 130 codes in current use
[31]. These abstracts are also published in Nuclear Science and Engineering, the monthly journal of the American Nuclear Society. The computers represented include the IBM 704, 709, 7094, the Philco S-2000, and the CDC 1604. Other current sources of information on nuclear codes development are Nuclear Science Abstracts published by the Division of Technical Information of the U.S. Atomic Energy Commission and List of References on Nuclear Energy published by the International Atomic Energy Agency. The Radiation Shielding Information Center (RSIC) has recently been established at Oak Ridge National Laboratory [31a] to "collect, organize, evaluate, and disseminate information on radiation shielding." A special section of this center is devoted to the collection, evaluation, and comparison of digital computer codes which are useful for shielding calculations.

3. Neutron Transport Equations
The calculation of the neutron distribution in a nuclear reactor is ultimately based on transport theory, which is essentially statistical in nature. Transport theory is an outgrowth of Boltzmann's basic work on the kinetic theory of gases [32, 33]. Neutron-neutron collisions can be neglected for systems considered in nuclear reactor design since there is such a small number of neutrons compared to the number of nuclei present at any time in any region considered. On the other hand, the number of neutrons is assumed to be large enough so that only their average behavior need be taken into account, and their distribution is adequately described by a density function which is continuous for neutron beams and has a continuous time derivative. Additional physical assumptions made in neutron transport theory are described in [32, 34, 35]. The neutron distribution can be described in terms of the neutron density function N(r, E, Ω, t), the probable number of neutrons at position r with energy E traveling in direction Ω at time t per unit volume, energy, and solid angle about Ω. If v is the neutron speed corresponding to energy E, then the neutron flux function Φ(r, E, Ω, t) can be defined as
Φ(r, E, Ω, t) = v N(r, E, Ω, t).     (3.1)

In defining neutron flux, other independent variables are often used, for example the velocity v in place of E and Ω. In place of E, lethargy defined as u = ln E₀/E may be used. Here E₀ is usually chosen to be
at least as great as the maximum energy considered in the system. In any case, there are seven independent variables, three to specify position, one for time, and either three for velocity, or one for energy (or lethargy) and two for direction. An analytic formulation of the conservation law of neutron transport theory is given by the transport equation
DΦ + σΦ = S,     (3.2)
where DΦ represents the total derivative of Φ in the direction of Ω, σ represents the collision probability per unit distance in the direction Ω, so that the term σΦ accounts for the loss of neutrons of energy E and direction Ω as a result of collisions. (A collision changes the direction of motion or the energy of a neutron.) S represents the source term and accounts for the gain in neutrons of energy E and direction Ω introduced by external sources, and also those emerging from interactions with nuclei. These neutrons include scattered ones as well as neutrons liberated as a result of the fission process. The total derivative in the direction Ω can be written in the form

DΦ = (1/v) ∂Φ/∂t + Ω · ∇Φ,     (3.3)

where v represents the neutron speed determined by energy E, and ∇ represents the gradient operator. In rectangular coordinates, (3.3) becomes

DΦ = (1/v) ∂Φ/∂t + Ω_x ∂Φ/∂x + Ω_y ∂Φ/∂y + Ω_z ∂Φ/∂z,

where Ω_x, Ω_y, Ω_z are the components of the unit vector Ω with respect to the coordinate axes. They are direction cosines with respect to the x-, y-, z-axes, respectively. Table 2.1 of [36] contains representations of DΦ in the commonly used coordinate systems. Various forms of the neutron transport equation are also discussed in [34-37]. In general, the source term S of (3.2) will have the form
S(r, E, Ω, t) = ∫ dt' ∫ dE' ∫ dΩ' β(r, E' → E, Ω' → Ω, t', t) Φ(r, E', Ω', t − t') + Q(r, E, Ω, t),

where the integrations are over all energies, all directions, and all delay times t'. Here, β represents a "transference" function giving, at time t, the probable number of neutrons per unit length which emerge with
energy E per unit energy and direction Ω per unit solid angle about Ω as a result of interactions of a neutron of energy E' and direction of motion Ω' at time t − t' with nuclei in a neighborhood of r. Physical processes which contribute to the transference function include scattering and fission. Scattering collisions will in general change a neutron's direction and energy. A neutron which initiates a fission reaction will cause the release of a number of neutrons with varying energies and directions. Not all such neutrons are ejected promptly. The delay time t' is introduced to take account of this. Q represents any additional source term. The boundary condition usually imposed at the outer boundary is zero incoming neutron flux, i.e.,
Φ(r, E, Ω, t) = 0   for Ω · n < 0,     (3.4)

where n is a unit vector outward normal to the outer surface. To take advantage of symmetries in a reactor core, other conditions are often imposed on particular surfaces; these conditions include reflective boundary conditions

Φ(r, E, Ω, t) = Φ(r, E, −Ω, t)     (3.5)

and periodic conditions of the type

Φ(r, E, Ω, t) = Φ(Tr, E, Ω, t),     (3.6)
where Tr represents either a translation of r or a rotation of r through a fixed angle. In reactors of physical interest there are interfaces between regions containing different materials. The condition usually imposed at such interfaces is continuity of flux along each direction Ω. The basic problem of neutron transport theory is the determination of the neutron flux Φ(r, E, Ω, t) which satisfies (3.2) within each material region and appropriate interface conditions between regions of differing materials, given specified boundary conditions, initial conditions, neutron sources, and functions σ and β for the materials present. If the fission source due to delayed neutrons is to be accounted for, either the neutron flux at times earlier than the initial time can be specified or an appropriately modified external source term can be used. Existence and uniqueness theorems for the solution of various forms of the neutron transport problem are considered in [41-43]. In this connection, it should be pointed out that the transport equation is not self-adjoint unless simplifying assumptions are made, since neutron life-histories are not reversible. In general σ and β are time-dependent functions, in fact, they are
functions of the neutron flux, so that the general time-dependent problem of determining the neutron flux is a nonlinear one. During the operation of a nuclear reactor, the energy being evolved will affect the physical properties of the materials of the system. For example, when a steady state has not been achieved, the evolution of energy will result in changes in temperatures which will be reflected in changes in collision probabilities. Thus the removal cross section σ is a function of the energy evolved which in turn is a function of the neutron flux. As a result of the fission reaction, fuel nuclei are split, producing fission product nuclei; also various other nuclear reactions result in the modification of atomic nuclei; thus, the materials in a reactor will change with time. These long-term changes in materials imply corresponding changes in σ and β. A survey of methods in use for describing the time-dependent behavior of reactors is given by Henry [38]; see also [38a]. Stationary or time-independent problems are also of interest. For such problems, σ, β, and Q are assumed to be independent of time t. One basic stationary problem is the determination of whether or not a given configuration of reactor materials can represent a critical reactor configuration in the absence of external sources. This is usually set up as an eigenvalue problem as described below. The transference function β is written as a sum of transference functions for scattering β_s and fissioning β_f, and the eigenvalue ν is introduced as a constant multiplier of β_f,
β = ν β_f + β_s.
Then the eigenfunction (the neutron flux function), which is nonnegative, and the corresponding eigenvalue ν are the desired solutions. In general, the reciprocal of this value of ν will be real and will be larger than the real part of any other eigenvalue [34, 39]. When ν = 1 the reactor configuration will correspond to a critical one. Note that ν times the number of fission neutrons actually produced would be required to maintain a critical configuration. If the configuration is not a critical one, it is usually required to determine a set of changes which will make it critical. Many approaches to the solution of various forms of the neutron transport problem have been made. Because of the complex geometric configurations and the variety of materials used in the design of nuclear reactors, those approaches which have been most generally useful have made use of numerical methods and high-speed computers. The number of independent variables is reduced whenever possible, and a variety of simplifying assumptions are made depending on the particular problems at hand. A few of the widely used approximations for the energy and
angular dependence of the neutron flux will be described here. These are primarily for the linear problem in which σ and β are assumed to be independent of the flux. Additional approximations are surveyed in [9, 34, 40]. For computational purposes, the energy dependence is usually accounted for by using a multigroup approximation. Lethargy (i.e., ln E₀/E) is normally used in place of energy E in this approximation. The lethargy interval is replaced by a set of subintervals in each of which v, σ, Φ, and β are assumed to be constant in lethargy. These quantities are then represented by v_g, σ_g, Φ_g, and β_{g'g}, respectively, where g and g' are indices representing lethargy intervals. Neutrons with lethargies in the gth lethargy interval are said to constitute the gth lethargy group; thus the subscripts g are also called group indices. In general the index g = 1 will represent the group of lowest lethargy (highest energy) while g = G will represent the group of highest lethargy (lowest energy). For each group, an equation can be obtained of a form similar to (3.2):

D_g Φ_g + σ_g Φ_g = S_g,     (3.7)

where

D_g Φ_g = (1/v_g) ∂Φ_g/∂t + Ω · ∇Φ_g.
where the integrations are over all delay times and all directions, and the yg are weights depending on the intervals chosen. These group equations can be obtained in various ways. One can simply integrate (3.2) over each lethargy interval and apply a mean value theorem. An estimate of the flux spectrum (i.e., the energy dependence of the flux) can be used profitably in this connection. A considerable amount of effort has gone into the development of methods and computer programs for the calculation of appropriately averaged group parameters yg,vg, etc. from physical data. See for example [18,19, such as as,,13~/+.~, 44-58]. Different methods are used to calculate group parameters for different energy ranges because of the difference in behavior of high energy neutrons as compared to low energy neutrons. When neutrons have much higher kinetic energies than the nuclei of the medium they are traversing, they will, in general, lose energy on collision. When neutrons are nearly in thermal equilibrium with the nuclei of the 301
ELIZABETH CUTHILL
medium, they may gain as well as lose energy on collision. For particular energy ranges and isotopes, resonance absorption of neutrons must be taken account of. Even though only a few energy groups are used, and despite the irregularities in the energy dependence of the physical quantities involved, good representations of the physical system can often be obtained 14, 511. More generally, the energy dependence can be treated by assuming that the energy dependence of the flux can be approximated by using a function of the form
More generally, the energy dependence can be treated by assuming that the energy dependence of the flux can be approximated by using a function of the form

Φ(r, E, Ω, t) ≈ Σ_g Φ_g(r, Ω, t) G_g(E).     (3.9)

Starting with specified functions G_g(E), variational principles have been applied to obtain appropriate equations for the determination of the Φ_g [59-65]. The equations obtained are in general of the form (3.10), where S_g is appropriately defined. When the G_g(E) are chosen to be functions which vanish outside of the gth energy interval, the simpler form of these equations given by (3.7) is obtained [37]. The most widely used approximation for the angular dependence of the neutron flux is the diffusion theory approximation given by

Φ(r, E, Ω, t) = (1/4π) [φ(r, E, t) + 3 Ω · J(r, E, t)].     (3.11)
Here the scalar flux φ is defined by

φ(r, E, t) = ∫ Φ(r, E, Ω, t) dΩ     (3.12)

and the current J by

J(r, E, t) = ∫ Ω Φ(r, E, Ω, t) dΩ,     (3.13)

where the integrations are taken over all directions Ω. When (3.11) is substituted into (3.2) and the resulting equation integrated over all directions assuming that σ does not depend on Ω,

(1/v) ∂φ/∂t + ∇ · J + σφ = S_0,     (3.14)

where

S_0 = ∫ S dΩ,     (3.15)
is obtained. When (3.11) is substituted into (3.2) and the resulting equation multiplied by Ω and integrated over all solid angles, one obtains

(1/v) ∂J/∂t = −(1/3) ∇φ − σJ + S_1,     (3.16)

where

S_1 = ∫ S Ω dΩ.     (3.17)
Normally when the diffusion theory approximation is made, the source is assumed isotropic, so that S_1 = 0. If in addition ∂J/∂t is neglected, and J is eliminated between the above equations, an equation of the form

(1/v) ∂φ/∂t = ∇ · D∇φ − σφ + S_0     (3.18)

is obtained, where D is the diffusion coefficient 1/3σ. In rectangular coordinates (3.18) becomes

(1/v) ∂φ/∂t = ∂/∂x (D ∂φ/∂x) + ∂/∂y (D ∂φ/∂y) + ∂/∂z (D ∂φ/∂z) − σφ + S_0.     (3.19)
When the multigroup approximation is used to account for energy dependence and appropriate assumptions are made concerning the dependence of σ and β on Ω, multigroup diffusion equations are obtained from (3.18) of the form

(1/v_g) ∂φ_g/∂t = ∇ · D_g∇φ_g − σ_g φ_g + S_{0,g},     (3.20)

where

S_{0,g} = ∫ S_g dΩ.     (3.21)
Detailed derivations and discussions of the range of applicability of various forms of the multigroup diffusion approximation are given in [11, 12, 34, 66-69], for example. The boundary conditions used with the diffusion equations are usually either vanishing flux at outer (extrapolated) boundaries or appropriate symmetry conditions. At interfaces between different materials, conditions of continuity of scalar flux and current are imposed. The diffusion approximation can give a good representation of the physical problem, in general, when the mean free paths of the neutrons are short compared to other distances involved, the medium is weakly absorbing, and positions of interest are not too near interfaces between materials with markedly different nuclear properties. By relaxing the interface conditions so that continuity of flux and current are not required and by modifying the diffusion coefficient, the diffusion approximation can be improved considerably for many problems; see [70, 71], for example. Another approach is presented in [72, 73]. These references also contain comparisons of calculations based on diffusion theory with the more accurate transport theory.
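As a minimal illustration of the kind of calculation the survey codes of Section 2 perform with Eqs. (3.18)-(3.20), the sketch below solves a one-group, one-dimensional, steady-state diffusion problem by a three-point difference approximation and finds the multiplication factor (essentially the reciprocal of the eigenvalue ν introduced above) by power iteration. The slab model and all constants are illustrative assumptions, not data for any actual reactor.

```python
import numpy as np

def one_group_criticality(D, sigma_a, nu_sigma_f, width, n=100, iters=200):
    """Power iteration for the one-group, one-dimensional diffusion analogue of
    Eq. (3.18): -d/dx(D dphi/dx) + sigma_a*phi = (1/k) nu_sigma_f*phi, with the
    flux forced to zero at both faces of a uniform slab of the given width (cm)."""
    h = width / (n + 1)
    # Three-point difference approximation of -D d2phi/dx2 + sigma_a*phi
    main = np.full(n, 2.0 * D / h**2 + sigma_a)
    off = np.full(n - 1, -D / h**2)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    phi = np.ones(n)
    k = 1.0
    for _ in range(iters):
        source = nu_sigma_f * phi / k          # fission source for this iterate
        phi_new = np.linalg.solve(A, source)
        k *= phi_new.sum() / phi.sum()         # update the multiplication factor
        phi = phi_new
    return k, phi / phi.max()

# Illustrative constants (cm and cm^-1); k near unity indicates a nearly critical slab.
# k_eff, flux = one_group_criticality(D=1.0, sigma_a=0.07, nu_sigma_f=0.075, width=60.0)
```

Multigroup survey codes repeat essentially this inner solution for every group of Eq. (3.20), coupling the groups through the scattering and fission sources, and the early two-dimensional programs mentioned in Section 2 did the same with five-point difference approximations.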
+
+
where Pfl,m (Q) with m = - n, - n 1 . . ., n represents the 2% 1 spherical harmonics of order n, and for the corresponding moments, Fn,m
(r, E , t ) =
J ~ i ( 8, r , Q, t )
pa,m
(a)d f ~
(3.23)
the integration being taken over all directions. Apparently Jeans [77] first suggested using such an expansion for obtaining solutions to a transport problem in connection with problems of stellar radiation transfer. When moments for which n > N are neglected, this is called a P, approximation, and the ( N 1)2 equations obtained from the transport equation to be satisfied by the moments rpn,mare called P, equations. See [34] for a derivation of these equations. PN equations can also be derived using a variational principle, as indicated in [74]. One of the difficulties in the use of the P, approximation, except in the 304
+
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
simplest cases, is in the determination of appropriate interface and boundary conditions for the moments φ_{n,m} [34]. Diffusion theory is a P_1 approximation. Most applications of the P_N equations have been made where sufficient symmetry could be taken account of so that Φ could be considered to be a function of one component of Ω only. This is the case, for example, when the configuration treated can be assumed to have plane symmetry, i.e., every plane parallel to the yz-plane has uniform composition. This is often called "slab geometry." Only one spatial variable x and only one component of Ω, namely Ω_x, need be considered. The component Ω_x is the direction cosine of Ω with respect to the x-direction and will be designated by μ. The spherical harmonics expansion (3.22) reduces to an expansion in terms of Legendre polynomials:

\Phi(x, E, \mu, t) \approx \sum_{n=0}^{N} \frac{2n+1}{2}\, \varphi_n(x, E, t)\, P_n(\mu),     (3.24)

with

\varphi_n(x, E, t) = \int_{-1}^{1} \Phi(x, E, \mu, t)\, P_n(\mu)\, d\mu.     (3.25)

For a P_N approximation only N + 1 moments φ_n have to be determined. A similar simplification occurs when the configuration considered has spherical symmetry.

At interfaces between different materials and at free surfaces the neutron flux can in general be discontinuous for μ = 0. This suggests approximating Φ by a function which can be discontinuous for μ = 0. The double P_N method (also called Yvon's method [34]), which has been applied to problems with plane symmetry, uses two separate expansions for the flux, one for μ > 0 in terms of P_n(2μ − 1) and the other for μ < 0 in terms of P_n(2μ + 1). Generalization of the double P_N method to treat problems in two spatial dimensions is considered in [78, 79]. Expansions of Φ for problems with plane symmetry in terms of other sets of polynomials in μ have been considered. In particular, Chebyshev polynomials [80] and more generally Jacobi polynomials [81] have been applied. The Jacobi polynomials are orthogonal over the interval μ = −1 to μ = 1 with respect to the weight function (1 − μ)^α (1 + μ)^β. For α = β = 0 they reduce to Legendre polynomials, and for α = β = −1/2 they reduce to Chebyshev polynomials. Another approach to the solution of the transport problem has been
suggested recently by Kopp [82]. Here a "synthetic method" applicable to linear problems is used. The method is based on a successive approximation algorithm of Cesari [83] and applied to the solution of a time-independent neutron transport problem with promising results. In contrast to the orthogonal polynomial approximations, which become more complex as higher order approximations are considered, the method applied here is an iterative one. An operator which approximates the desired one but which is more easily inverted is used as the basis of the iterative scheme.

We have formulated the neutron transport equation in terms of neutron flux. An alternative formulation is in terms of transmission and reflection properties of the various regions making up the nuclear reactor. Such an approach is discussed in [84] through [88], for example. A computer program for two-dimensional reactor criticality calculations based on the setting up of response matrices which relate incident, reflected, and transmitted neutron currents has been described by Shimizu [89].

The adjoint problem is also of interest, especially in connection with perturbation calculations performed when effects of small changes of various parameters are to be assessed. The formulation of adjoint equations for various forms of the transport equation can be found in [34], [90], and [91], for example. Because the form of the adjoint equation is so closely related to that of the normal transport equation (where Ω is replaced by −Ω, where the scattering matrix is transposed so that downscattering is replaced with upscattering, etc.), programs set up for the solution of the normal problem usually require only minor modification to treat the adjoint problem as well.
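The essence of the synthetic method described earlier in this section is to iterate with an approximate operator that is easy to invert. The sketch below is a generic illustration of that pattern for a linear problem Ax = b, not Kopp's particular algorithm; the matrices and names are illustrative assumptions.

```python
import numpy as np

def synthetic_iteration(A, M, b, x0, tol=1e-8, max_iter=200):
    """Iterate x <- x + M^{-1}(b - A x), where M approximates A but is
    easier to invert (a generic sketch of the idea, not a specific code)."""
    x = x0.copy()
    for _ in range(max_iter):
        residual = b - A @ x
        x += np.linalg.solve(M, residual)   # the "easily inverted" operator
        if np.linalg.norm(residual) < tol * np.linalg.norm(b):
            break
    return x

# Example: A stands in for a transport-like operator; M keeps only its diagonal.
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
M = np.diag(np.diag(A))
b = np.ones(3)
x = synthetic_iteration(A, M, b, np.zeros(3))
```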
4. Solution of the Neutron Transport Problem
Over the years, as design requirements have become more stringent, there has been an increasing need for more accurate and detailed predictions of the lifetime behavior of nuclear reactors. Such predictions are ultimately based on solutions of the neutron transport equation which satisfy appropriate auxiliary conditions. The importance and complexity of the problem of determining such solutions have led to the development and exploitation of a considerable number of mathematical methods and programming techniques. This development has somewhat tempered the need for computers of ever increasing speed and larger memory capacity for treating problems in this field. For example, the use of synthesis techniques for generating solutions to higher dimensional problems by appropriately combining solutions to
lower dimensional ones has in many cases nearly eliminated the need for programs to treat the higher dimensional problem directly, except for checking the accuracy of the synthesis methods in key areas. The use of programming techniques which make the most efficient use of available information, for example, the performance of a number of iterations simultaneously when using line relaxation methods in the solution of finite difference approximations to the diffusion problem [92], has removed an input-output bottleneck for many computers. The use of FORTRAN for more sophisticated programs has been facilitated by the development of special programming packages which readily permit more efficient use of the input-output equipment of modern computers [93].

This section contains a survey of some of the methods and computer programs which have been used for the solution of problems based on the time-independent transport equation. Actually, many programs that include options for solving time-dependent problems treat such problems by solving the transport problem at each of a series of time steps in the form of a time-independent problem [51, 94-102]. Then it is usually assumed that the flux shape does not change for a period of time while a specified power level is maintained. This permits changes in isotopic composition to be calculated which will be reflected in altered coefficients for the diffusion equations at the next time step.

As the number of dimensions increases, the solution of the neutron transport problem becomes rapidly more expensive. In many cases, the solution by direct application of finite difference methods becomes impractical with available computing equipment. For this reason, a considerable effort has been devoted to the study of methods that can be used to approximate solutions of more complex problems by appropriately combining solutions of simpler problems. Surveys of work involving the application of variational principles in determining appropriate combinations of such solutions have been made by Selengut [103] and Kaplan [104]; in this connection, [105] and [106] also merit attention.

One of the first detailed three-dimensional fuel depletion calculations was carried out on the NORC computer [4, 107]. The predicted and observed positions of programmed control rods for a highly enriched experimental reactor depleted at full power were reported to agree to within 4% of core height throughout the life of the core. The calculation was based on a synthesis technique suggested by J. Meyer for generating three-dimensional flux shapes at each of a set of time steps. Results of one- and two-dimensional fuel depletion calculations based on 2-group diffusion theory were combined. For each of a set of axial zones
defined by 0 = z_0 < z_1 < ... < z_n, the solution was assumed to be separable in the z-direction and an approximation of the form

\varphi_i(x, y, z) = Z_i(z)\, \psi_i(x, y)     (4.1)

for z_{i-1} < z ≤ z_i, i = 1, 2, ..., n, was used. A number of subsequent computer programs have been based on variants of this method; see [108-115], for example. Wachspress has generalized this "single-channel" synthesis method to a "multichannel" one [116, 127]. A more general class of synthesis methods is considered by S. Kaplan [118], based on approximate solutions of the form

\varphi_i(x, y, z) = \sum_{k=1}^{K_i} z_k^{(i)}(z)\, \psi_k^{(i)}(x, y)

with trial functions ψ_k^{(i)}(x, y) satisfying the specified boundary conditions in the xy-plane. A variety of methods for determining the "mixing" functions z_k^{(i)}(z) are discussed in [118], and results of calculations using them are compared. Synthesis techniques have also been applied to space-time problems [116, 119-121]. A comparison of results obtained by applying synthesis techniques in treating both space and time variables with results of a conventional three-dimensional finite difference calculation was reported in [122]. The agreement was quite good: the calculated eigenvalues remained within 0.2% throughout 5800 full power hours, while the fraction of the power in the blanket remained within 0.5%.

The appropriate formulation for the neutron transport problem to be used in any given calculation depends upon the purpose of the calculation as well as the detailed makeup of the reactor. Because of their relative simplicity, the neutron diffusion equations are used whenever their solution will give an adequate approximation to the required flux distribution. These equations are usually solved by numerical methods, but consideration has also been given to the use of analytic methods [123, 124]. Because of the speed with which solutions can be obtained to the diffusion problem with one independent spatial variable, such solutions are used extensively for survey work. For this reason, many elaborate computer programs have been built around the solution of one-dimensional diffusion problems. For the case of two and three independent spatial variables, considerable effort has gone into the development of methods for the efficient solution of finite difference representations of the diffusion problem. With currently available methods and computers, the two-dimensional problem can be solved readily, and the solution of the few-group three-dimensional problem is practical.
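As a rough illustration of the single-channel synthesis form (4.1) above, the sketch below (the function and variable names are illustrative assumptions, not taken from any of the codes cited) assembles a three-dimensional flux shape from precomputed two-dimensional trial functions and one-dimensional axial mixing functions.

```python
import numpy as np

def synthesize_flux(psi_xy, z_mix, z_edges, z_points):
    """Single-channel flux synthesis: in axial zone i the flux is taken as
    phi(x, y, z) = Z_i(z) * psi_i(x, y), following Eq. (4.1).

    psi_xy  : list of 2-D arrays psi_i(x, y), one per axial zone
    z_mix   : list of callables Z_i(z), one per axial zone
    z_edges : zone boundaries 0 = z_0 < z_1 < ... < z_n
    z_points: axial mesh points at which the 3-D shape is wanted
    """
    nx, ny = psi_xy[0].shape
    phi = np.zeros((nx, ny, len(z_points)))
    for k, z in enumerate(z_points):
        # Locate the axial zone containing z (zone i covers z_{i-1} < z <= z_i).
        i = np.searchsorted(z_edges, z, side="left") - 1
        i = min(max(i, 0), len(psi_xy) - 1)
        phi[:, :, k] = z_mix[i](z) * psi_xy[i]
    return phi

# Example with two zones and a cosine-shaped axial mixing function.
psi = [np.ones((4, 4)), 0.8 * np.ones((4, 4))]
zmix = [lambda z: np.cos(np.pi * (z - 50.0) / 120.0)] * 2
flux3d = synthesize_flux(psi, zmix, z_edges=[0.0, 50.0, 100.0],
                         z_points=np.linspace(0.0, 100.0, 11))
```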
As already indicated, methods for synthesizing solutions to two- and three-dimensional problems from one- and two-dimensional ones have been developed. Where these give satisfactory results, considerable savings in computer time can often be realized.

Section 4.2 will be devoted to the solution of neutron transport problems formulated in terms of representations of the transport equation in which more accurate account is taken of the angular dependence of the neutron flux than in diffusion theory. Such solutions are required, for example, when fine structure effects within lattice cells are to be studied and there are steep neutron flux gradients. Programs which treat very accurate representations of the transport equation in one independent spatial variable have been developed. Within the last few years, programs for solving the two-dimensional problem have also become available.

Finally, Section 4.3 will contain a brief discussion of the Monte Carlo solution of the neutron transport problem and of some of the interesting computer programs which have been based on this method. This is one method which can be used to solve the neutron transport equation in all three dimensions taking careful account of the angular and energy dependence of the neutron distribution. However, it is usually an expensive method in terms of computer time required.

4.1 Diffusion Theory
The multigroup diffusion equations can be solved more readily than other approximations to the transport equation. Multigroup diffusion theory calculations based on skillfully averaged cross sections give sufficiently accurate results for a large class of problems. Programs have been written for solving the neutron diffusion equations with few and with many groups, with a variety of coupling schemes between groups, with various symmetries, coordinate systems, and boundary conditions, using a number of methods of solution. Most of these programs are quite flexible in the distribution of materials which they permit. Many of them are designed so that they can become part of a routine for calculating the fuel depletion in a reactor core. This usually entails permitting pointwise variation of materials. Tables II and III summarize information about a number of computer programs for solving the group diffusion equations in two and three independent spatial variables. Among the references, in addition to descriptions of the codes themselves, a few applications of some of the programs are included. References [20, 98, 140, 142, 169-196] contain considerable information about one-dimensional codes.
In most cases, the group equations are used in the general form

\nabla \cdot D_g \nabla \varphi_g - a_g \varphi_g + E_g + \nu \chi_g F = 0,     (4.2)

where E_g is the group source due to scattering from the other groups, and F is the fission source, with χ_g specifying the fraction of the fission source entering group g. The coupling of the group equations depends on the specific form of the terms E_g and F. It is often assumed that the fission source has the form

F = \sum_{i=1}^{G} \sigma_{f,i}\, \varphi_i,     (4.3)

i.e., that any group can contribute to the fission source. A variety of forms for the scattering term E_g of (4.2) are used. When a full scattering matrix is permitted, neutrons may be scattered from any group to any other group. The scattering source term for group g can then be written

E_g = \sum_{i=1,\, i \neq g}^{G} E_{i,g}\, \varphi_i.     (4.4)
If the thermal energy range is to be treated in a detailed fashion, a term of this form is required, since thermal neutrons may gain as well as lose energy upon collision. The usual "multigroup" approximation allows the scattering source entering any group to come from any group of higher energy, i.e.,

E_g = \sum_{i=1}^{g-1} E_{i,g}\, \varphi_i,     (4.5)

while the "few" group approximation includes a term for scattering to any group from the next higher energy group only,

E_g = E_{g-1,g}\, \varphi_{g-1}.     (4.6)
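The three coupling schemes (4.4)-(4.6) differ only in which elements of the scattering matrix are retained. The following sketch (illustrative only; the array layout and the function name are assumptions, not taken from any of the codes tabulated) shows how the group source E_g would be accumulated under each scheme.

```python
import numpy as np

def group_scattering_source(E, phi, g, scheme="full"):
    """Scattering source into group g (groups numbered 1..G from high to
    low energy), for the coupling schemes of Eqs. (4.4)-(4.6).

    E   : G x G transfer matrix, E[i-1, g-1] = transfer from group i to group g
    phi : group fluxes phi_1 ... phi_G at one spatial point
    """
    G = len(phi)
    if scheme == "full":          # full scattering matrix, Eq. (4.4)
        terms = (E[i - 1, g - 1] * phi[i - 1] for i in range(1, G + 1) if i != g)
    elif scheme == "multigroup":  # downscatter from all higher-energy groups, Eq. (4.5)
        terms = (E[i - 1, g - 1] * phi[i - 1] for i in range(1, g))
    elif scheme == "few":         # next higher-energy group only, Eq. (4.6)
        terms = (E[g - 2, g - 1] * phi[g - 2],) if g > 1 else ()
    else:
        raise ValueError("unknown coupling scheme")
    return sum(terms)

# Example: 4 groups, transfer matrix and fluxes chosen arbitrarily.
E = np.array([[0.0, 0.5, 0.1, 0.0],
              [0.0, 0.0, 0.4, 0.1],
              [0.0, 0.1, 0.0, 0.3],   # a little upscatter into group 2
              [0.0, 0.0, 0.2, 0.0]])
phi = np.array([1.0, 0.8, 0.6, 0.4])
S3 = group_scattering_source(E, phi, g=3, scheme="multigroup")
```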
Tables II and III indicate the coupling schemes used in each code tabulated. In most cases, departures from the above schemes are mentioned explicitly. The usual two-group equations have fission neutrons entering the group of higher energy only, while scattered neutrons enter the group of lower energy only. In a few cases, equations of the form (3.10) have been incorporated. These are described as "overlapping" group equations. In particular, the RAUM code [191] is based on this general form of the diffusion equations.
TABLE II
THREE-DIMENSIONAL GROUP DIFFUSION THEORY CODES

FLAME: maximum of 4 groups, 230,000/G points, 500 materials; few-group coupling; references [4, 97, 125]; LARC; 0.2 msec/point/inner iteration, 3 to 4 hr for a 2-group, 50,000-point problem.
TKO-1: maximum of 4 groups, 4750 points, 511 materials; few-group coupling; references [95, 126-128]; IBM 704; 2.5 msec/point/inner iteration, 2.7 hr for a 2-group, 10,000-point problem.
TNT: maximum of 4 groups, 100,000 points, 150 materials; few-group coupling; references [99, 122]; S-2000; 0.8 msec/point/inner iteration, 6 to 10 hr for a 3-group, 6200-point problem.
TRIXY: maximum of 150,000 points, 100 materials; few-group coupling; reference [129]; IBM 704; 12 to 15 min/source iteration for a 12,000-point, 3-group problem.
UFO: maximum of 5 groups, (30,000 - 7Q)/G points, 512 materials; few-group coupling; references [130, 131]; IBM 7090 (FORTRAN); 6 msec/point/iteration, 1.5 to 4 hr for a 10,000-point problem.
WHIRLAWAY (available through the Argonne Code Center): maximum of 2 groups, 12,750 points, 100 materials; few-group coupling; references [132-135]; IBM 704.

G represents the number of groups; Q represents the number of point types.
TABLE III
TWO-DIMENSIONAL GROUP DIFFUSION CODES

AML-54 (Cuthill): few-group coupling; references [22, 136]; UNIVAC I; 8 hr for a 2500-point problem.
9-ANGIE and 4K-ANGIE: few-group and two-down, one-up coupling; references [137-141, 165]; NORC and IBM 7090.
CRAM (available through the Argonne Code Center): general coupling; references [142-144]; IBM 7090 (FORTRAN); 0.5 min/100 points/group.
CURE (available through the Argonne Code Center): few-group coupling; references [145, 146]; IBM 704 and 7090; 0.3 min/source iteration/1300 points for 2 groups on the 7090.
CUREM: multigroup coupling; references [146a, 167]; IBM 704 (FORTRAN); 10 min/100 source iterations.
EQUIPOISE (available through the Argonne Code Center): few-group coupling; references [133, 134, 147, 148]; IBM 7090 (FORTRAN); 1.7 msec/point/iteration/group.
FLEER (available through the Argonne Code Center): maximum of 3 groups, 14,000 points, 250 materials; few-group coupling; reference [149]; IBM 704; 40 min/1000 points.
(HASSITT): general coupling; references [163, 163a, 166, 167]; Mercury; 12 min/1000 points, 1 hr for the fluxes to 1% for 700 points.
KARE: maximum of 5 groups, 20,000 points, 511 materials; few-group coupling; reference [98]; S-2000; 0.2 to 5 min/100 points/group.
MUG: few-group coupling; references [21, 137, 150]; UNIVAC I.
PDQ-2: few-group coupling; references [151-155, 167]; IBM 704; 2 msec/point/inner iteration.
PDQ-3: few-group coupling; references [156, 168]; IBM 704; 0.7 msec/point/inner iteration.
PDQ-4: few-group coupling; references [101, 155, 157, 158]; S-2000.
PDQ-5: maximum of 5 groups, 250,000/G points, 100 materials; few-group coupling with 2 thermal groups; references [159, 160]; S-2000 (FORTRAN).
QED: maximum of 2 groups, 2704 points, 19 materials; general coupling; reference [161]; IBM 704; 1 hr for a 2500-point problem.
TOSPY: few-group coupling; reference [23]; UNIVAC I.
TWENTY GRAND (available through the Argonne Code Center): reference [162]; IBM 7090 (FORTRAN); 3.5 msec/point/iteration/group (30 min for a 1000-point, 4-group problem).

G represents the number of groups.
The coordinate systems which are normally used are the rectangular (x, y, z), the cylindrical (r, θ, z), and the spherical (r, θ, φ) systems (see Fig. 1).

FIG. 1. Coordinate systems.
For codes based on the group diffusion equations in one independent variable, the form of the diffusion equation most generally used is

\frac{1}{r^p}\frac{d}{dr}\left(r^p D_g \frac{d\varphi_g}{dr}\right) - a_g \varphi_g + E_g + \nu \chi_g F = 0,     (4.7)

where p = 0 for plane symmetry (here r ≡ x), p = 1 for cylindrical symmetry, and p = 2 for spherical symmetry. These symmetries correspond, respectively, to planes parallel to the yz-plane, cylindrical shells concentric with the z-axis, and spherical shells concentric with the origin being homogeneous in composition. In the case of cylindrical symmetry, for example, if the cylinders are assumed to be finite, extending from z = 0 to z = L with vanishing flux at the ends, a nonnegative solution of the form

\varphi_g(r, z) = \psi_g(r) \sin Bz,     (4.8)

with

B = \pi / L,     (4.9)

can be assumed, and (4.2) will reduce to a one-dimensional equation of the appropriate form with the a_g coefficient modified by

a_g \rightarrow a_g + D_g B^2.     (4.10)
B² is called the geometric buckling. Actually, B² is often permitted to be group and region dependent so that it can be adjusted in synthesis calculations. In the case of plane symmetry, similar assumptions about the behavior of φ_g in the y or z direction also lead to the introduction of buckling terms. In two dimensions, the independent variables most frequently used are (x, z) and (r, z). In this case, Eq. (4.2) can be written as

\frac{1}{r^p}\frac{\partial}{\partial r}\left(r^p D_g \frac{\partial \varphi_g}{\partial r}\right) + \frac{\partial}{\partial z}\left(D_g \frac{\partial \varphi_g}{\partial z}\right) - a_g \varphi_g + E_g + \nu \chi_g F = 0,     (4.11)
where p = 0 for rectangular coordinates (r ≡ x) and p = 1 for cylindrical coordinates. The independent variables (r, θ) are also used in a few codes. In this case the origin must be treated carefully. To account for the finite extent in the z-direction a buckling term is again introduced. All of the three-dimensional codes listed in Table II are based on a rectangular coordinate system.

The codes listed in Tables II and III make use of finite difference methods for solving the group diffusion equations. This is primarily because of the simplicity, flexibility, and adequacy of these methods for the problems at hand. Usually, a sequence of intervals in each of the coordinate directions is taken and a network obtained by considering the points of intersection of coordinate surfaces through the end points of these intervals. In nearly all cases, for each group at each network point, the differential equation is approximated by three-point difference equations in one dimension, five-point difference equations in two dimensions, and seven-point difference equations in three dimensions. Several two-dimensional codes are based on triangular networks so that six-point difference equations are used [149, 197]. In nearly all cases interfaces between different materials are permitted along or half-way between mesh planes. The setting up of the difference equations must take account of the appropriate interface and boundary conditions. Many techniques have been developed for setting up the difference equations [10, 198, 199]. Particular methods which have been applied here include taking an appropriate combination of the first few terms in the Taylor series expansions of the flux about a point and its neighbors [150, 177], fitting an appropriate polynomial [142], integrating over appropriate regions and applying the divergence theorem to express a volume integral as a surface integral [145, 160, 161], and applying variational techniques [197]. For each group, when one treats the E_g and νχ_g F terms as sources into that group, Eq. (4.2) represents a self-adjoint equation, so that the difference equations can be set up with a symmetric coefficient matrix.
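To make the structure of these five-point equations concrete, the following sketch (an illustration with made-up mesh spacing and coefficients, not a reproduction of any tabulated code) assembles the symmetric coefficient matrix for one group on a uniform two-dimensional rectangular mesh with vanishing flux on the outer boundary.

```python
import numpy as np

def five_point_matrix(nx, ny, h, D, a):
    """Symmetric coefficient matrix for -div(D grad phi) + a*phi on a uniform
    nx-by-ny mesh with spacing h and zero flux on the boundary.  Points are
    numbered in the natural ordering (left to right, line by line)."""
    n = nx * ny
    A = np.zeros((n, n))
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            A[k, k] = 4.0 * D / h**2 + a         # diagonal of the five-point stencil
            if i > 0:                             # coupling to the preceding point on the line
                A[k, k - 1] = A[k - 1, k] = -D / h**2
            if j > 0:                             # coupling to the adjacent line
                A[k, k - nx] = A[k - nx, k] = -D / h**2
    return A

# For one group the inner problem is A phi = group source, pointwise.
A = five_point_matrix(nx=20, ny=20, h=1.0, D=1.2, a=0.05)
source = np.ones(20 * 20)
phi = np.linalg.solve(A, source)   # direct solution, feasible only for small meshes
```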
The problem on which criticality calculations are based is a linear, homogeneous one, and it is the maximum eigenvalue λ = 1/ν and the corresponding nonnegative neutron flux functions φ_g which are sought. Birkhoff and Varga [200, 201] have shown that the usual difference equation formulations of the multigroup problem are well set, and that the standard iteration procedures used for their solution are convergent. The procedure most often used is to treat each group separately, starting with a guess for the flux values in each group, and then calculating the fission source using this guess. The group equations are solved in sequence starting with the group of highest energy, using as the group source the appropriate fraction of the fission source together with the scattering source based on the most recently calculated approximation to the flux values. A pass through all of the groups is called an outer iteration or a source iteration. When new estimates for all of the group fluxes have been obtained, a new approximation for the fission source is calculated and the eigenvalue λ estimated by appropriately averaging pointwise values of the fission source. This is essentially a matrix power iteration method. A number of extrapolation techniques have been used to accelerate it. Probably the most widely used is based on the use of Chebyshev polynomials [145, 201-203].

A variety of methods have been used for solving the group equations. In one dimension, since the coefficient matrix is nearly always set up to be triple diagonal (the equations relate flux values at each network point to those at the preceding and following network points on a line), the standard Gauss elimination procedure can be used effectively. In higher dimensions, since the coefficient matrix may be of the order of 1000 to 100,000 and the difference equations couple points to neighbors in adjacent lines, and in three dimensions in adjacent planes as well, iteration methods are nearly always used to solve these equations. An excellent general reference on these methods is [10]. The difference equations are usually ordered in the natural way, for points numbered as in Fig. 2, for example. Starting with a flux guess, one can solve each difference equation in turn for a new flux value at each point based on the best available values at neighboring points. This is often called the Gauss-Seidel iteration method, and will converge for the difference equations as they are usually set up. A considerable increase in the rate of convergence can be obtained by making the current change in the flux value an appropriate multiple of the change given by the Gauss-Seidel method. Young [204] has shown how this "successive overrelaxation factor" should be chosen. The method as discussed by Young has been used in many codes [21, 22, 126, 129, 151, 163].
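The sketch below (schematic only; the variable names, convergence controls, and normalization are invented for illustration) shows the skeleton of this scheme: an outer source iteration that updates the fission source and the eigenvalue estimate, with a few point successive overrelaxation sweeps serving as inner iterations for each group.

```python
import numpy as np

def sor_sweep(A, x, b, omega):
    """One Gauss-Seidel sweep with overrelaxation factor omega (point SOR)."""
    n = len(x)
    for k in range(n):
        gs = (b[k] - A[k, :k] @ x[:k] - A[k, k + 1:] @ x[k + 1:]) / A[k, k]
        x[k] += omega * (gs - x[k])
    return x

def source_iteration(A, E, sigma_f, chi, n_outer=50, n_inner=5, omega=1.5):
    """Outer (power) iterations over the fission source; A[g] is the per-group
    five-point matrix, E[g][i] the scattering transfer coefficients, chi[g] the
    fission spectrum fraction, and sigma_f[g] the fission production coefficients."""
    G, n = len(A), A[0].shape[0]
    phi = [np.ones(n) for _ in range(G)]
    lam = 1.0
    for _ in range(n_outer):
        F = sum(sigma_f[g] * phi[g] for g in range(G))        # fission source
        for g in range(G):                                    # highest energy first
            scatter = sum(E[g][i] * phi[i] for i in range(G) if i != g)
            b = chi[g] * F + scatter
            for _ in range(n_inner):                          # inner iterations
                phi[g] = sor_sweep(A[g], phi[g], b, omega)
        F_new = sum(sigma_f[g] * phi[g] for g in range(G))
        lam = F_new.sum() / F.sum()                           # eigenvalue estimate
        for g in range(G):                                    # renormalize the fluxes
            phi[g] /= lam
    return lam, phi
```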
FIG. 2. Ordering of network points.
It was noted that one could readily solve for a line of flux values at a time, using the best available flux values for points not on the line. TOSPY [23] was the first nuclear reactor code to incorporate this technique, with an overrelaxation factor similar to Young's for point relaxation. Arms et al. [205] generalized Young's results to show how this factor could be chosen in an optimum way. It was subsequently shown that by appropriately factoring the coefficient matrix, the line relaxation method could be carried out using the same number of operations per point during the iteration procedure as the point relaxation method, while still taking advantage of the symmetry of the coefficient matrix to reduce storage requirements [206]. It should be noted that if the coefficient matrices are not modified during a problem, the factorization must be carried out only once per problem. This method has been used in a number of programs [97, 149, 156]. An extension of this method to a two-line method, which requires storage of an extra coefficient and also requires extra operations, but with improved convergence [207-209], has been used in several programs [99, 157].

Golub and Varga [210] have shown how an optimum sequence of successive overrelaxation factors can be generated and applied. This sequence approaches the optimum constant overrelaxation factors of Young, Arms, Gates, and Zondek asymptotically. The use of the optimum sequence of factors entails reordering the difference equations.
In general, the gain in the rate of convergence seems to be offset by extra data handling. Also, the symmetry of the coefficient matrix can no longer be exploited as effectively. A variant of the successive overrelaxation method was developed by Hageman [160, 211] for solving the system of two-dimensional five-point difference equations; it uses the property that when the coefficient matrix is squared, the five-point equations are replaced by nine-point equations which relate the flux values at alternate points only. The problem can then be solved with only one-half of the points. A three-line relaxation scheme can be applied to solve for flux values at these points. The rate of convergence of this method is very good, but this is somewhat offset by the extra calculation required between groups.

A variety of methods have been used for the calculation of the optimum relaxation factor. In the MUG [150] and EQUIPOISE [133] type codes an estimate is obtained using a similar, simpler problem. In AML-54 [136] and FLEER [149], results of integrating with estimated relaxation factors are used to obtain improved estimates. Estimates are often based on a special set of iterations performed using the coefficient matrix, as in PDQ [150, 151, 157, 159], FLAME [97], TKO-1 [126], and TNT [99].

Alternating direction iterative methods, which in two dimensions involve two passes through the network, first a line at a time in one direction and then a line at a time in the perpendicular direction, have been very successfully applied in a number of two-dimensional programs: CURE [145, 146], ANGIE [139], KARE [98], and SUNRISE [212]. An excellent review article on the alternating direction methods has recently been published by Birkhoff et al. [213]. Comparisons with the successive overrelaxation methods are included. When fine mesh intervals are used and the parameters well chosen, an appropriate alternating direction method is invariably superior to the successive overrelaxation methods considered. The alternating direction method has been generalized to three dimensions [214, 215].

Although the alternating direction methods will usually converge faster than line successive overrelaxation methods, data handling for the latter methods is simpler when auxiliary storage must be used for the coefficient matrix and flux values. A number of iterations can be performed simultaneously while the required data is in the high-speed computer memory. For example, when a line relaxation method is used, as soon as improved values have been obtained for two lines of flux values the next iteration for the first of these lines can be performed [92]. This technique is used in PDQ-5 [159].
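A line relaxation sweep of the kind discussed above reduces, for each mesh line, to the solution of a tridiagonal system. A minimal sketch follows; the mesh, coefficients, and function names are illustrative assumptions, not taken from any of the codes cited.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Gauss elimination for a tridiagonal system (the one-dimensional case
    and the per-line solves of line relaxation both have this form)."""
    n = len(diag)
    d, r = diag.astype(float).copy(), rhs.astype(float).copy()
    for i in range(1, n):                 # forward elimination
        m = lower[i - 1] / d[i - 1]
        d[i] -= m * upper[i - 1]
        r[i] -= m * r[i - 1]
    x = np.zeros(n)
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (r[i] - upper[i] * x[i + 1]) / d[i]
    return x

def line_relaxation_sweep(phi, D, a, h, source):
    """One sweep over the rows of a 2-D mesh: each row is solved exactly, with
    the latest available values from the neighboring rows on the right-hand side."""
    ny, nx = phi.shape
    c = D / h**2
    for j in range(ny):
        neighbors = np.zeros(nx)
        if j > 0:
            neighbors += c * phi[j - 1]   # already updated during this sweep
        if j < ny - 1:
            neighbors += c * phi[j + 1]
        diag = np.full(nx, 4.0 * c + a)
        off = np.full(nx - 1, -c)
        phi[j] = solve_tridiagonal(off, diag, off, source[j] + neighbors)
    return phi
```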
The set of iterations used to improve the flux guess for a group are usually called inner iterations. As long as the source is not accurate, one feels that the error criteria used as the flux in each group is calculated should not be too stringent. Some discussion of the question of adjusting the inner-outer iteration strategy can be found in [146a, 160]. In a recent series of two- and three-dimensional programs, Fowler and Tobias [133] perform the equivalent of only one inner iteration per group using a point successive overrelaxation method and report excellent convergence for many problems. A possible danger in the use of this method has been pointed out by Christiansen [134]. Fowler and Tobias have also shown the importance of proper normalization of the flux values so that a neutron balance is maintained for each group [216]. The importance of such a scaling process has also been mentioned by Lee [36] and Carlson [246].

When upscattering is permitted, i.e., provision is made for several thermal groups, another iteration cycle among these groups is often introduced before completing an outer iteration cycle [146, 177]. Wachspress [217] has introduced an iteration method that treats all groups at once and uses Wielandt's method to extrapolate the source. The number of operations in this method increases as the cube of the number of groups. Arai et al. [212] suggest a modification of the method for which the number of operations increases linearly with the number of groups, and include comparisons of the number of outer iterations required by their method with versions of the PDQ and CURE codes which use Chebyshev extrapolation. Hassitt works with all groups at once in CRAM [142], iterating for each line among all groups. This appears to be a very effective method when several thermal groups are to be provided for. CRAM is a very well designed program with considerable flexibility in the treatment of output.

Most programs which have been set up to solve the normal problem can be used to solve the adjoint problem as well. The equations have a similar form. However, downscattering in the normal problem is replaced by upscattering in the adjoint problem, so that it is important to reverse the order in which groups are treated. For the adjoint problem, calculations using the standard method start with the thermal groups and work up through the higher energy groups.

4.2 Transport Theory

We turn now to some of the numerical methods and computer programs which have been developed for the solution of problems in
which more accurate approximations to the angular dependence of the neutron flux are required than are assumed in diffusion theory. When energy dependence is to be taken into account, it is the multigroup approximation which is usually used. In solving the eigenvalue problem corresponding to a criticality calculation, an outer iteration procedure is again carried out which starts with an assumed fission source and follows the direction of neutron travel in energy from higher to lower energy groups. As indicated in Section 3, the time-independent transport equation for each group can be written in the form

\Omega \cdot \nabla \varphi_g(r, \Omega) + \sigma \varphi_g(r, \Omega) = S_g(r, \Omega),     (4.12)
where the source term S_g has the form given by (4.13), the integration there being taken over all directions. Also, as was mentioned in Section 3, the angular dependence of the neutron flux is often accounted for by assuming a truncated expansion of the flux in spherical harmonics:

\varphi_g(r, \Omega) \approx \sum_{n=0}^{N} \sum_{m=-n}^{n} \frac{2n+1}{4\pi}\, \varphi_{g,n,m}(r)\, P_{n,m}(\Omega),     (4.14)

where P_{n,m}(Ω) for m = −n, −n + 1, ..., n represents the 2n + 1 spherical harmonics of order n, and the φ_{g,n,m} are the corresponding moments. This is called a P_N approximation, and the system of equations obtained for the moments are called P_N equations [34]. Disadvantages of this type of approximation are that the complexity of the equations obtained for the moments depends strongly upon the coordinate system used and the number of independent variables. Also, difficulties are encountered in determining appropriate interface and boundary conditions to be satisfied by the moments [34, 218]. On the other hand, an advantage of this approach is that anisotropic scattering can be readily accounted for.

In the case of plane symmetry (slab geometry), there are only two independent variables: one spatial and one angular. In this case (4.14) reduces to an expansion in terms of Legendre polynomials in the cosine of the angular variable, and a system of N + 1 coupled first-order equations is obtained for the N + 1 moments. Gelbard [218a] has shown that by introducing appropriate combinations of the moments, this system of coupled first-order equations can be replaced by a system of coupled second-order equations of exactly the form of the
group diffusion equations (4.2), with fission source term (4.3) and group source (4.4), with the moment index replacing the group index. Therefore, programs and methods developed for the solution of the diffusion equations, which were discussed in the previous section, can be adapted to obtain solutions of the P_N equations as well. This is also true for the double-P_N equations. Computer programs developed for solving various order P_N approximations based on this procedure include SIMPL [219], FLIP [220], SLOP-1 [221, 222], M0150 [223], and M0176 [224].

The P_3 equations for the case of cylindrical symmetry have also been written as a set of coupled second-order equations. In this case, however, the coupling terms involve derivatives of moments as well as the moments themselves [225, 226]. These equations have been used in the CLIP program [227]. This approach has also been extended to solve problems involving the P_3 equations in rectangular coordinates in two independent spatial dimensions [225, 226]. The computer code TRIP [228] is based on these equations. In this case large blocks of coding from two-dimensional diffusion theory calculations could be adapted. To avoid some of the complexity of the P_N equations when more dimensions and moments are taken into account, Gelbard [225] suggests a simplified form of the P_N equations and discusses their range of applicability. Dawson [72] considers an application of this simplified form of the P_N equations in two independent spatial dimensions. Kofink [229] studied the convergence of the spherical harmonics method. A modification of the method suggested by the results of this study is proposed by Sauer [230].

One of the most versatile methods for the solution of problems based on the transport equation involves the introduction of a discrete set of directions for the angular variable. Integration over the angle is then approximated by an appropriately weighted sum over the set of directions or, equivalently, a set of points on a unit sphere. The angles are usually chosen so as to avoid directions along mesh lines introduced by discretizing the spatial variables. The Wick-Chandrasekhar method of discrete ordinates [34, 231-233], applied in the case of plane symmetry (slab geometry), transforms the integro-differential equation to a system of ordinary differential equations, one for each selected direction, with the integral replaced by an appropriate quadrature formula. Gauss quadrature formulas are often used. In this case N directions are determined by the N zeros of the Nth order Legendre polynomial. The method is then a P_{N-1} method. Corresponding to Yvon's double-P_N method [34], the integral over all directions can be separated into two parts, one for the range of angles corresponding to forward moving neutrons, and the other for the range corresponding to neutrons moving
in the reverse direction. Separate Gaussian quadrature formulas are used for each of these intervals. A number of very accurate programs to solve numerically the system of ordinary differential equations obtained in this way, together with appropriate interface and boundary conditions, have been developed: RDR 4 [4, 234, 235], RDR 5 [236, 237], RDR 6 [238], TET [239, 240], and RANCH [241]. One advantage of this formulation is the ease with which one quadrature formula can be substituted for another. The NIOBE program [242] for solving the neutron transport problem for spherically symmetric systems is based on a modification of the discrete ordinate method.

The most widely used form of the discrete ordinate method has been the Carlson S_N method. This method was set up initially to treat problems with plane and spherical symmetry, again so that there would be only two independent variables, one spatial and one angular. For the angular variable, a direction cosine is normally used. A set of discrete directions and of spatial intervals was selected. The flux was assumed to be linear in each of the spatial and angular intervals. The method is set up by Carlson and Bell [243] in a form such that neutrons are automatically conserved in cells of the network. An extension of the method to treat problems with cylindrical symmetry is also included. SNG codes based on this method, which treat systems with plane, spherical, or cylindrical symmetry, are described in [244, 245]. For higher dimensional problems, this method becomes complicated and expensive. Carlson [246] developed a simplified formulation of the method which preserves the accuracy of the original equations but which can be used readily for higher dimensional problems as well. This method has been called the discrete S_N method (DSN) and has been used very successfully in a number of computer programs to solve problems with both one and two spatial dimensions [247-252]. More accurate account of anisotropic scattering in DSN calculations is considered in [252a]. Programs based on variants of the S_N method are described in [253, 254]. Still another approach to the solution of the neutron transport equation is used in [254a]. Underhill, in an interesting survey paper [37], reports using the discrete S_N method to determine the critical size of a simple system using a spatial network of 24 x 36 intervals and 12 directional zones on the IBM 7090 in 25 min. It was a 4-group calculation, and the error due to the finite difference approximation was not more than 2%. The selection of appropriate directions for problems in three spatial dimensions so that desirable symmetries will be preserved is considered in some detail by Carlson [76] and Lee [36].
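As a small illustration of the discrete ordinate idea described above (the quadrature choice and the sample integrand are assumptions made for the example, not taken from the cited programs), the angular integral over the direction cosine μ is replaced by a Gauss-Legendre weighted sum over N discrete directions.

```python
import numpy as np

def angular_integral_discrete(flux_of_mu, n_directions):
    """Approximate the integral of flux(mu) over -1 <= mu <= 1 by a weighted
    sum over the N Gauss-Legendre directions, as in the Wick-Chandrasekhar
    discrete ordinate method (exact for polynomials of degree 2N - 1)."""
    mu, w = np.polynomial.legendre.leggauss(n_directions)
    return sum(w_k * flux_of_mu(mu_k) for mu_k, w_k in zip(mu, w))

# Example: a mildly anisotropic angular flux shape (illustrative only).
flux = lambda mu: 1.0 + 0.3 * mu + 0.1 * mu**2
scalar_flux = angular_integral_discrete(flux, n_directions=4)
# Exact value is 2 + 0.2/3 = 2.0667; the 4-point rule reproduces it exactly.
```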
A variety of procedures have been used for solving the difference equations set up as a result of applying the various discrete ordinate methods. Direct, numerically stable methods are described in [253, 255]. Usually iterative methods are used which follow the general direction of neutron travel; thus, for a given group, the mesh is usually swept 2n times per iteration, where n is the number of spatial dimensions. Methods of accelerating the convergence of the iterative methods used are considered by Carlson [76], Lee [36], and Blue and Flatt [256], for example. Keller, in a series of papers [257-259], has considered the convergence of the discrete ordinate method. Many studies of the accuracy of various formulations of the P_N and S_N methods have been made; see [36, 235, 237, 243, 260-268].
4.3 Monte Carlo
The calculation of the neutron density distribution is ultimately based on a statistical theory. In principle, if one could use the probability laws which describe the behavior of the neutrons present to follow the behavior of a sufficiently large number of neutrons, and then subject the results to appropriate statistical analysis, one could solve any neutron transport problem. This method of approach is called the Monte Carlo method. Some of the iteration procedures mentioned in the earlier sections also follow the neutrons, in direction and energy, and in this respect bear some resemblance to a Monte Carlo calculation; however, they are deterministic procedures. In practice, the physical model of the problem is often replaced by another which has the same expected results, but which will yield a satisfactory solution more efficiently. In general, any calculation which involves the use of random sampling can be referred to as a Monte Carlo calculation. Ulam and von Neumann introduced the application of model sampling methods for the solution of problems involving nuclear chain reactions using a digital computer [269]. The descriptive term "Monte Carlo" by which they referred to these calculations has since replaced the term "model sampling" in most references to physical applications of the method. General descriptions of the Monte Carlo method and its application to neutron transport problems can be found in [270-280].

One of the main advantages of the Monte Carlo method is its versatility. In principle, any neutron reactions which are physically understood can be treated by this method. Monte Carlo methods can be used for calculations which cannot be reasonably performed in any other way. Results of Monte Carlo calculations are often used to check the validity of approximations developed for use with other numerical
methods. Most of the general references on Monte Carlo [270-279] as well as [31, 263, 281-285] contain comparisons of the results of Monte Carlo calculations with other types of calculations and with experiment. In addition, a variety of applications are given in [286-303]. Fortunately, those problems which are most difficult to solve with Monte Carlo methods, such as problems in which each neutron has a long history, i.e., is expected to make many collisions before it is absorbed or escapes from the system, are often those problems for which diffusion theory is most reliable. Although, in principle, one can solve any physically understood neutron transport problem using Monte Carlo methods, in practice a prohibitive amount of computing time may be required to obtain sufficiently detailed information concerning the neutron flux distribution. Usually, numerical methods such as those described in the preceding sections tend to be more economical than Monte Carlo methods. However, the amount of work in using Monte Carlo methods increases roughly in proportion to the number of dimensions. For other numerical methods it usually increases much faster.

The basic features of a Monte Carlo calculation include the choice of model, geometry, and physical events to be considered, methods for generation of random variables, and variance reduction techniques to be used. The calculation might proceed somewhat as follows. Neutrons from a source representative of the correct one are introduced into the system. The behavior of these neutrons is followed based on information supplied concerning the geometric distribution of materials which make up the system and the properties of these materials which determine the course of the reactions to be taken into account. Usually this includes probabilities per unit distance traveled (macroscopic cross sections) as functions of neutron velocity, which are supplied for the reactions considered: absorption, elastic scattering, inelastic scattering, and fission, for example. Also the velocity distribution of scattered neutrons and of those emitted as a result of fission, together with the average number emitted per fission, may be required. Since tracing a neutron's history is at the heart of the program, ingenuity in the programming is important in determining answers most efficiently to such questions as the set of regions the neutron will pass through, the path length in each region, where a crossing of an interface will occur, where the neutron will be at the end of its path, and what its velocity will be. After tracing a set of neutron histories for a specified period of time, or to their conclusion, while collecting data on this sample population, statistical analyses can be performed to estimate values of desired parameters associated with the parent population.
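A minimal sketch of such a history-tracing loop is given below for a single energy group in an infinite homogeneous medium; the cross section values, the one-group assumption, and all names are illustrative assumptions, not features of any program cited here.

```python
import math
import random

def follow_history(sigma_total, sigma_scatter, rng=random):
    """Follow one neutron from birth to absorption in an infinite homogeneous
    medium (one group).  Free-flight distances between collisions are sampled
    from the exponential distribution with mean 1/sigma_total; at each collision
    the neutron either scatters again or is absorbed."""
    collisions = 0
    total_path = 0.0
    while True:
        total_path += -math.log(1.0 - rng.random()) / sigma_total   # free flight
        collisions += 1
        if rng.random() >= sigma_scatter / sigma_total:             # absorbed
            return collisions, total_path

# Estimate the mean number of collisions per history from a sample population.
sample = [follow_history(sigma_total=0.5, sigma_scatter=0.4)[0] for _ in range(10_000)]
mean_collisions = sum(sample) / len(sample)   # expected value is 5 for these numbers
```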
In setting up such a calculation it is important to take account of symmetries of the system (just as in other methods of solution already discussed).

Since random sampling is a basic part of any Monte Carlo calculation, at least one procedure is required for obtaining random variables having a specific distribution. Usually, only a procedure for generating numbers uniformly distributed on the unit interval (0, 1) and satisfactorily independent is needed. Then random variables having a specified distribution are obtained by making use of these numbers. The procedure most often used to generate appropriate sequences of numbers uniformly distributed on the interval (0, 1) is a method based on a congruence recursion relation described by Taussky and Todd [304]. Since sequences of numbers so generated have the properties of random numbers, although they are predetermined sequences, they are called pseudo-random numbers. Kahn [271] considers transformations of random variables and their generation. A very fast technique for generating random variables with a specific distribution was introduced in TRAM [305]. Here a set of precalculated numbers with the desired distribution is stored, and the random numbers generated are used as relative addresses to select a random sequence of the precalculated values.

In order to perform Monte Carlo calculations more economically, many procedures have been used and suggested which modify the process in such a way as to reduce the variance while maintaining the same expectations. To quote from Hammersley, "However paradoxical this may seem, it is a fundamental precept of Monte Carlo work to regard randomness as a nuisance to be avoided, suppressed, or reduced as much as possible." In general, it seems worthwhile to eliminate as much of the randomness as possible by using conventional methods, analytical or numerical, though not beyond the point where the increase in the computer time per trial and the added complexity of the program offset the reduction in variance achieved. A considerable variety of techniques for improving the efficiency of Monte Carlo calculations can be found in [306-315], as well as in the general references [270-280].

The Monte Carlo codes for neutron transport catalogued at the Argonne Code Center include the general purpose codes GMCM-9 [316], TRAM [305, 317], and TRAC-1 [318], codes for slab (PLMCM-1 [319]) and spherical (SPMCN-1 [320]) geometries, codes specifically for shielding calculations, TRG-RS(N) [321] and M-1 [322], and also ABCD [323] for the Monte Carlo calculation of neutron doses inside shielded cylindrical crew compartments. Other interesting programs are described in [324-334].
RBU [102] combines the use of the Monte Carlo method to determine three-dimensional flux distributions, which are averaged over space and energy to obtain neutronic properties, with one-dimensional diffusion calculations based on finite difference methods. Fuel depletion calculations are included which are based on the results of the diffusion calculations.
5. Other Calculations
As has already been indicated, in order to predict the behavior of a nuclear reactor a considerable number of interconnected problems must be solved, covering a broad range of physics and engineering fields. Despite the short history of both nuclear reactors and high-speed digital computers, a considerable body of literature has emerged on work related to the development of computer programs for treating various aspects of the nuclear reactor design problem. No attempt has been made to give an exhaustive coverage of computer applications in reactor design, but representative developments on certain aspects of this problem have been presented. As illustrative of the variety of other miscellaneous computer applications in reactor design, such as heat transfer, stress analysis, hydrodynamics, economics, and data reduction studies, references [335-386] are cited.

REFERENCES

1. Sangren, W. C., The role of digital computers in nuclear design. Nucleonics 15, No. 5, 56-60 (1957).
2. Sangren, W. C., Digital Computers and Nuclear Reactor Calculations. Wiley, New York, 1960.
3. Henry, A., Nucl. Sci. Eng. 9, 521-522 (1961).
4. Gelbard, E. M., Habetler, G. J., and Ehrlich, R., The role of digital computers in the design of water moderated reactors, in Proceedings of the Second United Nations International Conference on the Peaceful Uses of Atomic Energy, Vol. 16, pp. 473-482. CERN, Geneva, Switzerland, 1959.
5. Codes for Reactor Computations. International Atomic Energy Agency, Vienna, 1961.
6. On-line computers for power reactors. Nucleonics 20, No. 6, 51-74 (1962).
7. Summaries of papers for the ANS San Francisco Topical Meeting "Nuclear Performance of Power Reactor Cores." Nucl. News 6, No. 8, 35-42 (1963).
8. Birkhoff, G., Some mathematical problems of nuclear reactor theory, in Frontiers of Numerical Mathematics (R. E. Langer, ed.). Univ. of Wisconsin Press, Madison, Wisconsin, 1960.
9. Proceedings of Symposia in Applied Mathematics, Volume XI, Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961.
10. Varga, R. S., Matrix Iterative Analysis. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
11. Weinberg, A. M., and Wigner, E. P., The Physical Theory of Neutron Chain Reactors. Univ. of Chicago Press, Chicago, Illinois, 1958.
12. Meghreblian, R. V., and Holmes, D. K., Reactor Analysis. McGraw-Hill, New York, 1960.
13. Glasstone, S., and Sesonke, A., Nuclear Reactor Engineering. Van Nostrand, Princeton, New Jersey, 1963.
14. Radkowsky, A., and Brodsky, R., A bibliography of available digital computer codes for nuclear reactor problems. U.S. Atomic Energy Commission Report AECU-3078. Technical Information Service, Oak Ridge, Tennessee, 1955.
15. Richtmyer, R. D., Monte Carlo study of resonance escape in hexagonal reactor lattices. AEC Computing Facility Report NYO-6479. Institute of Mathematical Sciences, New York University, 1955.
16. Hellens, R. L., Long, R. W., and Mount, B. H., Multigroup Fourier transform calculation - description of the MUFT III code. Bettis Atomic Power Laboratory Report WAPD-TM-4. Pittsburgh, Pennsylvania, 1955.
17. Richtmyer, R. D., Van Norton, R., and Wolfe, A., The Monte Carlo calculation of resonance capture in reactor lattices, in Proceedings of the Second United Nations International Conference on the Peaceful Uses of Atomic Energy, 1958, Vol. 16, pp. 180-186. CERN, Geneva, Switzerland, 1959.
18. Bohl, Jr., H., and Hemphill, J. P., MUFT-5 - a fast neutron spectrum program for the Philco-2000. Bettis Atomic Power Laboratory Report WAPD-TM-218. Pittsburgh, Pennsylvania, 1961.
19. Grimesey, R. A., Muellen, F. E., and Cannon, L. J., MUFT revision - a fast neutron spectrum code for the IBM-650. Atomic Energy Division Report IDO-16735. Phillips Petroleum Company, Idaho Falls, Idaho, 1962.
20. Butler, M., and Cook, J., UNIVAC programs for the solution of one-dimensional multigroup reactor equations. Argonne National Laboratory Report ANL-5437. Argonne, Illinois, 1955.
21. Stark, R. H., Rates of convergence in numerical solution of the diffusion equation. J. Assoc. Computing Machinery 3, 29-40 (1956).
22. Varga, R. S., Problem 54 - The Cuthill Code. Bettis Atomic Power Laboratory Report WAPD-LSR(P)-30. Pittsburgh, Pennsylvania, 1955.
23. Archibald, Jr., J. A., TOSPY, a two-dimensional multigroup program for UNIVAC. Knolls Atomic Power Laboratory Reports KAPL-M-JA-2 (1956), KAPL-M-JA-4 (1957), KAPL-M-JA-5 (1957). Schenectady, New York.
24. Hochstrasser, U., unpublished work, 1955.
25. Keller, H. B., and Heller, J., On the numerical integration of the neutron transport equation. AEC Computing Facility Report NYO-6481. Institute of Mathematical Sciences, New York University, 1955.
26. Schechter, S., ed., Nuclear Codes Group Newsletter Nos. 1-10. AEC Computing Facility, New York University, New York, 1956-1959.
27. Nather, V., and Sangren, W., Abstracts - nuclear reactor codes. Commun. Assoc. Computing Machinery 2, 6-32 (1959).
28. Nather, V., and Sangren, W., Abstracts - additional reactor codes. Commun. Assoc. Computing Machinery 3, 6-19 (1960).
29. Nather, V., and Sangren, W., Codes for reactor computations. Nucleonics 19, No. 11, 154-158 (1961).
30. Roos, B. W., and Sangren, W., Codes for reactor computations. Nucleonics 20, No. 8, 132-133 (1962).
31. Butler, M., ed., Distribution of Code Abstracts Nos. 1 to 6. Argonne Code Center, Argonne National Laboratory, Argonne, Illinois, 1961-1963.
31a. Neutron Physics Division annual progress report for period ending August 1, 1963. Oak Ridge National Laboratory Report ORNL-3499, Vol. 1. Oak Ridge, Tennessee, 1963.
32. Wigner, E. P., Mathematical problems of nuclear reactor theory, in Proceedings of Symposia in Applied Mathematics, Volume XI, Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961.
33. Boltzmann, L., Vorlesungen über Gastheorie. Barth, Leipzig, 1896.
34. Davison, B., Neutron Transport Theory. Oxford Univ. Press, London and New York, 1957.
35. Weinberg, A. M., Reactor types, in Proceedings of Symposia in Applied Mathematics, Volume XI, Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961.
36. Lee, C. E., The discrete S_N approximation to transport theory. Los Alamos Scientific Laboratory Report LA-2595. Los Alamos, New Mexico, 1962.
37. Underhill, L. H., The linear transport equation in one and two dimensions, in Numerical Solution of Ordinary and Partial Differential Equations (L. Fox, ed.). Addison-Wesley, Reading, Massachusetts, 1962.
38. Henry, A. F., A review of methods for describing the detailed behavior of reactors. Trans. Am. Nucl. Soc. 4, 205 (1961); for complete paper see Bettis Atomic Power Laboratory Report WAPD-T-1376. Pittsburgh, Pennsylvania, 1961.
38a. Proceedings of the 1960 conference on reactor kinetics. Atomic Energy Division Report IDO-16791. Phillips Petroleum Company, Idaho Falls, Idaho.
39. Birkhoff, G., Positivity and criticality, in Proceedings of Symposia in Applied Mathematics, Volume XI, Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961.
40. Honeck, H., A review of methods for computing thermal neutron spectra. Trans. Am. Nucl. Soc. 6, 16 (1963).
41. Case, K. M., and Zweifel, P. F., Existence and uniqueness theorems for the neutron transport equation. J. Math. Phys. 4, 1376-1385 (1963).
42. Douglis, A., The existence and calculation of solutions of certain integro-differential equations in several dimensions. U.S. Naval Ordnance Laboratory Report NOLTR 62-193. White Oak, Maryland, 1962.
43. Habetler, G. J., and Martino, M. A., Existence theorems and spectral theory for the multigroup diffusion model, in Proceedings of Symposia in Applied Mathematics, Volume XI, Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961; Chambers, W. G., A note on the mathematical properties of the Boltzmann equation. Proc. Phys. Soc. (London) 81, 877-882 (1963).
44. Parker, K., ed., Machine production of group cross-sections for use in Carlson S_N and other neutronics calculations on high speed computers. Atomic Weapons Research Establishment Report AWRE-O-1161. Aldermaston, Berkshire, England, 1961.
45. Moldauer, P. A., Calculation of fast reactor cross sections, in Physics of Fast and Intermediate Reactors I. International Atomic Energy Agency, Vienna, 1962.
46. Zweifel, P. F., and Ball, G. L., Group cross-sections for fast reactors, in Physics of Fast and Intermediate Reactors I. International Atomic Energy Agency, Vienna, 1962.
47. Joanou, G. D., and Dudek, J. S., GAM-I, a consistent P1 multigroup code for the calculation of the fast neutron spectra and multigroup constants. General Atomics Report GA-1850. San Diego, California, 1961.
48. McGoff, D., FORM, a Fourier transform fast spectrum code for the IBM-709. Atomics International Report NAA-SR-MEMO 5766. North American Aviation Company, Canoga Park, California, 1960.
49. Canfield, E. H., and Pettibone, J., SOPHIST II, a group cross section IBM 709/7090 code. Lawrence Radiation Laboratory Report UCRL-6912. University of California, Livermore, California, 1962.
50. Ball, G. L., Nicolson, R. B., and Zweifel, P. F., Scattering resonances and fast reactor calculations. Trans. Am. Nucl. Soc. 5, 54 (1962).
51. Hicks, D., Few group nuclear design methods for heavy water reactors. Atomic Energy Establishment Report AEEW-R249. Winfrith, Dorset, England, 1963.
52. Amster, H. J., and Suarez, P., The calculation of thermal constants averaged over a Wigner-Wilkins spectrum; description of the SOFOCATE code. Bettis Atomic Power Laboratory Report WAPD-TM-39. Pittsburgh, Pennsylvania, 1957.
53. Amster, H. J., and Callaghan, J. B., KATE-1, a program for calculating Wigner-Wilkins and Maxwellian averaged thermal constants on the Philco-2000. Bettis Atomic Power Laboratory Report WAPD-TM-232. Pittsburgh, Pennsylvania, 1960.
54. Reno, Jr., T. J., and Federighi, F. D., SWAK, a thermal cross-section program. Knolls Atomic Power Laboratory Report KAPL-M-RPC-6, UC 32. Schenectady, New York, 1963.
55. Joanou, G. D., and Kaestner, P. C., GATHER-II, a consistent B1 code for calculation of thermal neutron spectra and associated multigroup constants. Trans. Am. Nucl. Soc. 5, 95 (1962).
56. Shudde, R., and Dyer, J., TEMPEST-II, a neutron thermalization code. Atomics International Report AMTD-111. North American Aviation Company, Canoga Park, California, 1962.
57. Leslie, D. C., Calculation of thermal spectra in lattice cells, in Proceedings of the Brookhaven Conference on Neutron Thermalization, Vol. 2, pp. 592-609. Brookhaven National Laboratory, Upton, New York, 1962.
58. Cooper, R. S., A code for reducing many group cross-sections to few groups. Los Alamos Scientific Laboratory Report LAMS-2801. Los Alamos, New Mexico, 1962.
59. Calame, G. P., and Federighi, F. D., A variational procedure for determining spatially dependent thermal spectra. Nucl. Sci. Eng. 10, 190-201 (1961).
60. Federighi, F. D., SWAKRAUM - a spatially dependent thermal spectrum code for the Philco-2000. Knolls Atomic Power Laboratory Report KAPL-M-FDF-2. Schenectady, New York, 1962.
329
ELIZABETH CUTHILL
61. Ombrellaro, P. A., and Federighi, F. D., A variational procedure for calculating fast fcw group constants. Knolls Atomic Power Laboratory Report K A P L-2 2 2 0 . Schenectady, New York, 1962. 62. Buslik, A . J., The description of tho thermal neut,ron spatially dependent spectrum by means of variational principles. Bet& Atomic Power Laboratory Report W A I ' D - B T - 2 5 . Pittsburgh, Pennsylvania, 1962. 63. Smiley, J. W., and Bulmer, J. J . , Two-dimensional t,hermal spectra calculations. Trans. Am. Nucl. Soc. 5, 36-37 (1962). 64. Candelore, N. R . , and Gast, R. C., Non-ccll correction to thin cell homo6, 1-2 (1963). genization. Trans. A m . Nucl. SOC. 65. Shnflcr, S. L., A comparison of a self-adjoint, variational method to 36group thermal spectrum calciilat~ions of heterogeneous systems. Rettis Atomic Power Laboratory Report WAPD-1'-1450,Pit,t,sburgh, Pennsylvania, 1962. 66. Marshak, R . E., Brooks, H., and Hurwitz, Jr., H., Introduction to the theory of diffusion and slowing down of neutrons. Nucleon.ics 4, No. 5, 53-60 (1949). 67. Ehrlich, R., and Hurwitz, Jr., H., Mult,igroup methods for neutron diffusion probloms. Nucleonics 12, No. 2, 23-30 (1954). 68. Wilkins, Jr., J. E., Diffusion approximation to the transport equation, in Proceedings of Symposia in Applied Mathematics, Volume X I , Nuclear Reactor Th,eory (G. Birkhoff and E. P. Wigner, eds.).American Mathematical Society, Providence, Rhode Island, 1961. 69. Pomraning, G. C., Rcduction of transport theory to multigroup diffusion theory. Trans. A m . Nucl. Soc. 6, 228--229 (1963). 70. Pomraning, G. C., and Clark, Jr., M., A new asymptotic diffusion theory. Nucl. "Ski. En.g. 17, 227-233 (1963). 71. Selengut, D. S., Tranfiport corrcctions tJodiffusion theory. Trans. Am. Nucl. SOC.5, 40 (1962). 72. Dawson, C., Modified P , approximations t o the transport equations. Trans. Am. Nucl. Soc. 6, 17 (1963). 73. Mingle, J. O., Disadvantage factors in slab geometxy by the P , calculation. Nucl. Sci. Eng. 11, 85-89 (1961). 74. Pomraning, G. C., and Clark, Jr., M., The variational method applied to the monoenergetic Boltzmann equation, Pclrt>sI and 11. Nucl. Sci. Eng. 16, 147-164 (1963). 75. Bareiss, E. H., A survey and classification of transport, theory calculation techniques, in Proceedings of the Secon,d linited Nations International Conference o n th.e Peaceful Uses of Atomic Energy, CERN, Geneva, Switzerland, 1959. Vol. 16, pp. 503-516. 75a Gast, R. C., On the equivalence of the spherical harmonics method and the discrete ordinate method using Gauss quadrature for the Boltzmann equation. B e t h Atomic Power Laboratory Report W A P D - T M - 1 1 8 . Pittsburgh, Pennsylvania, 1958. 76. Carlson, B. G., The numerical theory of neutron transport, in Methods in Computationml Ph,ysics, Vol. 1, Statistical Physics (B. Alder, S. Fernbach, and M. Rotenberg, eds.). Academic Press, New York, 1963. 77. Jeans, J. H., The equations of radiative transfer of energy. MonthEy Notices Roy. Astron. Soc. 78, 29-36 (1917). 78. Schiff, D., and Ziering, S., Many-fold moment method. Nuc. Sci. Eng. 7, 172-183 (1960).
330
DIGITAL COMPUTERS I N NUCLEAR REACTOR DESIGN
79. Gast, R. C., The two-dimensional quadruple P - 0 and P - 1 approximation. Trans. A m . Nucl. Soc. 4, 76-77 (1961). 80. Yabushita, S., Tschebyscheff polynomial approximation method of the neutron transport equation. J . Math. Phys. 2, 543-549 (1961). 81. Pomraning, G. C., and Clark, Jr., M., Orthogonal polynomial angular expansion of the Boltzmann equation. Nucl. Sci. Eng. 17, 8-17 (1963). 82. Kopp, H. J . , Synthetic method solution of the transport equation. Nucl. Sci. Eng. 17, 65-74 (1963). 83. Cesari, L., Sulla risoluzione dei sistemi de equazioni lineari per approssimazioni successive, Atti Accad. Nazl. Lincei, Rend. Classe Sci. Fis., Mat. Nut. [6] 25, 422-428 (1937). 84. Selengut, D. S., Part,ial current representations in reactor physics, in Reactor Technology Report No. 23, Physics. Knolls Atomic Power Laboratory Report KAPL-2000-20,pp. 1.20-1.26. Schenectady, New York, 1963. 85. Shimizu, A., Response matrix method. Nippon Qenshiryoku Gakkaiehi (J. At. Energy Soc. Japan) 5, 359-369 (1963). 86. Redheffer, R., On the relation of transmission line theory to scattering and transfer. J . Math. Phys. 41, 1-41 (1962). 87. Bellman, R . , and Kalaba, R., Transport theory and invariant imbedding, in Proceedings of Symposia in Applied Mathematics, Volume X I , Nuclear Reactor Theory (G.Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961. 88. Wing, G. M., An Introduction to Transport Theory. Wiley, New York, 1962. 89. Shimizu, A., and Tadakatsu, A., Response matrix method for criticality calculations of two dimensional reactors. Trans. A m . Nucl. SOC.6, 281-282 (1963). 90. Marchuk, G . I., Numerical Methods for Nuclear Reactor Calculatione. Consultants Bureau, New York, 1959. 91. Hansen, K. F., and Clark, Jr., M., Adjoint functions and orthogonality relations. Nucl. Sci. Eng. 15, 139-141 (1963). 92. Flatt, H., A Survey of Nuclear Reactor Computing in the United States. IBM, Los Angeles, California, 1962. 93. Marlowe, 0. J., Bettis FORTRAN programming: auxiliary subroutines. Bettis Atomic Power Laboratory Report WAPD-TM-365. Pittsburgh, Pennsylvania, 1963. 94. Olsen, T. M., et al., Army pressurized water reactor code-final summary report, Martin Marietta Corporation Report MND-C-2500, Volumes 1-6. Baltimore, Maryland, June 1962; Additions and corrections to MND-(3-2500. Martin Marietta Corporation Report ICIND-(2-2500-7.Baltimore, Maryland, December, 1962. 95. Jacobi, W. M., Lawton, T. J., Meanor, S. H., and Parrette, J. R., ABRACan IBM 704 code to solve the three-dimensional few-group, time-dependent diffusion equation with distributed void effects. Bettis Atomic Power Laboratory Report W APD-TM-203. Pittsburgh, Pennsylvania, 1960. 96. Curtis, A. R., Tyror, J. G., and Wrigley, H. E., STAB: a kinetic threedimensional, one-group digital computer program, Atomic Energy Establishment Report AEEW-Z1 77. Winfrith, Dorset, England, 1961. 96a. Pickering, W., A comparison with experiment of transient results from the spatial kinetics program STAB. U . K . Atomic Energy Authority Report T R G 127. Risley, Lancashire, England, 1961.
331
ELIZABETH CUTHILL
97. Cuthill, E., and Schot, J. W., Specifications for FLAME-a three-dimensional reactor burnup code for the LARC. Applied Mathematics Laboratory Report A M L 84. David Taylor Model Basin, Washington, D.C., 1959. 98. Archibald, J r . , J . A., and Teaford, H. L., KAHE: a system of diffusion tjhcory programs for the Philco-2000. Knolls Atomic Power Laboratory Report KAPL-2165-1. Schenectady, New York, 1962. 99. Marlowe, 0. J., Nuchar reactor depletion programs for the Philco-2000 computer. Bettis Atomic Power Laboratory Report WAPD-TIM-221. Pittsburgh, Pennsylvania, 1961. 100. Fisher, D. L., and Harriman, J. M., Simplified analytical model for simulation of operating boiling water reactors, ANS Topical Meeting on Nuclear Pcrformancc of Power Reactor Cores, San Francisco, California, 1963. 101. Hicks, D., Light water lattice calculations in the United Kingdom, in Light Water Lattices, pp. 99-132. International Atomic Energy Agency, Vienna, 1962. 102. Lcshan, E. J . , et al., HBU: a combined Monte Carlo reactor-burnup program for the IBM 709. American-Stan.durd Report ATL-A-101. Mountain View, California, 1969. 103. Sclcngut. D. S., The construction of approximate theories by variational methods. Trans. A m . Nucl. SOC.5, 413-414 (1962). 104. Kaplan, S., Space and time synthesis by the variational method. Trans. A m . N d . SOC.5 , 412-413 (1962). 105. Kaplan, S., On the best method for choosing weighting functions in the method of wcighted residuals. Trans. A m . Nucl. Soc. 6, 3-4 (1963). 106. Mowrey, Jr., A. L., and Murray, R. L., A generalized variational method for react,or analysis. Nucl. Sci. Eng. 14, 401-413 (1962). 107. Meyer, J. E., Synthesis of three-dimensional powcr shapcs, a flux weighting synthesis technique, in Proceedings of the Second United Nations Internatioibal Conference o n the Peaceful Uses of Atomic Energy, 1958, Vol. 11, pp. 519-522. CERN, Geneva, Switzerland, 1959. 108. Fairey, J. G., Meycr, J. E., Callaghan, J . B., Meanor, S. H., Pace, A. V., and Smith, It. B., PROP and J E T , a program for the synthesis and survey of three-dimensional power shapes on the IBM-704. Bettis Atomic Power Laboratory Report WAPD-TM-116. Pittsburgh, Pennsylvania, 1958. 109. Pfeifcr, C. J., ZIP-2, ZIP-3, Computer Code Abstracts. Nucl. Sci. Eng. 14, 324-326 (1962). 110. Pfcifcr, C. J., and Urbanus, F. R., ZIP-2, a one-dimensional, few-group synthesis nuclear depletion program for the Philco-2O0Ocomputer. Bettis Atomic Power LaOoratory Report W A P D - T M - 2 2 8 .Pittsburgh, Pennsylvania 1961. 111. Flanagan, C. A,, and Hannun, W. H., PWR core 2 reactor and analytic model descript,ion-Part, 2. Trans. A m . Nucl. SOC.4, 319-320 (1961). 112. Sphar. C . D., An evaluation of the “ZIP” synthesis kchnique as applicd 4, 320-321 (1961). to a PWR seed blanket slab reactor. Trans. A m . Nucl. SOC. 113. Frodorick, D. H., Glasser, S., Olscn, T. M., Schaefer, E. A., and Wolf, D. E., A P I and S N theory code for static and dynamic synthesis of two-dimensional flux and reactivity. Trans. A m . Nucl. SOC.4, 74-75 (1961). 114. Colbeth, E. A., and Olsen, T. M., APWRC-SYBTJRN, A FORTRAN-I1 program for synt,hcsized two-dimensional P1 or DSN burnup calculations. Martin Marietta Corp. Report MND-C-2500-4.Baltimore, Maryland, 1962; see also, Colbeth, E.A., and Olsen, T. M.; Trans. A m . Nucl. SOC.4,233 (1961).
332
DIGITAL COMPUTERS I N NUCLEAR REACTOR DESIGN
115. Olscn, T. M., APWRC-SYNFAR-02, a 1’1 and DSN theory FORTRAN-I1 code for static anti dynamic synt,hrsisof two tlimmnional flux and rcacti\.ity. Martin Marietta Corporation Report AINIj-(2-2500-3.Ball iinore, Maryland, 1962. 116. Wachsprcss. E. L., Digit,al computation of space-time variation of neutron fluxes in a complex reactor configuration, in Codesfor Reactor Coinputations. International Atomic Enrrgy Agency, Vionna, 1961. 117. Wachspress, E. L., Burgess, R . D., and Baron, S., Multichannel flux synthesis. Nucl. Sci. Eng. 12, 381--389 (1962). 118. Kaplan, S., Some new methods of flux synthesis. Nucl. Sci. Eng. 13, 22-31 (1962). 119. Lcwins. J., ‘rho approximate separation of kinetics problems into time and space functions by a variational principle. J . Nucl. Energy, Pt. A . Reactor S C ~12, . 108-112 (1960). 120. Bcwick, J . A., Henry, A. F., and ICaplan, S., Synthesis approximations in the time direction. Trans. A m . Nucl. SOC. 5 , 177-178 (1962). 121. Flatt, H., COLLOREK-an experimental code for reactor kinet,ics calculations. Trans. Am. Nucl.Soc. 5 , 171 (1962). 121a. Judge, F., and Daitch, P. B., Application of the variational mct,hod for the time dcpcndence of thc neutron flux in small slabs, cylinders, and spheres Trans. A m . Nucl. SOC.6, 291-292 (1963). 122. Kaplan, S., and MarIowe, 0. J., Application of synthrsis approximations to three-dimensional depletion calculations and to cell theory. Trans. Am. N t d . SOC.6, 254-255 (1963). 123. Garabedian, H. L., and Thomas, D. H., An analytic approach to twodimensional reactor theory. Nucl. Sci. Eng. 14, 266-271 (1962). 124. Toivancn, T., An applicat,ion of a new method for the solution of group diffusion equations to a cylindrical flux trap assembly with spontaneous fission source. Nucl. Sci. Eng. 16, 176-185 (1963). 125. Cuthill, E., FLAME, a three-dimensional burn-up code for LARC, in Codes for Reactor Computations. lnternational Atomic Energy Agency, Vienna, 1961. 126. Cadwell, W. R., TKO-a three-dimensional neutron-diffusion code for the IBM-704. Bettis Atomic Power Laboratory Report W A P D - T M - 1 4 3 . Pittsburgh, Pennsylvania, 1959. 127. Bickel, P. A., and Gallagher, W. J., Thermal and hydraulic operating performancc of tho Shippingport prcssnrizcd water reactor-predictions versus measurements. Trans. A m . Nuclear SOC.4, 360-361 (1961). 128. Christman, R. P., and Jones, D. H., Summary of calciilat,ional and experimental physics results for Shippingport PWR-1 Seed 1. Bettis Atomic Power Laboratory Report W APD-234. Pittsburgh, Pennsylvania, 1961. 129. Greebler, P., TRIXY-a computer program for multigroup nuclear reactor calculations in three space dimensions. Knolls Atomic Power Laboratory Report K A PL-1549. Schenectady, New York, 1956. 130. Auerbach, E. H., Jewett, J. P., and Ket,chum, M.A., UFO-a three-dimensional neutron diffusion code for the IBM 704. Knolls Atomic Power Lnboratory Report KAPL-1999. Schenectady, New York, 1959. 131. Roseberry, R . J.,and Ruane, T. F., Asymmetric core experiments and their analysis by a three-dimensional code and two flux synthesis techniques. Trans. A m . Nucl. Soc. 4, 285-286, 1961.
333
ELIZABETH CUTHILL
132. Fowler, T. B., and Tobias, M., WHIRLAWAY-a three-dimensional neutron diffusion code for t,he TBM 7090 computcr. Oak Ridge Nationml Lnhoratory Report ORNL-3150, Oak Ridge, Tennessee, 1961. 133. Tobias, M., and Fowler, T. B., the EQUlPoISE niethod-a simple procedure for group-diffusion calculations in two- and three-dimensions. N u d . Sci.E7~g.12, 513-518 (1962). 134. Kristiansen, C:. K., The convergence of the EQUIPOISE method. Nucl. S c i . E ~ u J16, . 133-134 (1963.) 135. Battcy, P., A comparison of the performance of WHIRLAWAY, a threedimensional, two group code, on the IBM 7090 and on LAHC. Applied Matherna~ics Laboratory Report A M L 175. David Taylor Model Basin, Washington, D.C., 1962. 136. Davis, It. M., Jackson, M. H., and Cuthill, E., Two-dimensional reactor sirnulator code. for UNIVAC. David Taylor Model Basin Report 1128. Washington, D.C., 1956. 137. Wolf, B., Cegclski, W., and Machell, W., Final Physics Report f o r the Experimental Test Reactor. Atomic Power Equipment Department, General Electric Company, 1956. 138. Cuthill, E., and Schot, J., The ACE, DRIFT, and thermal analysis codes for NORC, unpublished notes. 139. Stone, S. P., 9-ANGIE, a two-dimensional multigroup, neutron diffusion theory reactor code for the IBM 709 or 7090. Lawrence Radiation Laboratory Report UC RL-6076. Livermore, California, 1960; Supplement 1, Tho 4KANGIE code, 1962. 140. Stone, S., and Lingenfelter, R., Neutron diffusion theory programs and their applications to simple critical systems, in Codes f o r Reactor Computations. International Atomic Energy Agency, Vienna, 1961. 141. Peterson, R. E., and Wolf, W. H., Two-dimensional flux and power calculations for the PRTR. Hmnford Atomic Product Operations Report HW-G724G. Richland, Washington, 1960. 142. Hassitt, A., A computer program to solve the multigroup diffusion equation, U . K . Atomic Eneryy Authority Report T R G 229(R). Rislcy, Lancashire, England, 1962. 143. Segal, B. M., and Volk, E. R., Recent advances in onc- and two-dimensional multigroup diffusion computation. Trans. Am. Nucl. Soc. 6, 9-10 (1963). 144. Garg. S. B., and Palmer, R . G., Some effects of power flattening on the Am. Nucl. Soc. 6, 78 (1963). neutronics of a large, fast, oxide reactor. Tra7~s. 145. Wachspress, E. L., CURE, a generalized two-space-dirnension multigroup coding for the IBM 704. Knolls Atomic Power Laboratory Report K A P L - 1 7 4 2 . Schenectady, New York, 1957. 146. Trentham, Jr., F. M., CURE and CUREM: new two-space-dimension multigroup neutron diffusion codes for the IBM 704/7090, in Codes f o r Reactor Computations. International Atomic Energy Agency, Vienna, 1961. 146a. Wachspress, E. L., and Habctler, G. J., An alternating-direction-implicit iteration technique. J . Soc. l a d . A p p l . Math. 8, 403-423 (1960). IRM 704 code for the 147. Tobias, M. L., and Fowler, T. B., EQUIPOISE-an solution of the two-group, two-dimensional, neutron diffusion equations in cylindrical geometry. Oak Ridge National Laboratory Report 0 RNL-2967. Oak Ridge, Tennessee, 1960.
334
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
148. Fowler, T. B., and Tobias, M. L., EQUIPOISE-3, a two-dimensional, twogroup neutron diffusion code for the IBM 7090 computer. Oak Ridge, Tennessee, 1961; Nestor, Jr., J. W., EQUIPOISE-YA, Oak Ridge National Laboratory Report ORNL-3199, Addendum, Oak Ridge, Tennessee, 1962. 149. Fletcher, J. L., Jewett, J. P., Reilly, Jr., E. D., FLEER, a two-dimensional triangular mesh diffusion program for the IBM 704. Knolls Atomic Power Laboratory Report K A P L - 2 0 8 6 . Schenectady, New York, 1960. 150. Wachspress, E . L., Two-dimensional r-0 multigroup calculations, Knolls Atomic Power Laboratory Report K A P L - 1 6 4 1 . Schenectady, New York, 1954. 151. Bilodeau, G. G., Cadwell, W. R., Dorsey, J . P., Fairey, J. G., and Varga, R. S., PDQ- an IBM 704 code to solve the two-dimensional few-group neutron dihsion equations. Bettis Atomic Power Laboratory Report W A P D T M - 7 0 . Pittsburgh, Pennsylvania, 1957. 152. Graves, Jr., H. W., and Janz, R. F., Comparison of prediction with experiment, reactivity and control rod worths. Trans. Am. Nucl. SOC.4, 97 (1961). 153. Klann, P. G., Dechand, C. O., Fein, E., and Gormley, M. F., Description and analysis of further critical experiments in superheater geometries. Trans. A m . Nucl. Sac. 5, 70 (1962). 154. Poncelet, C. G., McGaugh, J. D., and Graves, Jr., H. W., Reactivity calculations in Yankee Core 1, Trans. A m . Nucl. SOC.5, 120-121 (1962). 155. Eich, W. J., Williams, Jr., H. T., Cairns, J. L., and Minton, G. H., Analysis of critical control rod configurations in the Yankee Reactor. Trans. Am. NucZ. SOC.6, 252-253 (1963). 156. Cadwell, W. R., PDQ-3, a program for the solution of the neutron diffusion equations in two dimensions on the IBM 704. Bettis Atomic Power Laboratory Report W A P D - T M - 1 7 9 . Pittsburgh, Pennsylvania, 1960. 157. Cadwell, W. R., PDQ-4, a program for the solution of the neutron diffusion equations on the Philco-2000. Bettis Atomic Power Laboratory Report W A P D - T M - 2 3 0 . Pittsburgh, Pennsylvania, 1960. 158. Kepes, J. J., and Mitchell, J. A., Use of gadolinium septa in slab cores. Trans. A m . Nucl. SOC.5, 81-82 (1962). 159. Cadwell, W. R., PDQ-5, a FORTRAN program for the solution of the twodimensional neutron diffusion problem. Part 1. Steady-state version. Eiettis Atomic Power Laboratory Report W A P D - T M - 3 6 3 . Pittsburgh, Pennsylvania, 1963. 160. Hageman, L. A. , Numerical methods and techniques used in the twodimensional neutron diffusion program PDQ-5. Bettis Atomic Power Laboratory Report W AP D - T M- 3 6 4 . Pittsburgh, Pennsylvania, 1963. 161. Varga, R. S., Numerical solution of the two-group diffusion equation in x-y geometry. I R E Trans. Nucl. Sci. NS-4, 52-62 (1957). 162. Tobias, M. L., and Fowler, T. B., The TWENTY GRAND program for the numerical solution of the few-group neutron diffusion equations. Oak Ridge National Laboratory Report 0 R NL-3200. Oak Ridge, Tennessee, 1962. 163. Hassitt, A., A programme for solving the multigroup neutron diffusion equations in two space dimensions on the Ferranti Mercury computer. Atomic Energy Research Establishment Report AERE TIR 2487. Harwell, Berkshire, England, 1958. 163a. Hassitt, A., Additional notes on a two space dimension multigroup program for the Mercury computer. Atomic Energy Research Establishment Report AERE TIR 2859. Harwell, Berkshire, England, 1959.
335
ELIZABETH CUTHILL
164. Kristiansen, G. K., Description of DC-2, a two-dimensional, cylindrical geometry two-group diffusion theory code for DASK, and a discussion of tho theory for such codes. Atomenergikommissionen Forssgsinstitut RISO-55. Riser, Denmark, 1963. 165. Alexander, J. H., Cyl-Champlin, C., Gratteau, J. E., Joenou, J. E., Kaestner, P. C., and Leshan, E. J., DI3B-a two-space dimension multigroup burnup program. Trans. Am. Nucl. Soc. 4, 81 (1961). 166. Moorhoad, T. P., The effects of errors in cross-section data on calculations for a largc dilute fast reactor, in Physics of Fast and Intermediate Reactors, I I . International Atomic Energy Agency, Vienna, 1962. 167. Moineraeu, P., and Solanes, M., Calculs multigroups dcs piles rapides, in P h p i c s of Past and Intermediate Reactors, I I . International Atomic Energy Agency, Vienna, 1962. 168. Radkowski, A., Physics characteristics of seed-blanket lattices, in Light Water Lattices. International Atomic Energy Agency, Vienna, 1962. 169. Alexander, J., and Givcns, N., A machine multigroup calculation-The EYEWASH program for UNIVAC. Oak Ridge National Laboratory Report O R N L 1925. Oak Ridge, Tennessee, 1955. 170. Habetler, G., One-space-dimensional multigroup for the IBM 650, Part I, Equations. Ki~olls Atomic Power Laboratory Report K A P L - 1 4 1 5 . Schenectady New York, 1955. 171. Walbran, V. A., One-space-dimensional multigroup for the IBM 650. Part 11. Machine program. Knolls Atomic Power Laboratory Report K A P L - 1 5 3 1 . Schenectady, New York, 1956. 172. Bohl, Jr., H., Gelbard, E., and Suarez, R., A few group one-dimensional code for the IRM 650. Bettis Atovnic Power Laboratory Report W A P D - T M - 3 . Pittsburgh, Pennsylvania, 1956. 173. Franklin, J., and Leshan, E., A multigroup, multiregion, one space dimensional code using neutron diffusion theory. American Standard Atomic fhergy Division Report A S A E - 4 , 1956. 174. Marlowe, 0. J., Saalbach, C. P., Culpepper, L. M., and McCarty, D. S., WANDA-a one-dimensional f~w-groupdiffusion code for the IBM 704. Bettis Atomic Power Laboratory Report W A P D - T M - 2 8 . Pittsburgh, Pennsylvania, 1956; Addendum by 0. J. Marlowe and E. M. Gelbard, 1957; Adden,dum 2 by 0 . J. Marlowe, 1959. 175. Marlowe, 0. J., and Suggs, M. C . , WANDA-5, a one-dimensional neutron diffusion equation program for the Philco-2000 computer. Bettis Atomic Power Laborutory Report W A P D - T M -241. Pittsburgh, Pennsylvania, 1960. 176. Stuart, R. N., Canfield, E. H., Dougherty, E. E., and Stone, S. P., ZOOM, a one-dimensional, multigroup, neutron diffusion theory reactor code for the IBM 704. Lawrence Radiation Laboratory Report UCRL-5293.University of California, Livermore, California, 1958. 177. Stone, S. P., Collins, E. T., and Lenihan, S. R., 9-ZOOM, a one-dimensional, multigroup, neutron diffusion theory reactor code for the IBM 709. Lawrence Radiation Laboratory Report UCRL-5682. University of California, Livermore, California, 1959; Supplement 1, 1960. 178. Cole, A. G., Experiments involving gaps in critical assemblies. Trans. A m . Nuel. SOC.5 , 79-80 (1962).
336
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
179. Flatt, H. I'., and Ballrr, D.. AIM-5. a mult,igroiip, one-dimensional diffusion equation code. Atomics International Report NAA-SR-4694. North American Aviation Company, Canoga Park, California, 1960. 180. Fischcr, P. G., Wenstrup, F. D., and Hoffman, T. A., Program ODDa one-dimensional multigroup code for the IBM 7090 (ANP Program No. 657). General Electric Nuclear Materials and Propulsion Operation Report APEX-702. Cincinnati, Ohio, 1961. 181. Flatt, H. P., The FOG one-dimensional diffusion equation codes. Atomics International Report NAA-SR-6104, North American Aviation Company, Canoga Park, California, 1961. 182. Flatt, H. P., and Baller, D. C., The A I M - 6 code. Atomics International N A A Program Description. North American Aviation Company, Canoga Park, California, 1961. 183. Baller, D. C., The FAIM code, a multigroup, one-dimensional diffusion equation code. Atomics International Report A M T 0-118. North American Aviation Company, Canoga Park, California, 1962. 184. Vargofcak, TI.,The ULCER code-a multigroup, one-dimensional diffusion equation code with upscatter. Atomics International Report N A A - S R -7146. North American Aviation Company, Canoga Park, California, 1962. 185. Rhoades, W. A., and Vargofcak, D., ULCER and QUICKIE, multigroup diffusion theory codes for thermalization studies. Trans. Am. Nucl. Soc. 5, 92 (1962). 186. Shain, R . B., and Green, W. B., The PDS code, a one-dimensional multigroup neutron diffusion codo. Douglas Aircraft Report SM-41859. Santa Monica, California, 1962; see also Shain, R. B., and Green, W. B., Trans. A m . Nucl. Soc. 4, 256 (1961). 187. Lenihan, S. R. , GAZE-a one-dimensional multigroup diffusion theory code for the IBM 7090. General Atomics Report GA-3152. Sen Diego, California, 1962. 188. Lcnihan, S., Joanou, G., and Leshan, E., Some convergence studies of GAZE-a new one-dimensional, multigroup, neutron diffusion theory code for the IBM 7090. Trans. Am. Nucl. SOC.254-255 (1961). 189. Todt, F., FEVER-a one-dimensional few group depletion program for reactor analysis. Gen,eral Atomics Report GA-2749. San Diego, California, 1962. 190. Hunter, C. H., and Heffley, R. A., WHY-a one-dimensional few-group diffusion code for the IBM 1620. Knolls Atomic Power Laboratory Report K A P L - M - C T - 1 . Schenectady, New York, 1962. 191. Federighi, F. D., RAUM-solution of one-dimensional coupled diffusiontype equations on the Philco-2000. Knolls Atomic Power Laboratory Report K A P L - M - F D F - 1 .Schenectady, New York, 1962. 192. Replogle, J., MODRIC: a one-dimensional neutron diffusion code for the IBM 7090. Oak Ridge Gaseous Diffusion Plant Report K-1520. Oak Ridge, Tennessee, 1962. 193. Campise, A. V., Analysis of first AETR critical assembly containing a thorium carbide-uranium-233 test core mockup. Trans. A m . Nucl. SOC. 5, 69-70 (1962). 194. Colston, B. W., Experiments and analysis of water-reflected undermoderated zirconium-hydride critical assemblies-Part 11:Analysis. Trans. Am. Nucl. SOC.6, 50-51 (1963).
337
ELIZABETH CUTHILL
195. Teitel, R . J., and Brown, Jr., J. B., Breeding potential of the liquid metal breeder (LIMB) reactor. Trans. A m . Nucl. SOC.6, 255-256 (1963). 196. Mountford, L. A., and Kistler, V. E., Critical mass calculations of a multispectrum critical assembly. Trans. Am. Nucl. SOC.5, 68 (1962). 197. Gonzalez, A. A., and Ruth, B. H., A method for numerical solution of the neutron-diffusion equations in tridircctional (hexagonal) geometry with a variable spatial mesh. Trans. A m . Nucl. Soc. 6, 282-283 (1963). 198. Kantorovich, L. V., and Krylov, V. I., Approximate Methods of Higher Analysia (translated from the Russian by C . D. Benster). Wiley (Interscience), New York, 1958. 199. Forsythe, G . E., and Wasow, W. R., Finite Difference Methodsfor Partial Differential Equations. Wiley, New York, 1960. 200. Birkhoff, G., and Varga, R. S., Reactor criticality and non-negative matrices. J . SOC.Ind. Appl. Math. 6, 354-377 (1958). 201. Varga, R . S., Numerical methods for solving multi-dimensional multigroup diffusion equations, in Proceedings of Symposia in Applied Mathematics, Volume X I , Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961. 202. Flanders, D. A , , and Shortley, G., Numerical determination of fundamental modes. J . Appl. Phys. 21, 1326-1332 (1950). 203. Stiefel, E., Kernel Polynomials in Linear Algebra and Their Numerical Applications, National Bureau of Standards Applied Mathematics Series 49. U.S. Govcrnment Printing Office, Washington, D.C., 1958. 204. Young, D. M., Iterative methods for solving partial difference equations of elliptic type. Trans. Am. Math. SOC. 76, 92-111 (1954). 205. Arms, R., Gates, L., and Zondek, B., A method of block iteration. J. SOC. Ind.'Appl. Math. 4, 220-229 (1956). 206. Cuthill, E., and Varga, R. S., A method of normalized block itoretion. J. Assoc. Computing Machinery 6, 236-244 (1959). 207. Perter, S. V., On two-line iterative methods for Laplace and biharmonic difference equations. Numer. Math. 1, 240-252. (1959). 208. Heller, J., Simultaneous, successive, and alternating direction iteration schemes. J . SOC.Ind. Appl. Math. 8, 150-173 (1960). 209. Varga, R. S., Factorization and normalized iterative methods, in Boundary Problems in Differential Equations (R. E. Langer, ed.). Univ. of Wisconsin Press, Madison, Wisconsin, 1960. 210. Golub, G. H., and Varga, R. S., Chebyshev semi-iterative methods, successive overrelaxation iterative methods, and second order Richardson iterative methods, Part 1 . Numer. Math. 3, 147-156 (1961); Part 2. Ibid. 3, 157-168 (1961). 211. Hageman, L. A., Block iterative methods for two-cyclic matrix equations with special applications to the numerical solution of the second-order self-adjoint elliptic partial differential equations in two dimensions, Thesis, Univ. of Pittsburgh, 1962; also Bettis Atomic Power Laboratory Report W A P D - T M - 3 2 7 . Pittsburgh, Pennsylvania, 1962. 212. Arai, K., Noda, T., and Terasawa, S., On the convergence of diffusion codes. Trans. Am. Nucl. SOC.6, 277-279 (1963). 213. Birkhoff, G., Varga, R. S., and Young, D., Alternating direction implicit methods, in Advances in Computers (F. L. Alt. and M Rubinoff, eds.), Volume 3. Academic Press, New York, 1962.
338
DIGITAL COMPUTERS I N NUCLEAR REACTOR DESIGN
214. Douglas, J., Alternating dircction methods for three space variables. Numer. Math. 4, 41-63 (1962). 215. Douglas, J., Kellogg, R. B., and Varga, R. S., Alternating direction iteration methods for n space variables. Math. of Computation 17, 279-282 (1963). 216. Tobias, M., Vondy, D. R., and Fowler, T. B., A note on a simple method for accelerating finite difference group diffusion calculations. Nucl. Sci. Eng. 15, 98-99 (1963). 217. Wachspress, E. L., A numerical technique for solving the group diffusion equations. Nucl. Sci. Eng. 8, 164-170 (1960). 218. Minglc, J. O., Boundary conditions for the even-order spherical harmonics method. Tram. Am. Nucl. SOC.6, 1 (1963). 218a. Gelbard, E. M., Davis, J. A., and Pearson, J . , Iterative solutions of the P , and double P , equations. Nucl. Sci. Eng. 5, 36-44 (1959). 219. Culpcppcr, L. M., Gelbard, E. M., Davis, J. A,, and Pearson, J., The IBM 704 SIMPL Codes. Bettis Atomic Power Laboratory Report W A P D - T M - 1 0 7 . Pittsburgh, Pennsylvania, 1958. 220. Anderson, B. L., Davis, J. A., Gelbard, E. M., and Jarvis, P. H., FLIP, an IBM 704 code to solve the P , and double P , equations in slab geometry. Bettis Atomic Power Laboratory Report WAPD-TM-134. Pittsburgh, Pennsylvania, 1959. 221. Bohl, H., Gelbard, E., Bucrger, P., and Culpepper, G., SLOP-1, a thermal multigroup P-1 code for the IBM 704. Bettis Atomic Power Laboratory Report WAPD-TM-188. Pittsburgh, Pennsylvania, 1960. 222. Goldsmith, M., and Cantwell, R. M., Few group calculations of thermal neutron transport. Nucl. Sci. Eng. 10, 207-218 (1961). 223. Cantwell, R. M., M0150, A FORTRAN program to solve the double P-3 equations in slab geometry. Bettis Atomic Power Laboratory Report W AP D TM-315. Pittsburgh, Pennsylvania, 1960. 224. Cantwell, R. M., M0176, a FORTRAN program to solve several P-approximations to the few group neutron transport equation in slab geometry. B e t h Atomic Power Laboratory Report W A P D -TM-320. Pittsburgh, Pennsylvania, 1962. 225. Gelbard, E . M., Application of spherical harmonics method to reactor problems, in Codes for Reactor Computations. International Atomic Energy Agency, Vienna, 1961; also Bettis Atomic Power Laboratory Report W A P D BT-20. Pittsburgh, Pennsylvania. 1960. 226. Davis, J., Gelbard, E., and Pearson, J., Solution of the P , equations in oneand two-dimensions. Nucl. Sci. Eng. 6, 251-252, 1959. 227. Anderson, B., Davis, J., Gelbard, E., Jarvis, P., and Pearson, J., CLIP-1, an IBM 704 code to solve the P-3 equations in cylindrical geometry. Bettis Atomic Power Laboratory Report WAPD-TM-207. Pittsburgh, Pennsylvania 1962. 228. Gelbard, E., Davis, J., Dorsey, J., Mitchell, H., and Mandel, J., TRIP-1, a two-dimensional P - 3 code in X - Y geometry. Bettis Atomic Power Laboratory Report WAPD-TM-217. Pittsburgh, Pennsylvania, 1960. 229. Kofink, W., Studies of the spherical harmonics method in neutron transport theory. Oak Ridge National Laboratory Report 2358. Oak Ridge, Tennessee, 1957. 230. Sauer, A., A modification of the spherical harmonics method in neutron transport theory. Oak Ridge National Laboratory Report 2456. Oak Ridge, Tennessee, 1958.
339
ELIZABETH CUTHILL
231. Wick, G. C.. Uber ebene Diffusionnprohleme. 2. Physik. 121, 702-718 (1943). 232. Chandrasekhar, S., On the radiative equilibrium of a stellar atmosphere. 11. Astrophys. J . 100, 76-86 (1944). 233. Richtmyer, R. D., Difference Methods for Initial- Value Problems. Wiley (Interscience), New York, 1957. 234. Bareiss, E. H., Flexible transport theory routincs for nuclear reactor design. David Taylor Model Basin Report 1030. Washington D.C., 1956. 235. Dawson, C. W . , and Bareiss, E. H., RDR 4, instruction manual o f the David Taylor Model Basin one-dimensional, one-group transport theory routine No. 4 for nuclear reactor design. David Taylor Model Basin Report 1100. Washington, D.C., 1956. 236. Dawson, C. W., Multigroup transport code RDR 5. David Taylor Model Basin Report 1450. Washington, D.C., 1960. 237. Gast, R. C., A P - 9 multigroup method for solution of thc transport equation in slab geometry. Rettis Atomic Power Laboratory Report W A P D - 2 3 2 . Pittsburgh, Pennsylvania, 1960. 238. Dawson, C. W., Multigroup Transport Code RDR 6. David Taylor Model Basin Report (in preparation). Washington, D.C. 239. Dawson, C., Thermal energy transport code TET. David Taylor Model Basin Report 1613. washington, D.C., 1962. 240. Francis, N. C., and Edgar, K. R., Miiltigroup treatment of thermal neutrons, in Reactor Technology Report No. 19-Physics. Knolls Atomic Power Laborutory Report R A P L - 2 0 0 0 - 1 6 ,pp. 11.9-11. 16. Schenectady, New Y o r k , 1961. 241. Hagernan, L. A,, and Mandel, J. T., RANCH-an IBM 704 program used to solvc the one-dimensional, single energy neutron transport equation with anisotropic scattering. Bettis Atomic Power Laboratory Report W A P D TM-268. Pittsburgh, Pennsylvania. 196 1. 242. Preiser, S., Rabinowitz, G., and de Dufour, E., A program for the numerical integration of the Boltzmann transport equation-NIOBE. A R L Technical Report 60-314. Nuclear Development Corporation of America, White Plains, New York, 1960. 243. Carlson, B. G., and Bell, G. I., Solution of the transport equation by the S, method, in Proceedings of the Second United Nations International Con. ference on the Peaceful Uses of Atomic Energy, 1958, Vol. 16, pp. 535549. CERN, Geneva, Switzerland, 1958. 244. Carlson, B. G., The S, method and SNG codes. Los Alamos Scientific Laboratory Report L A M S 2201. Los Alamos, New Mexico, 1958. 245. Lemke, B., FORTRAN SNG code, NAA Program Description. Atomics International Report TID-18116. North American Aviation Company, Canoga Park, California, 1959. 246. Carlson, B. G., Numerical solution of transient and steady neutron transport problcms. Los Alamos Scientific Laboratory Report LA-2260. Los Alamos, New Mexico, 1959;also Numerical solution of transport problems, in Proceedings of Symposia in Applied Mathematics, Volume X I , Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providence, Rhode Island, 1961. 247. Carlson, B., Lee, C., and Worlton, J., Thc DSN and TDC neutron transport codes. Los Alumos ScientiJc Laboratory Report 2346. Los Alamos, New Mexico, 1960.
340
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN 248. Bengston, J., Perkins, S. T., Sheheen, T. W., and Tompson, D. W., 2 D X Y a two-dimensional Cartesian coordinate S , transport calculation. AerojetGeneral Nucleonics Report A G N - T M - 3 2 9 . San Ramon, California, 1961. 249. Lemke, B. J., SAIL, NAA Program Description. Atomic International Report TID-18117. North American Aviation Company, Canoga Park, California, 1961. 250. Francescon, S., The Winfrith DSN program. Atomic Energy Establisliment Report A E E W - R - 3 7 3 . Winfrith, Dorsct, England, 1963. 251. Gitter, L., Szac, M., and Yiftah, S., The DSN code for the Philco-2000 computer. Israel Atomic Energy Commission Report I A - $ 6 1 ( A p p . ) .Rehovoth Israel, 1963. 252. Hinman, G. W., The DSN and GAPLSN codes. Genera2 Atomics Report GAMD-3425. San Diego, California, 1962. 252a. Alexander, J. H., and Hinman, G. W., Anisotropic scattering in DSN. Trans. A m . Nucl. SOC.5, 408 (19G2). 253. Putnam, G., and Shapiro, D., MIST (multigroup internuclear slab transport). Internuclear Company Report I N T E R N UC-67. Clayton, Missouri, 1961. 254. Duane, B. H., Ncntron and photon transport, plane-cylinder-sphere, G E -APr'PD program S, variat,ional optimum formulation. General Electric Hanford Atomic Products Operation Report X D C 59-9-118. Richland, Washington, 1959. 254a. Honeclr, H. C., THERMOS, a thermalization transport theory code for reactor lattice calculations. Brookhaven National Laboratory Report B N L 5826. Upton, New York, 196i. 255. Schmidt, E., A stable non-itcrative solution to the discrete ordinate equations. Trans. A m . Nucl. Soc. 6, 8-9 (1963). 256. Blue, E., and Flatt, H. P., Convergence of t h e 8 method for thermal systems h'ucl. S C ~E,?g. . 7, 127-132 (1960). 257. Keller, H. B., Approximate solutions of transport problems 11, convergence and applications of the discrete ordinate method. J . Soc. I n d . A p p l . Math. 8, 43-73 (1960). 258. Keller, H. B., On the pointwise convergence of the discrete ordinate method J . Soc. Incl. AppZ. Math. 8, 560-567 (1960). 259. Keller, H. B., Convergence of the discrete ordinate method for the anisotropic scattering t,ransport equation, in Proceedings of the Rome Symposium on Numerical Treatme)it of Ordinary Differential Equations, Integral, and Integro-Differential Equations. Birkhauser, Basel, 1960. 260. Campise, A . V., Acciiracy of the S, code in all calculations. Nucl. Sci. Eng. 7, 104-110 (1960). 261. Pendelbnry, E. D., and Underhill, L. H., The validity of the transport approximation in critical-size and reactivity calculations, in Physics of Fast and Intermediate Reactors I I . International Atomic Energy Agency, Vienna, 1962. 262. Meneghetti, D., Discrete ordinate quadratures for thin slab cells. Nucl. Sci. Eng. 14, 295-303 (1962). 263. Reactor Physics and Mathemat,ics Technical Progress Report for the Period December 1, 1959 to March 1, 1960. Bettis Atomic Power Laboratory Report W A P D - M R J - 9 . Pittsburgh, Pennsylvania, 1960.
341
ELIZABETH CUTHILL
263a. Goodjohn, A. J., Joanou, G. D., and Wikner, N. F., On the physics of semi-homogeneous graphite U-235 reactors. Trans. A m . Nucl. SOC. 5, 36,77368 (1962). 263b. Cormlcy, M. F., Dechand, C. O., Klann, P. G., and Visner, S., Description and analysis of critical cxperiments with annular multitube superheater fuel elemonts. l’rutcs. A m . Nucl. SOC.5, 342--343 (1962). 264. Best, C . H., Carmichael, B. M., and LeBauve, R. J., Two-dimensional 8, calculations for LAMPRE I. Trans. A m . Nucl. Soc. 6, 11-12 (1963). 265. Joanou, G. U., and Kazi, A. H., The validity of the transport approximation in fast react,or calculations. Trans. Am. Nucl. SOC.6, 17-18 (1963). 266. Lacy, P., and Lewellan, J., Comparison of measured and calculated thermal activation rates for superheater clustered pin lattices. Trans. A m . Nucl. SOC. 6, 32 (1963). 267. Shim, K., P-3 niultigroup calculations of neutron antenuation. Trans. A m . Nucl. Soc. 6, 190 (1963). 268. Vrrbinski, V.V., Fast-neutron transport in LiH. Trans. A m . Nucl. SOC.6, 190-191 (1963). 269. Marshall, A. W., An introductory note, in Symposium on Monte Carlo Methods (H. A. Meyer, ed.). Wiley, Ncw York, 1954. 270. Richtmyer, R. D., Monte Carlo methods, in Proceedings of Symposia in Applied Mathematics, I’olume X I , Nuclear Reactor Theory (G. Birkhoff and E. P. Wigner, eds.). American Mathematical Society, Providcncc, Rhode Island, 1961. 271. Kahn, H., Applications of Monte Carlo. U.S. Atomic Energy Commission Report AECU-3259. Rand Corporation, S a n h Monica, California, 1954, Rev., 1956. 272. Parker, J. B., Monte Carlo methods for neutronics problems, in Numerical Solution of Ordinary a r d Partial UL8erential Equations (L. Fox, ed.). Porgamon Press, New York, 1962. 273. Spanier, J., A unified approach to Monte Carlo methods and their application t o a niultigroup calciilation of absorption rates.SIAM Rev. 4,115-&34 (1962). 274. Spanicr, .J., Monte Carlo methods and thcir applications to neutron transport problems. Bettis Atomic Power Laboratory Report WAPD-195. Pittsburg, Pennsylvania, 1969. 275. Methods in Cornpulatiorial Physics, Volirnie 1, Statistical Physics (B. Adler, S . Fernbach, and M. Rotenberg, eds.). Academic Press, New York, 1963. 276, Syn~poaiumon Monte Carlo Methods (H. A. Meyer, cd.). Wiley, New York, 1956. 277. Hammersley, J. M., Monte Carlo methods for solving rnultivariable problems. An>%. N . Y . A c ~ t fS. C ~86, . 844-874 (1960). 278. Kalos, M. H., and Wilf, H. S., Monte Carlo solves reactor problems. Nucleattics 15, NO. 6, 64-68 (1967). 279. Goertzel, G., and Kalos, M. H., Monte Carlo methods in transport, problems, in Progress of Nuclear Energy, Volume 2, Series 1. Pergamon Press, New York, 1958. 280. Scott, R. L., Monte Carlo method, a literature search. U.S. Atomic Energy Commissiori Report TID-3514. Oak Ridge, Tennesee, 1959. 281. Fillmorc, F. L., Application of Monte Carlo calculations to SRG reactors. Am. NuclearSoc. Meeting on Nuclear Performance of Power Reactor Cores. San Francisco, California, 1963.
342
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
282. Levine, M. M., Resonance integral calculations for UZ3* lattices. Nucl. Sci. Eng. 16, 271-279 (1963). 283. Goldsmith, M., Epithermal neutron absorption in slab lattices. Nucl. Sci. Eng. 15, 382-387 (1963). 284. Amster, H., and Cast, R., Refined analyses of slowing down problems in water. Trans. A m . Nucl. SOC.4, 132 (1961). 285. Risti, H. A., Cleary, J. D., Jennings, B., and Minton, G. H., Experiments and analysis of microscopic parameters for single region and multi-region cores. Trans. A m . Nucl. SOC.4, 101 (1961). 286. Richtmyer, R . D., Monte Carlo study of resonance absorption in hexagonal reactor lattices, AEC Computing Facility Report N YO-6479. Institute of Mathematical Sciences, New York, University 1955. 287. Bushndl, D. L., A Monte Carlo calculation of the resonance escape probability of thorium in a homogeneous reactor. Trans. Am. Nucl. SOC.4,274-275 (1961). 288. Kellman, S., and Goldberg, E., Effect of resonance scattering on the cobalt absorption integral. Trans. Am. Nucl. SOC.5, 90-92 (1962). 289. Foell, W. K., Grimesey, R. A., and Tong, S., A Monte Carlo study of resonance absorption in gold and indium lumps. Trans. A m . Nucl. SOC.6, 272-273 (1963). 290. Rotenberg, A., Lapidus, A., and Richtmyer, R. D., A Monte Carlo calculation of thermal utilization. AEG Computing Facility Report NYO-7976. Institute of Mathematical Sciences, New York University, 1958. 291. Brown, H. D., A Monte Carlo study of neutron thermalization. J . Nucl. Energy 8, 177 (1959). 292. Rotenberg, A., Lapidus, A., and Wetherell, E., A Monte Carlo calculation of thermal utilization. Nucl. Sci. Eng. 6, 288-293 (1959). 293. Hogberg, T., Monte Carlo calculations of neutron thermalization in a heterogeneous system. J . Nucl. Energy 12, 145 (1960). 294. Berger, M. H., and Doggett, J., Reflection and transmission of gamma radiation by barriers: semianalytic Monte Carlo calculation. J . Res. Nutl. Bur. St. 56, 89-98 (1956). 295. Davisson, C. M., and Beach, L. A., A Monte Carlo study of back-scattered gamma radiation. Trans. Am. Nucl. SOC.5, 391-393 (1962). 296. Rooney, K. L., and Roberts, W. J., The application of Monte Carlo calculations to SNAP reactor shields. Trans. Am. Nucl. SOC.5, 216-217 (1962). 297. Raso, D. J., Monte Carlo calculations on the reflection and transmission of scattered gamma-rays. Nucl. Sci. Eng. 17, 411-418 (1963). 298. Leimdorfer, M., A Monte Carlo method for calculating the penetration and energy deposition of gamma radiation from distributed sources in laminated shields. Trans. A m . Nucl. SOC.6, 427-428 (1963). 299. Coveyou, R., Bate, R., and Osborn, R.,Effect of moderator temperature on neutron flux in infinite capturing medium. J . Nucl. Energy 2, 153 (1956). 300. Olhoeft, J. E., and Osborn, R. K., A Monte Carlo study of Doppler coefficients of reactivity for non-uniform temperature distributions. Trans. Am. NU&. SOC.5, 365-366 (1962). 301. Mathes, W., Monte Carlo calculation of the neutron temperature coefficient in fast reactors, in Third Symposium on Reactor Theory. Bad Neuheim, Germany, 1963.
343
ELIZABETH CUTHILL
302. Steelc, L. R., Carson, D., and Dryden, C. E., Solution to the fission recoil energy deposition in a slurry by a Monte Carlo technique. Trans. A m . Nucl. SOC.5, 407 (1962). 303. Kalos, M. H., On the estimation of flux at a point by Monte Carlo. Nucl. Sci. Eng. 16, 111-117 (1963). 304. Taussky, O . , and Todd, J., Gcneration and testing of pseudo-random numbers, in Symposium on Monte Carlo Methods ( H. A. Mcyer, ed.). Wiley, New York, 1954. 305. Martino, M. A., and Stone, W. W., TRAM, a Monte Carlo thermal neutron code for the IBM 704. Knolls Atomic Power Laboratory Report KAPL-2039. Schenectady, New York, 1959. 406. Pull, I. C., Special techniques of the Monte Carlo method, in Numerical Solution of Ordinary and Partial Uiflerential Equations (L. Fox, ‘ed.). Pergamon Press, Oxford, 1962. 307. Layno, S. B., Some useful techniques for Monte Carlo calculations and their effectiveness. Trans. A m . Nucl. SOC.5, 224 (1962). 308. Kalos, M. H., Importance sampling in Monte Carlo shielding calculations, 1. Neutron penetration through thick hydrogen slabs. Nucl. Sci. Eng. 16, 227-234 (1963). 309. Hammersley, J. M., Conditional Monte Carlo. J . Assoc. Computing Machinery 3,73-76 (1956). 310. Penny, S. K., and Zerby, C. D., Examination of the range of applicability of conditional Monte Carlo to deep penetration problems. Nucl. Sci. Eng. 10, 75-82 (1961). 311. Drawbaugh, D. W., On the solution of transport problems by conditional Monte Carlo. Nucl. Sci. Eng. 9, 185-197 (1961). 312. Drawbaugh, D. W., Optimum choice of the weight function in conditional Monte Carlo. Trans. A m . Nucl. SOC.5, 410-411 (1962). 313. Steinberg, H. A., Generalized quota sampling. Nucl. Sci. Eng. 15, 142-145 (1963). 314. Maynard, C. W., An application of the reciprocity theorem to the acceleration of Monte Carlo calculations. Nucl. Sci. Eng. 10, 97-101 (1961). 315. Richtmyer, R. D., A non-random sampling method, based on congruences, for “Monte Carlo” problems. AEC Computing Facility Report N YO-8674. Institute of Mathematical Sciences, New York University, 1958. 316. Kniedler, M., and Jordan, T., Generalized Monte Carlo program for neutrons. Martin Marietta Corporation Report M N D - M C -2856. Baltimore, Maryland, 1962. 317. Pfeiffer, R . A., Stone, W. W., and Tuecke, J. E., TRAM for the Philco-2000. Knolls Atomic Power Laboratory Report K A P L - M - R P C - 1 ,Rev. 1. Schenectady, New York, 1963. 318. Berwind, H. J., and Spanier, J., TRAC-1, a Monte Carlo Philco-2000 program for the calculation of capture propabilities. Bettis Atomic Power Laborutory Report W A P D - T M - 2 2 9 . Pittsburgh, Pennsylvania, 1961. 319. Kneidler, M. J., and Jordan, T., PLMCN-1, neutron Monte Carlo code for slab geometry. Martin Marietta Corporation Report MND-(7-2933. Baltimore, Maryland, 1963. 320. Kneidler, M. J., and Jordan, T., SPMCN-1, neutron Monte Carlo code for spherical geometry. Martin Marietta Corporation Report MND-4539. Baltimore, Maryland, 1963.
344
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
321. Aronson, R., Held, K., Klahr, C., and Steinberg, H., TRG-RS(N). Technical Research Group Report TRG-136-FR. Syosset, New York. 322. Kneidler, M. J., M-1, Monte Carlo radio isotope shielding code. MartinMarietta Corporation Report MND-P-2767. Baltimore, Maryland, 1962. 323. Steinberg, H., ABCD. Technical Research Group Report TRG-211-3-FR. Syosset, New York. 324. Johnston, R. R., A general Monte Carlo neutronics code. Los Alamos Scienti$c Laboratory Report LAMS-2856. Los Alamos, New Mexico, 1963. 325. Rief, H., An IBM 704 Monte Carlo code to calculate fast fission effects in homogeneous and heterogeneous systems. Brookhaven National Laboratory Report B N L 647 (T-206).Upton, New York, 1961. 326. Rief, H., MOCA 2-a multipurpose Monte Carlo code for fast effect calculations. Trans. Am. Nucl. Soc. 6, 12 (1963). 327. Coveyou, R. R., Sullivan, J. G., and Carter, H. P., The 05R code, ageneral purpose Monte Carlo reactor code for the IBM 704 computer, in Codes for Reactor Computations. International Atomic Energy Agency, Vienna, 1961. 328. Spanier, J., Kuehn, H., and Guilinger, W., TUT-T5, a two-dimonsional Monte Carlo calculation of capture probabilities for the IBM 704. Bettis Atomic Power Laboratory Report W A P D -TM-125.Pittsburgh, Pennsylvania 1959. 329. Spanier, J., The physics and mathematical analysis for the TUT-T5 Monte Carlo code. Bettis Atomic Power Laboratory Report W A P D -TM-186. Pittsburgh, Pennsylvania, 1960. 330. Amster, H., Kuehn, H. G., and Spanier, J., Euripus-3 and DaedalusMonte Carlo density codes for the IBM 704. Bettis Atomic Power Laboratory Report WAPD-TIM-205. Pittsburgh, Pennsylvania, 1960, 331. Gelbard, E. M., Ondis, H. B., and Spanier, J., MARC-a multigroup Monte Carlo program for the calculation of capture probabilities, Bettis Atomic Laboratory Report W A P D -TM-273. Pittsburgh, Pennsylvania, 1962. 332. Cantwell, R. M., M0322 and M0332-FORTRAN Programs for calculating neutron absorption in spheres distributed randomly. Bettis Atomic Power Laboratory Report W A P D - T M - 3 5 2 .Pittsburgh, Pennsylvania, 1962. 333. Loechler, J., and MacDonald, J., Flexible Monte Carlo programs FMC-N and FMC-G. General Electric Nuclear Materkzt8 and Propulsion Operation Report APEX-706. Cincinnati, Ohio, 1961. 334. Blaine, R. A., TYCHE, a Monte Carlo slowing down code. Atomics International Report N A A - S R -7357. North American Aviation Company, California, 1962. 335. Thomas, A. W., TTD, a Philco-2000 computer program for calculating two-dimensional, steady-state temperature distributions. Knolls Atomic Power Laboratory Report K A P L - M - E C - 5 .Schenectady, New York, 1961. 336. Boonstra, B. H., Hoff, F. W., and Struch, H. P., STRZ-a 7090 FORTRAN program for the steady-state temperature distribution in R-Z geometry with temperature and direction dependent heat conduction coefficients and radiation in narrow gaps. Reactor Centrum Nederland Report RCN-16. Amsterdam, Holland, 1963. 337. Fowler, T. B., and Volk, E. R., Generalized heat conduction code for the IBM 704 computer. Oak Ridge National Laboratory Report ORNL-2734. Oak Ridge, Tennessee, 1959.
345
ELIZABETH CUTHILL
338. Fowler, T. B., Generalized heat conduction code for the IBM 7090 computer. Oak Ridge National Laborutory Report CF-61-2-33.Oak Ridge,Tennessee, 1961. 339. Rhodes, H. H., THD 2-temperature distribution in a slab with internal hcet generation. Knolls Atomic Power Laboratory Report K A PL-M-EC-27. Schenectady, New York, 1962. from internal generation rates. Knolls 340. Briggs, D. L., Tiger-temperatures Atomic Power Luboratory Report K A P L - M - E C - 2 9 Schenectady, . New York, 1963. 341. Reynolds, W . C., Thompson, D. W., and Fisher, C. R., HECTIC, an IBM 704 computer program for heat transfer analysis of gas cooled reactors. AerojectGeneral Nucleonics Report AGN-TM-381. San Ramon, California, 1961. 342. Var, R. E., and Uthe, Jr., P. M., SEA LION, a time-dependent approximate areo-thermodynamic code to calculate axial temperature distribution of multiple conduits. Trans. A m . Nucl. Soc. 4, 12-13 (1961). 343. Bagwell, D., SIFT-an IBM 7090 code for computing heat distribution. Oak Ridge Gaseous Diffusion Plant Report K-1528. Oak Ridge, Tennessee, 1962. 344. Edwards, A. L., HARC and HEART THROB: computer programs for transient temperature distributions and chemical reactions in compositc solids. Lawrence Radiation Laboratory Report UC RL- 7069. University of California, Livermore, California, 1962. 345. Friedrich, C . M., SEAL-SHELL, a digital program to determine stresses and deflections in an axisymmetric shell of revolution. Bettis Atomic Power Laboratory Report W A P D -TM-277.Pittsburgh, Pennsylvania, 1961. 346. Mirabel, J. A., and Dight, D. G., SOR-11, a program to perform stress analysis of shclls of revolution. Knolls Atomic Power Laboratory Report K A P L - M - E C - 1 9 . Schenectady, New York, 1962. 347. Griffin, D. S., and Friedrich, C. M., Stresses and deflections in thick, curved plates, TCUP. Bettis Atomic Power Laboratory Report W A P D - T M-258. Pittsburgh, Pennsylvania, 1961. 348. Callaghan, J. B., Jarvis, P. H., and Rigler, A. K., BAFL-1, a program for the solution of thin elastic plate equations on the Philco-2000 computer. Betti8 Atomic Power Laboratory Report W A P D - T M - 2 5 5 .Pittsburgh, Pennsylvania, 1961. 349. Friedrich, C. M., Pressure, thermal, and vibrational loads and deflections in linear elastic structures-FORTRAN I1 programs CTAC and MODE. Bettis Atomic Power Laboratory Report W AFD-TM-259. Pittsburgh, Pennsylvania, 1961. 350. Friedrich, C. M., PTAASTIC-SASS--.acomputer program for stresses and dcflections in a reactor subassembly under thermal, hydraulic, and fuel expansion loads. Bettis Atomic Power Luboratory Report W A P D -TM-312. Pittsburgh, Pennsylvania, 1,963. 351. Redmond, R. F., and Hulbert, L. E., Numerical solutions of stress problems in reactor fuel elements. Preprint Puper No. 58. Engineers Joint Council, New York, 1962. 352. Cassidy, L. M., and McGratten, R. J., Computer techniques for stress analysis of reactor vessels. Preprint Paper No. 5 7 . Engineers Joint Council, New York, 1962. 353. Wunderlich, L. H., HAEMAT-steady state flow distribution program. Knolls Atowiic Power Laboratory Report K A P L - M - L X W - 1 . Schenectady, New York, 1962.
346
DIGITAL COMPUTERS IN NUCLEAR REACTOR DESIGN
354. Brown, B., and Lallier, K., Pass, a digital computer program for plant analysis, steady state. Knolls Atomic Power Laboratory Report K A P L - M P P A - 4 0 . Schenectady, New York, 1962. 355. Meyer, J. E., and Peterson, W. D., ART-04-a modification of the ART program for the treatment of reactor thermal transients on the IBM 704. Bettis Atomic Power Laboratory Report W A P D - T M - 2 0 2 . Pittsburgh, Pennsylvania, 1960. 356. Reihing, Jr., J. V., and Clarke, W. G., An appraisal of the MIM and ART differencing methods employing M0076, a digital frequency analysis program. Bettis Atomic PowerLaboratoryReport W A P D -TM-226.Pittsburgh, Pennsylvania, 1961. 357. Birkhoff, G., and Kimes, T. F., CHICK programs for thermal transients. Bettis Atomic Power Laboratory Report W AP D - T M - 2 4 5 . Pittsburgh, Pennsylvania, 1962. 358. Jordan, W. B., Comparison of methods of integrating thermal transport equation. Knolls Atomic Power Laboratory Report K A PL-2206. Schenectady, New York, 1962. 359. Meyer, J. E., and Williams, Jr., J. S., A momentum integral model for the treatment of transient fluid flow, in Bettis Technical Review-Reactor Technology, W AP D -B T - 2 5 . Pittsburgh, Pennsylvania, 1962. 360. Miller, R. I., and Pyle, R. S., TITE-a digital program for the prediction of two-dimensional, two-phase hydrodynamics. Bettis Atomic Power Laboratory Report W A P D - T M - 2 4 0 .Pittsburgh, Pennsylvania, 1962. 361. Rose, R. P., and Pyle, R. S., XITE-a digital program for the analysis of two-dimensional boiling flow transients with fluid expansion. Bettis Atomic Power Laboratory Report W A P D - T M - 3 0 2 .Pittsburgh, Pennsylvania, 1963. 362. Okrent, D., Cook, J. M., Satkus, D., Lazarus, R. B., and Wells, M. B., AX-1 , a computing program for coupled neutronics-hydrodynamicscalculations on the IBM 704. Argonne National Laboratory Report 5977. Argonne, Illinois, 1959. 363. Lehman, J. P., HATCHET-a coupled neutronics-hydrodynamicscode to calculate burst characteristics of a pulsed reactor. Trans. Am. Nucl. Soc. 3, 336-337 (1960); see also Aerojet-General Nucleonics Report AGN-237. San Ramon, California, 1960. 364. Mader, C. L., STRETCH SIN-a code for calculating one-dimensional reactive hydrodynamics problems. Los Alamos Scientific Laboratory Report TID-18571. Los Alamos, New Mexico, 1963. 365. Pruvost, N., and Kolar, 0. C., CONEC-a new coupled neutronics-elasticity theory code. Trans. Am. Nucl. SOC.5, 93-94 (1962). 366. Sandberg, R. O., CAT 11, an IBM 7090 code for predicting thermal and hydraulic transients in an open-lattice core. Westinghouse Atomic Power Division Report WCAP-2059. Pittsburgh, Pennsylvania, 1962. 367. Chezem, C. G., and Stratton, W. R., RAC-a computer program for reactor accident calculations. Los Alamos Scientific Laboratory Report LAMS-2920. Los Alamos, New Mexico, 1963. 368. Nahavandi, A. N., A digital computer analysis of loss-of-coolant accident for a multicircuit core nuclear power plant. Nucl.Sci. Eng. 14,282-286 (1962). 369. Nahavandi, A. N., and Axelson, B. H., A digital computer analysis of pressurized water nuclear power plant start-up using natural circulation. Preprint Paper N o . 55. Engineers Joint Council, New York, 1962.
370. Businaro, V. L., and Pozzi, G. P., A new approach on engineering hot-channel and hot spot factors: statistical analysis, Report EURAEC-702. FIAT, Sezione Energia Nucleare, Turin, 1963; see also Trans. Am. Nucl. Soc. 6, 13-15 (1963).
371. Murray, R. L., Hasnain, S. A., and Mowery, Jr., A. L., Reactor fuel cycle analysis by series methods. Nucl. Sci. Eng. 6, 18-25 (1959).
372. Hill, J., and Illingworth, J. M., PLATYPUS-IBM 7074 program for the calculation of lifetime and running costs of reactors. Rolls-Royce Report RRA/AP/102. Derby, England, 1962.
373. Toppel, B. J., Avery, R., and Fischer, G. J., CYCLE and COST, codes for fast reactor fuel cycle analysis and related cost evaluation. Trans. Am. Nucl. Soc. 5, 92-93 (1962).
374. Shanstrom, R., and Benedict, M., FUEL CYC, a new computer code for fuel cycle analysis. Part I. Computational model. Nucl. Sci. Eng. 11, 377-385 (1961).
375. Benedict, M., FUEL CYC, a new computer code for fuel cycle analysis. Part II. Examples of applications. Nucl. Sci. Eng. 11, 386-396 (1961).
376. Eschbach, E., Deonigi, D., and Goldsmith, S., MINIMIZER-a computer code for determining minimum fuel cost. Hanford Atomic Products Operation Report HW-71813. Richland, Washington, 1961.
377. Eschbach, E., Deonigi, D., and Goldsmith, S., QUICK-a simplified fuel cost code. Hanford Atomic Products Operation Report HW-71812. Richland, Washington, 1961.
378. Blaine, R., AIMFIRE-a fuel economics code. Atomics International Report NAA-SR-6706. North American Aviation Company, Canoga Park, California, 1961; see also Trans. Am. Nucl. Soc. 4, 255-256 (1961).
379. Jaye, S., and Fowler, T. B., THOROBRED-an IBM 704 code for steady state nuclear and economic calculations of two-region homogeneous reactors. Oak Ridge National Laboratory Report CF-61-1-76. Oak Ridge, Tennessee, 1961.
380. Wade, J. W., A computer program for economic studies of heavy water power reactors. E. I. du Pont de Nemours Savannah River Laboratory Report DP-707. Aiken, South Carolina, 1962.
381. Heestand, J., and Wos, L., Cost function studies for power reactors. Argonne National Laboratory Report ANL-6442. Argonne, Illinois, 1961.
382. Fischer, G. J., Avery, R., and Toppel, B. J., Fuel cycle studies for fast breeder reactors. Trans. Am. Nucl. Soc. 5, 131-132 (1962).
383. Gandini, A., Study of the sensitivity of calculations for fast reactors fuelled with Pu239-U238 and U233-Th to uncertainties in nuclear data. Argonne National Laboratory Report ANL-6608. Argonne, Illinois, 1962.
384. Clementel, E., Comments on the study of the sensitivity of fast reactor calculations to uncertainties in nuclear data. Report EANDC(E)-44 L. Comitato Nazionale per l'Energia Nucleare, Bologna, 1963.
385. Adiutori, E. F., and Tedesco, A., TEDDR-thermal design data reduction. Knolls Atomic Power Laboratory Report KAPL-M-EC-30. Schenectady, New York, 1963.
386. Brazos, J. N., Liedel, A. L., and Warrington, J. A., SPEC-a neutron spectrometer transmission data reduction program for the Philco-2000 digital computer. Knolls Atomic Power Laboratory Report KAPL-M-EC-32. Schenectady, New York, 1963.
An Introduction to Procedure-Oriented Languages

HARRY D. HUSKEY
University of California, Berkeley, California
and Indian Institute of Technology, Kanpur, India
1. Introduction
2. The Evolution of Computer Languages
3. A Typical Digital Computer
4. A Language for Describing Computers
5. A Simple One-Address Computer
6. A Square-Root Example on the One-Address Computer
7. Relocatability
8. An Assembly Program
9. The Square-Root Example in Assembly Language
10. An Algebraic Language Translator
11. Alternative Methods of Translation
12. Algorithmic Languages
13. Comparison of Features of Algorithmic Languages
   13.1 Identifiers and Numbers
   13.2 Expressions
   13.3 Statements
   13.4 Types of Variables
   13.5 Data Structures
   13.6 Program Structure
14. Some Special Languages
   14.1 SIMSCRIPT: A Simulation Programming Language
   14.2 COMIT: A Programming Language for Mechanical Translation
15. Summary
References
1. Introduction

The present chapter is intended to serve a twofold purpose: first, to show how the concepts and terminology of computer languages can be introduced in rigorous fashion and without reference to specific hardware; and secondly, to survey briefly the present state of the art by noting some of the common features and differences of the more commonly used languages. Sections 3 to 11 are devoted to the former aim. It appears that such an introduction to the subject might well
present the ideas in the order in which they arose historically. To make this clear, the sequence of developments which have led to today's procedure-oriented languages is sketched in Section 2. There follows a description of a computer, of several levels of language, and of the "processors" used to reduce languages of higher order to the simplest level of machine language. The description of the computer and, hence, the processors, employs a language (introduced in Section 4) which has been created for this paper but which is a close analog of the procedure-oriented languages themselves. To keep this paper within bounds the semantics of this "metalanguage" is treated somewhat briefly, considerable dependence being placed upon the educational background of the reader. The language is, both in its structure and in the method of its introduction, a close relative of the ALGOL family of computer languages, especially of NELIAC. The survey of procedure-oriented languages given in Sections 12 to 14 is limited by the omission of languages designed primarily for business-type problems, since this class of languages is well covered in [1]. Two examples of special-purpose languages are briefly described in Section 14, mainly to illustrate what will likely be a future trend: the development of languages for special fields of application.

2. The Evolution of Computer Languages
With surprising insight, Mr. L. F. Menabrea [2], writing in 1842 about Babbage's analytical engine, talked in terms of consigning the translation of algebraic formulas to the analytical engine. Thus, he was the first man to foresee the development, in just the last decade, of problem-oriented or procedure-oriented languages. Since Babbage was unable to complete his analytical engine, other developments in these directions did not occur until approximately twenty years ago. One of the significant developments of two decades ago was that carried on under the direction of Dr. A. M. Turing at the National Physical Laboratories in England. There, with the development of a stored program computer called the Automatic Computing Engine, a simultaneous study of languages suitable for expressing problems on the computer was carried on. Most of the work was centered upon an interpretive programming system making use of so-called extended code instructions. In recent years, this concept has been called "macros." A macro represents a sequence of individual machine activities or instructions referred to by a single name. There was also the use of functional type notations to indicate the things that were to be done by the computing machine.
With the development of stored program computers in the 1950s, attention again turned to the development of languages suitable for describing problems to be done on an automatic computing machine. In the beginning, problems were spelled out in the individual machine language commands of the particular computer. However, it was soon obvious that there would be considerable advantage if standard procedures, or standard functions, could be programmed in such a way that these programs could be used over and over again by many different people. This meant that it was not always convenient for each user to locate the respective program in the same part of memory, and thus led to the concept of relocatability. Also, the detailed machine language of the computer was usually a compact binary representation. Therefore, as soon as it became necessary to translate locations in order to get the facility for locating them in a variety of positions in memory, it also became convenient to translate from a mnemonic representation of the various machine operations into this compact binary form. Hence, for example, addition might be represented by the letters ADD. A machine program would accept this and substitute for the representation of "ADD" the corresponding machine binary representation. This development was called Symbolic Coding, and the programs which did the translations were called Symbolic Assembly Programs. In this Symbolic Language form, there was essentially a one-to-one correspondence between the mnemonic instructions in the Assembly Language and the binary representation of the corresponding machine language command. Clearly, the next step was to introduce more powerful terms in the Symbolic Language which would correspond to a number of machine language steps. In this way, standard functions such as sine and cosine could be introduced into the Symbolic Language. The use of such standard functions would cause the Assembly Program to generate whole segments of machine language commands to be placed in the object program, which would then compute the corresponding function. This is the macro instruction mentioned above. The next step in this line of development was a language called FORTRAN (Formula Translator) [3, 4]. This language made use of algebraic-like formulas. A translating program would accept this source language and generate the appropriate machine language commands to evaluate the corresponding algebraic expressions. Note that, to specify a computational procedure, more than just algebraic expressions are needed. There need to be "sequential" instructions indicating the order of the evaluations, or the sequence of steps to be carried out. Thus, in FORTRAN there are sequential control statements such as "GO TO" a particular place in the specification; or there are conditional statements
that allow branching and provide for carrying out optional programs that depend upon numbers developed in the computations. Sequential aspects of computations cannot be easily expressed in normal mathematical notation. Therefore, in the FORTRAN language, there were introduced statements which controlled the time sequence of the operation. About 1958, a committee was formed, in the United States and in Western Europe, which defined an International Algebraic Language designed to talk about scientific and engineering computations and to serve as an input medium to automatic computing machines. This work led to the publication in 1960 of a language called ALGOL (Algorithmic Language) [5]. This was a much more powerful language than FORTRAN and was consequently more difficult to learn. Therefore, it has not been particularly successful in the United States in replacing FORTRAN, but it seems to be coming into use as a communication language about algorithms. On the other hand, interest is much greater in using it as an input language to computers in Europe, and there have been many translators developed for it. Along with the development of ALGOL, the Defense Department of the United States sponsored the development of a common business-oriented language called COBOL [1, 6, 7]. In the United States, this language has come into more widespread use than ALGOL, and most computer manufacturers are supplying translating systems for the language. COBOL is particularly useful in the kind of data processing that one does in business applications. Thus, two major languages have been developed, one for use in scientific and engineering computations and the other one for use in business data processing. Along with this development there has been the development of more specific languages aimed at narrower fields of applications. One of these is a list processing language, IPL-V [8]. Another is a language suitable for applications in natural language processing, called COMIT [9]. There also has been interest in languages that are particularly suitable for command and control applications; in this field two languages stand out, one of them called JOVIAL [10, 11] and the other called NELIAC [12, 13]. Another type of language is that developed for particular purposes. One example of this is the APT Programming System, which is designed to facilitate the control of machine tools. On most computers there are also systems to manipulate matrices, and to solve systems of linear equations or compute eigenvalues of matrices. Other examples are languages for linear programming or for more general mathematical
programming applications, where one is interested in finding an optimum under sets of linear or nonlinear restraints.

3. A Typical Digital Computer
A digital computer will be discussed by first defining its significant elements, then presenting its state-transition rules. The elements are described by "declaration" statements, and the behavior by "assignment" statements (specifying the contents of registers) and by "sequential" statements (describing the transitions). Only central processor functions will be presented. For descriptions of other computer designs see, e.g., [14, pp. 20-19 to 20-33]. For present purposes the essential elements are a memory, an accumulator, an operand register, a command register, and a command counter. The memory is essentially a vector (a sequence of elements). The accumulator, or accumulator register, contains the result of each arithmetic operation. In this presentation, the type of arithmetic (binary, decimal, floating point, fixed point, etc.) and word length are immaterial. The command counter will appear to count out a succession of commands coming from consecutive locations (words) in memory. However, it is sufficient if it can control the acquiring of a "next" instruction (not necessarily in the next location), and if there is a means to "jump" to a new location in the program (perhaps by use of a "directory" or by searching for a "label").

4. A Language for Describing Computers
To describe more precisely the typical computer, a special language (like the procedure-oriented languages to be discussed later) will be introduced at this point. The same language will be needed in later sections to describe assembling and translating procedures. To describe this special language and other languages to be considered later, a metalanguage (Backus normal form) will be used [5, 15]. Sets of elements will be represented by < ... >, a connective | meaning "or" will be used, and the symbol ::= will mean "is defined to be". For example,

<digit> ::= 0|1|2|3|4|5|6|7|8|9   (4.1)

means that the class "digit" consists of the symbols 0, 1, 2, etc. The class "letter" is defined by

<letter> ::= a|b|c| ... |x|y|z   (4.2)

where the three dots represent the letters d through w. Strictly speaking, the three dots should not be used, but where the meaning is obvious
they make the presentation more concise. Capital letters may be introduced in a similar way. Besides letters and digits, delimiters will be needed.

<delimiters> ::= <operator> | <punctuation> | <brackets> | → | ' | : | IF | THEN | GO TO | DO   (4.3)
<operator> ::= <arithmetic operator> | <logical operator> | <relational operator>   (4.4)
<arithmetic operator> ::= + | - | * | /   (4.5)
<logical operator> ::= ∧ | ∨   (4.6)
<relational operator> ::= < | ≤ | = | ≠ | ≥ | >   (4.7)
<punctuation> ::= , | ; | .   (4.8)
<brackets> ::= ( | ) | [ | ] | { | }   (4.9)

The arithmetic operators * and / denote multiplication and division, respectively. The logical operators ∧ and ∨ are meant in the sense of conjunction ("and") and disjunction ("or"), which may or may not be equivalent to the "logical multiplication" and "logical addition" of computer words (sometimes found in machine languages) depending upon the representation of "true" and "false." Square brackets will be used to enclose subscripts, braces will enclose statements, and the use of round parentheses and of punctuation will be essentially analogous to conventional use. The use of the colon, arrow, and the single quote will become clear below. Other operators could be introduced in a similar way, e.g., exponentiation, absolute value, negation, etc. From these elements, the language is built up by means of the rules given below. These rules, taken together, form the syntax of the language. To denote the elements of the typical digital computer, "identifiers" will be used. Also, of course, numbers will be needed. For the purposes of this paper it is sufficient to consider integers; the extension to other types of numbers is straightforward.

<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>   (4.10)
<integer> ::= <digit> | <integer> <digit>   (4.11)
<string> ::= any sequence of letters, digits, or delimiters except the single quote (')   (4.12)
The first definition says that any string of letters or digits the first of which is a letter is, syntactically, an identifier. This is a recursive definition.

<variable> ::= <identifier> | <integer> | TRUE | FALSE | <identifier> [<variable>] | '<string>'   (4.13)

This defines a variable to be an identifier, a logical value, a subscripted identifier (generally, the subscript has an integral value; if not, perhaps the largest integer less than or equal to the subscript variable is used). Variables are the names of items; the quotation symbols around <string> mean that the specific sequence of symbols is the item named. A subset of the class "variable" can be called "logical variables." Each such variable has two possible values, namely, true or false. Here true will be represented by 1 and false by 0. Expressions are defined as follows:

<expression> ::= <variable> | <expression> <operator> <variable> | (<expression>) | (<assignment statement>)   (4.14)
At this point it should be mentioned that the pure in heart will wish to distinguish among the several classes of variables (e.g., logical variables whose values are true or false, integral variables whose values are integers, string variables whose values are strings, etc.), and they will wish to repeat the above definition associating only appropriate classes of variables and operators. Generally, arithmetic variables will be associated with arithmetic operators and logical variables with logical operators. Note, however, that two arithmetic expressions connected by a relational operator constitute a logical expression (e.g., a + b < c). The only operator associated with strings is equivalence, denoted by "=". Note that the definition (4.14) may be considered as permitting logical expressions of the form a < b < c, which might be reasonably interpreted as an abbreviation of (a < b) ∧ (b < c). The generality of such definitions as (4.14) is far more than needed in any of the applications given in this paper. On the other hand the experienced machine language programmer will recognize in it a flexibility which he uses every day, e.g., the ability to write (a * b + c) / d → x. Such characteristics are not present in the more formal languages discussed in Section 12. The body of text which describes the computer consists of statements and in total is referred to as "the program," "the flow chart," or as the computer.
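As a concrete check of the recursive rules (4.10) and (4.11), the following small Python fragment (added here for illustration; it is not from the original text) accepts exactly the identifiers and integers those rules generate:

```python
def is_identifier(s):
    """<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>  (4.10)"""
    return len(s) > 0 and s[0].isalpha() and all(c.isalnum() for c in s)

def is_integer(s):
    """<integer> ::= <digit> | <integer> <digit>  (4.11)"""
    return len(s) > 0 and s.isdigit()

assert is_identifier("state1") and not is_identifier("1state")
assert is_integer("0791") and not is_integer("79a")
```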
<statement> ::= <empty> | <variable> : <statement> | <declaration statement> | <assignment statement> | <sequential statement> | <conditional statement> | {<program>} | <procedure definition> | <procedure call>   (4.15)
<program> ::= <statement> | <program> <punctuation> <statement>   (4.16)
Statements may be labeled for purpose of cross reference from other parts of the program. The label, which may be a variable (subscripted or not), precedes and is separated from the statement by a colon. Groups of statements separated by punctuation constitute a program. A program enclosed in braces becomes a statement.
Declaration statements

<declaration statement> ::= <identifier> <dimension> <substructure> <value>   (4.17)
<dimension> ::= <empty> | (<integer>) | (<integer> : <integer>)   (4.18)
<substructure> ::= <empty> | {<sub list>}   (4.19)
<sub list> ::= <sub item> | <sub list>, <sub item>   (4.20)
<sub item> ::= <identifier> <dimension> <substructure>   (4.21)
<value> ::= <empty> | = <integer> | <value>, <integer>   (4.22)
Thus, a declaration statement defines an item or a vector of items (by using the dimension option). Such elements will be called, respectively, words or vectors. By using the substructure option, details of the structure of words may be given. Note that this is recursive (a word may consist of characters, and characters may consist of binary bits). The value option permits the specification of the initial value of words. Since vectors are the most complex arrays needed to describe computers, the dimension definition does not provide for multidimensional arrays. The dimension (n) is equivalent to (0 : n).
Assignment statements

<assignment statement> ::= <expression> → <variable> | <assignment statement> → <variable>   (4.23)
Note that not all variables on the right-hand side (integers or strings) make sensible assignment statements.
Sequential statements

<sequential statement> ::= GO TO <variable> | DO <variable>   (4.24)
Sequential statements refer to labeled statements and control the sequence of execution. A GO TO statement is like a “transfer of control” or a “jump.” A DO is like a subroutine call; that is, the labeled statement is executed, and continuation is with the statement which follows the “DO” statement.
Conditional statements

<conditional statement> ::= IF <expression> THEN <statement>   (4.25)
The expression in definition (4.25) must be either TRUE or FALSE. If TRUE the statement is executed, if not the statement is skipped. An arithmetic expression (i.e. an expression involving integer variables and arithmetic operators) will be said to be TRUE if its value is different from zero, and FALSE otherwise. This introduces a correspondence between addition and multiplication of integers and the disjunction and conjunction of logical variables.
Procedure definition

<procedure definition> ::= <variable> : {<program>}.   (4.26)
A sequence of statements separated by punctuation marks is called a program. A procedure is a program enclosed in braces and labeled.
Procedure call

<procedure call> ::= DO <variable>   (4.27)

A procedure as defined above may be "called" by a DO statement referring to the label of the procedure.
5. A Simple One-Address Computer
A simple one-address computer (except for input and output) is shown in Fig. 1. Using the language described above, this computer can be defined as in Table I. As shown in Table I, the one-address computer is described by five declaration statements followed by two pairs of statements, each labeled. State 1 consists of two assignment statements, and state 2 of a procedure call and a GO TO statement.
FIG. 1. Simple one-address computer. (The figure shows the memory, the arithmetic unit with its accumulator, the command register with OP code and Address fields, and the command location counter.)
TABLE I. An Algebraic Description of the One-Address Computer

Memory (9999), Accumulator, Command Location, Action (10), Command Register {OP code, Address};
State 1: Memory [Command Location] → Command Register, Command Location + 1 → Command Location,
State 2: DO Action [OP code], GO TO State 1.
Action [Clear Add]: {Memory [Address] → Accumulator}.
Action [Add]: {Accumulator + Memory [Address] → Accumulator}.
Action [Subtract]: {Accumulator - Memory [Address] → Accumulator}.
Action [Absolute Value]: {|Accumulator| → Accumulator}.
Action [Multiply]: {Accumulator * Memory [Address] → Accumulator}.
Action [Divide]: {Accumulator / Memory [Address] → Accumulator}.
Action [Store]: {Accumulator → Memory [Address]}.
Action [Transfer]: {Address → Command Location}.
Action [Transfer Negative]: {IF Accumulator < 0 THEN Address → Command Location}.
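Table I translates almost line for line into an interpreter. The following Python sketch is an illustration added here, not part of the original text; representing the memory as a Python dictionary, using mnemonic strings as op codes, and halting when the command location reaches a word that is not a command are all assumptions of the sketch.

```python
def run(memory, start):
    """Interpret the one-address computer of Table I.
    memory maps a location either to a data value or to a command (op, address)."""
    acc, loc = 0.0, start
    while isinstance(memory.get(loc), tuple):       # State 1: fetch the next command
        op, address = memory[loc]
        loc += 1                                    # step the command location
        if   op == 'CLA': acc = memory[address]     # clear add
        elif op == 'ADD': acc += memory[address]
        elif op == 'SUB': acc -= memory[address]
        elif op == 'ABV': acc = abs(acc)            # absolute value
        elif op == 'MUL': acc *= memory[address]
        elif op == 'DIV': acc /= memory[address]
        elif op == 'STO': memory[address] = acc     # store
        elif op == 'TRA': loc = address             # transfer
        elif op == 'TRN' and acc < 0:               # transfer negative
            loc = address
    return memory

# Example: run({7: ('CLA', 790), 8: ('STO', 791), 790: 0.5}, 7) copies 0.5 into word 791.
```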
6. A Square-Root Example on the One-Address Computer
A procedure for computing the square root of a number by an iterative process (Newton's method) will serve to illustrate machine language coding on the computer of Section 5 and, particularly, the use of symbolic languages and assembly processes.
The arrangement of data in the memory for the square root example is shown in Fig. 2. For example, the (positive) number n whose square root is desired is in location 791. The approximation to the square root is in location 792. An initial constant and a factor required in the process is the number 1/2, which is stored in location 790. A bound to
FIG. 2. Memory for square-root example. (Location 790 holds +5000 with exponent +00, i.e. 1/2; location 791 holds n; location 792 holds x, the approximation; location 793 holds +1000 with exponent -03, i.e. the bound 0.0001.)
determine when the iteration has been carried far enough is stored in 793. Inspection of the program will show that these are all of the locations in memory where data will need to be stored.

TABLE II. Square-Root Example for the One-Address Computer

Location   Command     Contents of the accumulator after execution
0007       CLA 0791    n
0008       MUL 0790    n/2
0009       STO 0792    n/2
0010       CLA 0791    n
0011       DIV 0792    n/x
0012       ADD 0792    n/x + x
0013       MUL 0790    (0.5)(n/x + x) = x
0014       STO 0792    x
0015       MUL 0792    x²
0016       SUB 0791    x² - n
0017       ABV 0000    |x² - n|
0018       SUB 0793    |x² - n| - 0.0001
0019       TRN 0021    -
0020       TRA 0010    -
The program itself starts in location 7 and extends through location 20, with the iterative portion being represented by the commands in locations 10 through 20 (see Table II). The three commands in locations 7, 8, and 9 establish n/2 as the initial value for the approximation of the square root. It is assumed that all of the computation is done in the floating point system. Therefore, there is no need for scaling the values which appear in the computation. In fact, the numbers in 790 and 793 are represented in floating point form (see Fig. 2). The computational procedure for the square root consists of averaging the value of the square root and the quotient of the given number divided by the square root. After each such improvement in the approximation, the square root is squared and compared with the given number. If the absolute value of the difference is less than 1 × 10⁻⁴ the iteration is complete, and the program will continue with whatever is in location 21. Otherwise, there is a transfer back to location 10 and another iteration takes place.
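Stated apart from the machine, the procedure of Table II is ordinary Newton iteration. A minimal Python sketch, added for illustration and not part of the original text, with the starting value n/2 and the bound 10⁻⁴ taken from the example:

```python
def square_root(n, tol=1.0e-4):
    """Newton's method as coded in locations 7-20 of Table II."""
    x = 0.5 * n                      # locations 7-9: first approximation n/2
    while abs(x * x - n) >= tol:     # locations 15-19: test |x^2 - n| against the bound in 793
        x = 0.5 * (n / x + x)        # locations 10-14: average x and n/x
    return x

print(square_root(2.0))              # about 1.4142
```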
7. Relocatability

Clearly, the square-root subroutine should be available for general use, so that each individual wanting to use it need not program it in detail. This introduces two problems: it may not be convenient to start the routine in location 7, and it may not be convenient to have the argument in location 791. One solution to relocatability of the program is to write the program as if it started in location zero and to mark the two commands whose addresses depend upon the location of the routine (locations 19 and 20 of Table II). Then a loader can be constructed which will add the value of the initial subroutine loading location to the address of the marked commands. Data can be relocated better by designating them in symbolic form and by letting the loader assign the actual memory locations. This leads to the concept of symbolic assembly programs. The program relocation can be solved by relative-address hardware (in which the command location is added to the address before execution); the only apparent hardware solution to symbolic addressing of data seems to be associative memory techniques. At the moment associative memories are so expensive that it seems likely that assembly programs will be used for some time to come.
8. An Assembly Program
A symbolic assembly program will be described (Table III) which will handle assignment of data locations of simple variables and constants and the relocation of programs. A primitive (undefined) called READ, which will read in the next instruction, is assumed. The assembly program converts this next instruction to a command and loads it into memory.

TABLE III. An Assembly Program

INSTRUCTION 1 {OP CODE, CODE}, INSTRUCTION 2 {ADDRESS}, MEMORY (9999), AVAILABLE MEMORY = 8999, COMMAND LOCATION = 200, NAME (500), LOCATION (500), NEXT NAME INDEX = 0;
STATE 1: DO READ, DO DECODE,
STATE 2: OP CODE + ADDRESS → MEMORY [COMMAND LOCATION], COMMAND LOCATION + 1 → COMMAND LOCATION, GO TO STATE 1.
READ: {a program which reads the source language into the items called INSTRUCTION 1 and INSTRUCTION 2}.
DECODE: {IF CODE = 'AS IS' THEN GO TO EXIT.
IF CODE = 'EXECUTE' THEN {ADDRESS → COMMAND LOCATION, GO TO STATE 1.};
IF CODE = 'RELATIVE' THEN {ADDRESS + COMMAND LOCATION → ADDRESS, GO TO EXIT.};
IF CODE = 'SYMBOLIC' THEN {NEXT NAME INDEX → I,
  AA: IF NAME [I] ≠ ADDRESS THEN GO TO BB. LOCATION [I] → ADDRESS, GO TO EXIT.
  BB: I - 1 → I, IF I > 0 THEN GO TO AA.
  NEXT NAME INDEX + 1 → NEXT NAME INDEX → I, ADDRESS → NAME [I],
  AVAILABLE MEMORY - 1 → AVAILABLE MEMORY → LOCATION [I] → ADDRESS, GO TO EXIT.};
IF CODE = 'CONSTANT' THEN {NEXT NAME INDEX + 1 → NEXT NAME INDEX → I, OP CODE → NAME [I], COMMAND LOCATION → LOCATION [I], 0 → OP CODE, GO TO EXIT.};
IF CODE = 'LABEL' THEN {NEXT NAME INDEX + 1 → NEXT NAME INDEX → I, ADDRESS → NAME [I], COMMAND LOCATION → LOCATION [I], GO TO STATE 1.};
EXIT: }.
It will be assumed that data is stored in the end of the memory (vicinity of location 9999 of Table I). Declaration statements will define the items involved in the process, and procedures will specify the operations to be performed.
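A condensed Python rendering of the DECODE logic of Table III is given below as an illustration only; the Python dictionaries standing in for the NAME and LOCATION lists, the starting command location, and the omission of the EXECUTE code are assumptions of the sketch rather than part of the original program.

```python
class Assembler:
    """A simplified, assumed model of the symbolic assembly program of Table III."""

    def __init__(self, first_command=420, available=8999):
        self.memory = {}            # object program: location -> (op, address)
        self.names = {}             # name list: symbol -> assigned location
        self.loc = first_command    # COMMAND LOCATION
        self.available = available  # AVAILABLE MEMORY; data words are assigned downward

    def _data_word(self, symbol):
        # SYMBOLIC: reuse a known location, otherwise assign the next free data word.
        if symbol not in self.names:
            self.available -= 1
            self.names[symbol] = self.available
        return self.names[symbol]

    def assemble(self, op, code, address):
        if code == 'LABEL':
            self.names[address] = self.loc       # remember where the label falls; emit nothing
            return
        if code == 'CONSTANT':
            self.names[op] = self.loc            # the OP CODE field carries the constant's name
            op = 0                               # the constant value (in ADDRESS) becomes the word
        elif code == 'SYMBOLIC':
            address = self._data_word(address)
        elif code == 'RELATIVE':
            address = self.loc + address
        # 'AS IS' and all the cases above fall through to the store of STATE 2.
        self.memory[self.loc] = (op, address)
        self.loc += 1
```

Feeding this sketch the items of Table IV in order reproduces the name list shown there (C1 at 420, C2 at 421, SQ ROOT at 422, N at 8998, TEMP at 8997).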
9. The Square-Root Example in Assembly Language

The square-root example of Table II, written in assembly language form and as assembled into locations 420 to 435 by the above routine, is shown in Table IV.

TABLE IV. Symbolic Square-Root Routine

        Instruction 1          Instruction 2    As assembled in 420 to 435
Item    OP CODE    CODE        ADDRESS          LOC    COMMAND
1       C1         CONSTANT    +5000 +00        420    +5000 +00
2       C2         CONSTANT    +1000 -03        421    +1000 -03
3       0          LABEL       SQ ROOT
4       CLA        SYMBOLIC    N                422    CLA 8998
5       MUL        SYMBOLIC    C1               423    MUL 420
6       STO        SYMBOLIC    TEMP             424    STO 8997
7       CLA        SYMBOLIC    N                425    CLA 8998
8       DIV        SYMBOLIC    TEMP             426    DIV 8997
9       ADD        SYMBOLIC    TEMP             427    ADD 8997
10      MUL        SYMBOLIC    C1               428    MUL 420
11      STO        SYMBOLIC    TEMP             429    STO 8997
12      MUL        SYMBOLIC    TEMP             430    MUL 8997
13      SUB        SYMBOLIC    N                431    SUB 8998
14      ABV        AS IS       0                432    ABV 0
15      SUB        SYMBOLIC    C2               433    SUB 421
16      TRN        RELATIVE    1                434    TRN 436
17      TRA        RELATIVE    -11              435    TRA 425

After assembly, the entries in the name list are:

I    NAME       LOCATION
1    C1         420
2    C2         421
3    SQ ROOT    422
4    N          8998
5    TEMP       8997
If in any subsequently assembled program there is a transfer to SQ ROOT, the assembly program will use location 422 as the entry. The above assembly program works very well as long as there are no transfers to locations not yet seen by the assembler (sometimes called "futures"). For example, the program

Operation    Code        Address
TRA          SYMBOLIC    ABC
CLA          SYMBOLIC    X
             LABEL       ABC

would cause three entries to be placed in the name list, two for "ABC" and one for "X". There are two types of solutions to this problem. One is to establish a "transfer vector" to which future transfers are made. The positions in the transfer vector will be filled in with the appropriate transfer commands when the respective labels are reached in the assembly process. A second solution is to issue dummy commands in the object program and to place the locations in a "correction list." With each label the correction list is scanned and the respective dummy commands corrected. More sophisticated assembly programs may keep all the constants in one list to avoid duplication of storage. Library subroutines may be provided, and one of the codes may cause a library program (specified by the address, say) of many commands to be inserted into the object program. "Macro" facilities may exist which permit the programmer to define and name certain sequences of instructions. When called (using another code) the sequences of commands will be inserted into the object program.

10. An Algebraic Language Translator
In order to discuss a way of translating algebraic languages it will be convenient to define a somewhat more sophisticated computer than was described in Table I. There will be more commands available. Their abbreviated representation and effect are given in Table V. The translation of a language such as that described in Section 4 will be considered in two parts, of which only the second will be discussed in some detail. The first phase involves processing identifiers in much the same way as the assembly routine processed symbolic names. Declaration statements cause the identifiers to be entered into a "Name List" and specific memory locations to be assigned.
TABLE V. Computer Operations

Operation                Code    Effect
Clear add                CLA     M[A] → ACC
Add                      ADD     ACC + M[A] → ACC
Subtract                 SUB     ACC - M[A] → ACC
Subtract and negate      SUN     M[A] - ACC → ACC
Add and negate           ADN     -(ACC + M[A]) → ACC
Multiply                 MPY     ACC * M[A] → ACC
Divide                   DIV     ACC / M[A] → ACC
Reverse divide           RDV     M[A] / ACC → ACC
Store                    STO     ACC → M[A]
Transfer                 TRA     A → COMMAND LOCATION
Transfer on negative     TRN     IF ACC < 0 THEN A → COMMAND LOCATION
Subroutine transfer      TRS     COMMAND LOCATION → SUB STACK [I], I + 1 → I, A → COMMAND LOCATION
Return transfer          TRR     I - 1 → I, SUB STACK [I] → COMMAND LOCATION
Extract                  EXT     ACC ∧ M[A] → ACC
Unite                    UNI     ACC ∨ M[A] → ACC
The process of translation (see Tables VI and VII) will involve inspection of two consecutive delimiters and the identifier between them (the identifier may be missing). The delimiters will be called the CURRENT OP and the NEXT OP. The heart of the process will be a routine called ADVANCE which steps ahead in the source formula, compares ranks of operators, and either (generally) BUILDS or STACKS a command. After building each new object program command, the stack is checked by a COLLAPSE STACK routine to see if any entries in the STACK should be transferred to the object program before moving ahead in the source string. An Accumulator Flag (ACC FLAG) tells if the accumulator would contain a result if the current program segment were executed. It is set to nonzero upon the development of a CLEAR ADD command and is set to zero when a STORE command is placed in the object program. The transfer of NEXT OP to CURRENT OP, the processing of an identifier and address assignment, and the obtaining of the new NEXT OP is done by a procedure called NEW OP. This routine also reads new source language text as required.
TABLE VI. The Translator

TRANSLATOR: START: ';' → NEXT OP, 1 → STACK INDEX, 0 → ACC FLAG, START LOCATION → J, INITIAL WORKING ADDRESS → WORKING ADDRESS;
ADVANCE: DO NEW OP, IF CURRENT RANK [CURRENT OP] < NEXT RANK [NEXT OP] THEN {DO STACK, GO TO ADVANCE.}; DO BUILD COMMAND, GO TO COLLAPSE STACK.
STACK: {IF ACC FLAG ≠ 0 THEN DO STORE; (STACK INDEX → I) + 1 → STACK INDEX, STACK OP [NEXT OP] → OP [I], ADDRESS → ADDRESS [I], NEXT RANK [NEXT OP] → RANK [I], WORKING ADDRESS → WA [I],}.
BUILD COMMAND: {IF ACC FLAG = 0 THEN {'CLA' → OP, 1 → ACC FLAG, GO TO AA.}; DIRECT OP [CURRENT OP] → OP; AA: IF ADDRESS ≠ 0 THEN {OP + ADDRESS → OBJECT PROGRAM [J], J + 1 → J,};}.
COLLAPSE STACK: IF STACK INDEX = 0 THEN GO TO ADVANCE. STACK INDEX - 1 → I, IF OP [I] = 0 THEN {IF NEXT OP = ')' THEN I → STACK INDEX; GO TO ADVANCE.}; IF RANK [I] > NEXT RANK [NEXT OP] THEN {OP [I] + ADDRESS [I] → OBJECT PROGRAM [J], J + 1 → J, WA [I] → WORKING ADDRESS, STACK INDEX - 1 → STACK INDEX, GO TO COLLAPSE STACK.}; GO TO ADVANCE.
STORE: {0 → ACC FLAG, (STACK INDEX → I) + 1 → STACK INDEX, STACK OP [CURRENT OP] → OP [I], NEXT RANK [CURRENT OP] → RANK [I], (WORKING ADDRESS → ADDRESS [I] → WA [I]) + 1 → WORKING ADDRESS, 'STO' + WA [I] → OBJECT PROGRAM [J], J + 1 → J.}.
To understand the behavior of the translator consider the example
; A * X - B / (C * D + E) → F ;

The sequence of events in processing this formula is shown in Table VIII.
TABLE VII. Rank and OP Table. (For each delimiter of the language the table gives its rank when it is the CURRENT OP, its rank when it is the NEXT OP, and the DIRECT OP and STACK OP command codes associated with it; for example, + and - correspond to the direct ops ADD and SUB, * and / to MPY and DIV, → to STO, and THEN to a conditional transfer.)
TABLE VIII. Translating A * X - B/(C * D + E) → F. (The table traces the translation step by step, showing at each step the CURRENT OP and NEXT OP, the stack index and the op, address, and rank of each stack entry, the working address in use, and the object program commands generated so far.)
Generally, as the ranks of operators increase, generated commands are placed in the STACK, the op codes being determined by the NEXT OP. Whenever the rank declines object code is generated, the first such command being a CLEAR ADD. Subsequent op codes are determined by the CURRENT OP. As each object command is generated, the rank of the last entry in the STACK is compared with NEXT RANK of NEXT OP to see if the item in the STACK should be placed in the OBJECT PROGRAM. At a relative minimum of the operator ranks in the formula a store-in-working-address will occur, and an appropriate command will be placed in the STACK. Upon an opening parenthesis a zero is entered in the stack. This blocks the stack until a corresponding closing parenthesis is seen. Note that the algorithm processes sequential statements (GO TO and DO) and that the combination '}.' generates a RETURN TRANSFER for subroutine exit. However, the conditional statements require special treatment (which is not given in Table VI). The THEN must cause the generation of a conditional transfer, the type of which depends upon the relational operator to which it corresponds. Furthermore, the address must denote a "future" entry to the program, so a procedure is required which will generate appropriate future linkages in a way similar to that discussed under assemblers. Obviously, the compiler can generate assembly language (very much like that shown in the last column of Table VIII) or it can generate absolute (binary) machine language. The first option gives a two-pass system: translate followed by assembly. The second provides a translate-and-execute arrangement with substantial speed advantage, but less object program memory space since the compiler must be "resident" in the memory.
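The rank comparison at the heart of ADVANCE can be made concrete with a short sketch. The Python below is an illustration only, not a rendering of Table VI: it uses the same ranks for + - * / but builds a small parse tree first and evaluates the deeper operand of each operator before the other, so the command sequence it emits differs in order (though not in effect) from the trace of Table VIII. The working-address names WA0, WA1, ... are invented for the example.

```python
import itertools
import re

RANK      = {'+': 1, '-': 1, '*': 2, '/': 2}
DIRECT_OP = {'+': 'ADD', '-': 'SUB', '*': 'MPY', '/': 'DIV'}

def tokenize(formula):
    return re.findall(r'[A-Za-z][A-Za-z0-9]*|[-+*/()]', formula)

def parse(tokens):
    """Precedence-climbing parse into nested (op, left, right) tuples."""
    def primary(i):
        if tokens[i] == '(':
            node, i = expression(i + 1, 1)
            return node, i + 1                      # skip the closing parenthesis
        return tokens[i], i + 1
    def expression(i, min_rank):
        left, i = primary(i)
        while i < len(tokens) and tokens[i] in RANK and RANK[tokens[i]] >= min_rank:
            op = tokens[i]
            right, i = expression(i + 1, RANK[op] + 1)
            left = (op, left, right)
        return left, i
    return expression(0, 1)[0]

def generate(node, code, temps):
    """Emit one-address commands leaving the value of `node` in the accumulator."""
    if isinstance(node, str):                       # a simple operand
        code.append(('CLA', node))
        return
    op, left, right = node
    if isinstance(right, str):                      # right operand already addressable
        generate(left, code, temps)
        code.append((DIRECT_OP[op], right))
    else:                                           # park the right operand in a working address
        generate(right, code, temps)
        wa = next(temps)
        code.append(('STO', wa))
        generate(left, code, temps)
        code.append((DIRECT_OP[op], wa))

def translate(formula, result):
    code  = []
    temps = ('WA%d' % k for k in itertools.count())
    generate(parse(tokenize(formula)), code, temps)
    code.append(('STO', result))
    return code

# translate('A*X-B/(C*D+E)', 'F') gives
# CLA C, MPY D, ADD E, STO WA0, CLA B, DIV WA0, STO WA1, CLA A, MPY X, SUB WA1, STO F
```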
11. Alternative Methods of Translation
Generally, the process of translation is that of rearranging the formula into a "Polish string" [16, 17] such that, when read in sequence, the operands and operators appear in the order corresponding to the order of execution in the computer. In the above translator the object commands are generated and then rearranged (by using the STACK). In other schemes [18] the formula is rearranged and then translated or interpreted. Other compilers [3; 19, p. 188] generate code in an intermediate language (sometimes three-address), and then some optimization of the code may be done. A universal intermediate language has been proposed.
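For the first alternative, rearrangement into a Polish string before any code is produced, the usual two-stack rearrangement can be sketched in a few lines of Python; this is a generic sketch under the ranks used above, not the particular scheme of [18].

```python
RANK = {'+': 1, '-': 1, '*': 2, '/': 2}

def to_polish_string(tokens):
    """Rearrange an infix token list into operand-operator (postfix) order."""
    output, stack = [], []
    for tok in tokens:
        if tok == '(':
            stack.append(tok)
        elif tok == ')':
            while stack[-1] != '(':
                output.append(stack.pop())
            stack.pop()                     # discard the '('
        elif tok in RANK:
            while stack and stack[-1] in RANK and RANK[stack[-1]] >= RANK[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)              # an operand passes straight through
    while stack:
        output.append(stack.pop())
    return output

# to_polish_string(['A','*','X','-','B','/','(','C','*','D','+','E',')'])
# gives ['A', 'X', '*', 'B', 'C', 'D', '*', 'E', '+', '/', '-']
```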
Generally, the concept is thought to be good but there is little agreement on what the level of the intermediate language should be. For example, is it a symbolic assembly language, is it like the language of Section 4, or is it like FORTRAN or ALGOL? There has been considerable interest in syntax-directed compilers [19, p. 306; 20]. In these the input is both the syntactic definitions (see Section 4) and the source language string. So far, these translators have been relatively inefficient in terms of speed of compiling. An attractive idea is to have a syntax-directed compiler of compilers, but no results have appeared in this direction as yet.

12. Algorithmic Languages
Algorithmic languages, either in common use or nearly so, are described briefly below. The descriptions are necessarily abbreviated. For complete rigorous descriptions the reader is referred to the References. They are all similar to the language defined in Section 4. Languages primarily intended for data processing problems have been omitted; they are well covered in [1]. The languages considered here are:
1. FORTRAN [3, 4, 21]. A language developed in 1957, generally suitable for scientific and engineering computation. A recent version, FORTRAN IV [22], has many of the features of ALGOL.
2. ALGOL [5, 23, 24]. A sophisticated algorithmic language developed by a joint U.S. and European group. It is widely used as a communication (of algorithms) and publication language. It is so sophisticated that no compiler has been written for the complete language. Nearly complete subsets have been in use, more so in Europe than in the United States.
3. SMALGOL [25, 26]. A subset of ALGOL suitable for most engineering and scientific problems. Difficult features of ALGOL (dynamic arrays, numeric labels, etc.) have been omitted, making it relatively easy to produce translators for most computers.
4. NELIAC [12, 13, 27]. A language developed by the U.S. Navy, suitable for simulation and command and control problems. The meaning of statements is determined by the delimiters, making possible fast, easy translation. The NELIAC family of languages are all self-compiling. This has advantages of self-documentation and easy modification.
5. MAD [28, 29]. This system was developed at the University of Michigan. Like NELIAC it is simpler than ALGOL, having been developed concurrently with an initial version of ALGOL called the International Algebraic Language.
6. JOVIAL [10, 30]. JOVIAL was developed at the Systems
Development Corporation concurrently with the development of NELIAC and MAD. In developing JOVIAL considerable attention was paid to applications in large programming systems.

13. Comparison of Features of Algorithmic Languages

13.1 Identifiers and Numbers
In all the languages described, an identifier consists of a string of letters or digits the first of which is a letter (see 4.10). In pure ALGOL there are no limits on the size of identifiers or of numbers. In the subsets for which translators exist the limits depend strongly upon the computer which is to be used. The range in size of identifiers and numbers for the above languages and for a variety of FORTRANs is given in Table IX. Usually, word size will determine the range in numbers, and significant symbols in an identifier must fit in one, two, or three words in most cases. In some languages the size of the input buffer determines the number of symbols in a statement. In others (SMALGOL and NELIAC) the method of translating is such that length does not matter.

13.2 Expressions
All algorithmic languages process arithmetic expressions of various levels of complexity. The early versions of NELIAC did not permit parenthesized expressions. Generally, the definition of expressions (see 4.14) is modified as follows:

<expression> ::= <variable> | (<expression>) | <expression> ↑ <expression> | - <expression> | <expression> <operator> <variable>

where ↑ signifies exponentiation. In some languages there are limitations on the exponent. In most languages no implied operators are permitted, e.g., expressions like (a + b)(c + d) cannot be used but must be replaced by (a + b) * (c + d). Due to limitations in the printing devices there are a variety of representations for operators such as "↑". In ALGOL conditional expressions such as IF <expression> THEN <expression> ELSE <expression> are permitted.
13.3 Statements
A comparison of the statement forms used in some of the algorithmic languages is shown in Tables X and XI. 369
TABLE IX. Comparison of Algorithmic Languages, Including a Variety of FORTRANs. (For each of ALGOL, SMALGOL, NELIAC, MAD, 1620 FORTRAN, 1620 GOTRAN, 650 FORTRAN, 650 FORTRANSIT, 705 FORTRAN, 7070 FORTRAN, 704/7090 FORTRAN, HONEYWELL, PHILCO ALTAC, and CDC FORTRAN the table lists the maximum and significant number of symbols in identifiers, the integer and real number ranges, the permitted number of array dimensions, the number of symbols per statement, and the form of labels.)
TABLE X. Basic Statement Forms (Assignment, Conditional, FOR). (Assignment is written <V> := <E>; in ALGOL and SMALGOL, <E> → <V>, in NELIAC, and <V> = <E> in MAD, JOVIAL, and the FORTRANs; the table also gives each language's IF ... THEN [ELSE] conditional form and its FOR or DO iteration form.)

TABLE XI. Basic Statement Forms (GO TO, Procedure Definition and Call). (GO TO is written go to <L> in ALGOL and SMALGOL, GO TO <L> in NELIAC, JOVIAL, and the FORTRANs, and TRANSFER TO <L> in MAD; procedures are introduced by procedure in ALGOL and SMALGOL, by a labeled braced program in NELIAC, by External function in MAD, and by Function or Subroutine in FORTRAN, with corresponding call forms.)
The assignment statements differ in form in terms of the assignment operator (:=, →, and =) and in the punctuation used. FORTRAN and MAD were designed for punched card machines, and using one statement per card there is no need for punctuation. Note that only the labels of statements appear in the conditional statements of FORTRAN II (and these must be positive integers). Also, note that the punctuation determines the meaning of the conditional statement for NELIAC. The expressions in conditional statements must all be of Boolean type. There are two main types of FOR statements. One type (like FORTRAN) states that for a range of values of an index (variable or parameter) the following statements down to a specified label are to be executed. The other type (ALGOL and NELIAC) specifies the scope by symbols: BEGIN and END for ALGOL, { } for NELIAC. The expressions defining the initial value, the steps, and the limiting value are frequently restricted to be of the form of a variable plus a constant. As shown in Table XI the structure of GO TO statements, procedure definitions, and calls are similar. In most of the languages a procedure that defines the value of a variable may be used as a term in an expression.

13.4 Types of Variables
Table XII shows the types of variables that can be described in some of the languages.

TABLE XII. Types of Variables. (For each of ALGOL, SMALGOL, NELIAC, MAD, JOVIAL, FORTRAN II, and FORTRAN IV the table indicates whether Boolean, integer, real, double-precision real, string, and complex variables can be declared. Parentheses indicate that not all NELIACs have this feature, but at least one does. In JOVIAL dual constants can be declared, but complex arithmetic is not supplied.)
In most cases the type of a variable is determined by a declaration statement such as REAL x, y, z; In FORTRAN II the type may be determined by a particular punch in the card ("B" in column 1 for Boolean). In some languages, identifiers starting with such letters as I, J, K, etc., are of integer type while others are of real type. Usually, the variables in an expression must be of consistent types. In the Bendix-CDC G-20, data is flagged as single or double precision, so precisely the same code can evaluate expressions no matter what combination of variable types among integer, single precision real, or double precision real, exists.

13.5 Data Structures
Most of the systems handle up to three-dimensional arrays of data.
NELIAC is limited to one-dimensional arrays. But here, as well as in the other languages, larger dimensional arrays may be processed by storing the multidimensional array as a vector and by computing subscripts in an appropriate way. Some of the languages provide for defining the initial values of variables. Some provide for compact storage (packing several items into a computer word) of information. Equivalence statements and block structure of data permit overlay of data as the program is run.

13.6 Program Structure
The block structure of ALGOL is the best developed example of program structure. The scope of identifiers and labels is only within the block in which they are declared. Some of the other languages have partial features of this type. For example, local and universal identifiers have been used extensively.

14. Some Special Languages

14.1 SIMSCRIPT: A Simulation Programming Language [31]
The current state of a system to be simulated is called its "status." The status changes upon the occurrence of events. The status is specified in terms of "entities," their "attributes," and their "set" relationships. An entity may belong to sets, and it may own sets. Events are classified as "exogenous" (caused by forces which are outside the system) and as "endogenous" (caused by preceding events inside the system).
SIMSCRIPT source language provides commands such as CREATE, DESTROY, CAUSE, CANCEL, etc. Using this source language, programs must be written for each different kind of event. This language provides for arithmetic statements, sequence control, and input and output. There is also a "report generator" which facilitates the specification of the printed output that is required. The system generates a program in FORTRAN language. This is then compiled by an appropriate FORTRAN system and run. This system provides procedures particularly useful in simulation activity and uses terminology suitable to the field of economics. A similar system could have been set up using any one of the algorithmic languages described above.
14.2 COMIT: A Programming Language for Mechanical Translation [9]
In contrast to SIMSCRIPT, COMIT generates computer code and, thus, does not depend upon another algorithmic language. It is designed to provide the linguist with a language with which he can direct a computer to analyze, synthesize, and translate sentences. A COMIT statement (called a RULE) has five components as follows:

<name> <left half> = <right half> // <routing> <go to>

In the terminology used here this could, for some COMIT rules, be written as:

<label> <string> = <string> // <string> <label>.
The material being processed is read into a "work-space." It is referenced by specifying in the left-half string a sufficient number of symbols in order to match the text in the work space. Upon obtaining a match the constituents are temporarily numbered (a relative address) and may be referred to in the right-half string or in the routing section of the rule. By appropriate specification in the right-half string it is possible to add, delete, and rearrange constituents. Subscripts may be added to constituents, and they may be used in calculations and in the rearrangement and deletion processes. The function of the routing section is to control input, output, dispatch functions (such as n-way branching), list look-up, and the union or separation of constituents.

15. Summary
Except for ALGOL and FORTRAN IV the languages described here have been in general use for several years. Authorities generally agree (but a small minority is not convinced) that such languages are a much
better tool for use in programming all computational work than more machine-oriented assembly languages (like FAP for the 7090). Translators are becoming available in the United States for SMALGOL and for nearly complete subsets of ALGOL. SMALGOL systems (ALCOR) have been in use in Europe for a couple of years. ALGOL is becoming an accepted publication language, and it seems likely that restricted ALGOL subsets will become widely used as computer input languages. It is difficult to say whether ALGOL subsets will become the most widely used languages in the next few years. SIMSCRIPT is almost certainly indicative of a future trend. That is, languages aimed at particular fields of application will be developed. In many cases, these will be based on well-established algorithmic languages.

REFERENCES
1. McGee, W. C., The Formulation of Data Processing Problems for Computers, in Advances in Computers (F. L. Alt and M. Rubinoff, eds.), Vol. 4, pp. 1-52. Academic Press, New York, 1963.
2. Babbage, H. P., Babbage's Calculating Engines. Spon, London, 1889.
3. Backus, J. W., et al., The FORTRAN automatic coding system. Proc. Western Joint Computer Conf., Los Angeles, pp. 188-198 (1957).
4. General Information Manual FORTRAN, F28-8074-1. International Business Machines, New York, 1961.
5. Naur, P. (ed.), Report on the algorithmic language ALGOL 60. Commun. A.C.M. 3, No. 5, 299-314 (May 1960).
6. COBOL 1961-Revised Specifications for a Common Business Oriented Language. U.S. Govt. Printing Office, Washington, D.C., 1961.
7. McCracken, D. D., A Guide to COBOL Programming. Wiley, New York, 1963.
8. Newell, A. (ed.), Information Processing Language V Manual. Prentice-Hall, Englewood Cliffs, New Jersey, 1961.
9. Yngve, V. H., A programming language for mechanical translation. Mechanical Translation 5, No. 1, 25-41 (July 1958).
10. Shaw, C. J., The JOVIAL manual. Technical Memorandum TM-555/003/0. System Development Corp., December 1961.
11. Shaw, C. J., JOVIAL-A Programming Language for Real-Time Command Systems, in Annual Review in Automatic Programming, Vol. 3, pp. 53-119. Pergamon Press, New York, 1961.
12. Halstead, M. H., Machine Independent Computer Programming. Spartan Books, Washington, D.C., 1962.
13. Huskey, H. D., Love, R., and Wirth, N., A Syntactic Description of BC NELIAC. Commun. A.C.M. 6, No. 7, 367-375 (July 1963).
14. Huskey, H. D., and Korn, G. A., Computer Handbook. McGraw-Hill, New York, 1962.
15. Backus, J. W., The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference, in Proceedings of the International Conference on Information Processing, UNESCO, Paris, June 1959, pp. 125-132. Oldenbourg, Munich, 1960.
16. Burks, A. W., Warren, D. W., and Wright, J. B., An analysis of a logical machine using parenthesis-free notation, in Mathematical Tables and Other Aids to Computation, Vol. VIII, 53-67 (1954).
17. Lukasiewicz, J., Aristotle's Syllogistic from the Standpoint of Modern Formal Logic. Oxford Univ. Press (Clarendon), London and New York, 1951.
18. Samelson, K., and Bauer, F. L., Sequential formula translation. Commun. A.C.M. 3, No. 2, 76-83 (February 1960).
19. Ledley, R. S., Programming and Utilizing Digital Computers. McGraw-Hill, New York, 1962.
20. Irons, E. T., A syntax directed compiler for ALGOL 60. Commun. A.C.M. 4, No. 1, 51-55 (January 1961).
21. McCracken, D. D., A Guide to FORTRAN Programming. Wiley, New York, 1961.
22. FORTRAN IV Language, C28-6274. International Business Machines, New York, 1963.
23. Naur, P. (ed.), et al., Revised report on the algorithmic language ALGOL 60. Commun. A.C.M. 6, No. 1, 1-17 (January 1963).
24. Woodger, M. (ed.), Supplement to the ALGOL 60 Report. Commun. A.C.M. 6, No. 1, 18-23 (January 1963).
25. SMALGOL-61 (Presented by the Subcommittee on Common Languages). Commun. A.C.M. 4, No. 11, 499-502 (November 1961).
26. ALGOL-Manual der ALCOR Gruppe (ALGOL Manual of the ALCOR Group). Elektronische Rechenanlagen (May 6, 1961 and February 1962), Munich.
27. Huskey, H. D., Halstead, M. H., and McArthur, R., NELIAC-a dialect of ALGOL. Commun. A.C.M. 3, No. 8, 463-468 (August 1960).
28. Arden, B., Galler, B., and Graham, R., Michigan Algorithmic Decoder. University of Michigan Reports, June 1960 and January 1963.
29. Galler, B. A., The Language of Computers. McGraw-Hill, New York, 1962.
30. Schwartz, J. I., JOVIAL-a general algorithmic language, Symbolic Languages in Data Processing, in Proc. of the Symbolic Languages Symposium, International Computation Center, Rome, March 1962, pp. 481-494. Gordon and Breach, London, 1962.
31. Markowitz, H. M., Hausner, B., and Karr, H. W., SIMSCRIPT: A Simulation Programming Language, Memorandum RM-3310-PR, The Rand Corp., November 1962.
32. Iverson, K. E., A Programming Language. Wiley, New York, 1962.
33. Goodman, R., Annual Review in Automatic Programming, Vol. 1. Pergamon Press, New York, 1960.
Author Index

Numbers in parentheses are reference numbers and are included to assist in locating references when the authors' names are not mentioned in the text. Numbers in italics refer to the page on which the reference is listed.
A
Abrahams, S. C., 259 (lo ), 284 Adiutori, E. F., 326 (385), 348 Agalides, G. E., 198, 223 Agalides, J. E., 207, 225 Ahmed, F. R.,258 (2), 271 (2), 284 Ainsworth, A., 140 (63), 221 Aizerman, M. A., 132, 220 Albarde, P., 205 (152), 224 Alexander, J. H., 309 (169), 312 (165), 322 (252a), 336, 341 Alfieri, R.,140 (65), 221 Allen, W. A., 236, 253 Amarel, S., 189, 223 Amster, H. J., 301 (52, 53), 324 (284), 325 (330), 329, 343, 345 Anderson, B. L., 321 (220, 227), 339 Andrew, A. M., 151, 219 Andrews, 126 Anokhin, P. K., 206, 225 Arai, K., 318 (212), 319, 338 Archibald, J. A., Jr., 295 (23), 307 (98), 309 (98), 313 (23, 98), 317 (23), 318 (98), 327, 332 Arden, R., 368 (28), 377 Arms, R., 317 (205), 338 Arndt, U. W., 259 ( l l ) ,284 Arnott, S., 279, 281 (57), 287 Aronson, R.,325 (321), 345 Arsentieva, N. G., 33, 105 Ashby, W. Ross, 120, 123, 155, 165, 219, 222 Auerbach, E. H., 311 (130), 333 Austin, G. A., 168, 222 Avery, R., 326 (373, 382), 348 Axelson, B. H., 326 (369), 347 B
Babbage, H. P., 350 (2), 376
Babcock, M., 141, 221 Backus, J. W., 351 (3), 353 (15), 367 (3), 368 (3), 376 Bagwell, D., 326 (343), 346 Baker, J. G., 240, 241 (55, 56), 254 Ball, G. L., 310 (46, 5 0 ) , 329 Baller, D. C., 309 (179, 182, 183), 337 Banerji, R. C., 194, 223 Barbizet, J., 205, 224 Bareiss, E. H., 304 (75), 322 (234, 235), 323 (235), 330, 340 Bar-Hillel, Y., 216, 226 Baricelli, N., 222 Barlow, B., 207, 225 Baron, S., 308 (117), 333 Bartlett, F., 111, 205, 218, 224 Bate, R., 324 (299), 343 Battey, P., 311 (135), 334 Bauer, F. L., 367 (18), 377 Beach, L. A., 324 (295), 343 Beer, S., 116, 218 Beevers, C. A., 263, 285 Bell, G. I., 322, 323 (243), 340 Bellman, R., 306 (87), 331 Benedict, M., 326 (374, 375), 348 Bengston, J., 322 (248), 341 Bennett, J. M., 258 ( l ) ,274, 284 Berek, M., 234, 252 Berger, M. H., 324 (294), 343 Berning, J., 238, 253 Berwind, H. J., 325 (318), 344 Best, G. H., 323 (264), 342 Beurle, R. L., 141, 198, 221 Bewick, J. A., 308 (120), 333 Bickel, P. A., 311 (127), 333 Bilodeau, G. G., 313 (151), 316 (151), 318 (151), 335 Birkhoff, G., 291 ( 8 ) , 300 (39), 316 (200), 318, 326 (357), 326, 328, 338, 347
Bishop, J., 207, 225 Black, G., 238, 242, 253, 254 Blaine, R. A., 325 (334), 326 (378), 345, 348 Blake, C., 275 (41), 278 (41), 279 (4l), 286 Blow, D. M., 281, 287 Blue, E., 323, 341 Bohl, H., 321 (221), 339 Bohl, H., Jr., 295 ( l e ) , 301 (18), 309 (172), 327, 336 B Itzmann, L., 297 (33), 328 Boonstra, B. H., 326 (336), 345 Booth, A. D., 269, 275, 285 Borieua, R. E., 281 (61), 287 Borowiec, J., 79 ( 2 ) , 80 (2), 105 Bowman, R. A., 140, 221 Braightenberg, V., 207, 225 Braines, I., 207, 215, 225 Branden, C. I., 275 (41), 278 ( 4 l ) , 279 ( 4 l ) , 286 Brezos, J. N., 326 (386), 348 Briggs, D. L., 326 (340), 346 Brillouin, L., 216, 226 Broadbent, D. E., 205, 224 Brodsky, R., 294, 295 (14), 296, 327 Brooks, H. 303 (66). 330 Brown, B., 326 (354), 347 Brown, H. D., 324 (291), 343 Brown, J., 205, 224 Brown, J. B., Jr., 309 (195), 338 Bruner, J. S., 168, 205, 222, 224 Bruns, H., 235, 253 Buchdahl, H. A., 236, 253 Buerger, M. J., 262, 274, 281, 285, 286 Buerger, P., 321 (221), 339 Bujosa, A., 258 (9), 271 (9), 273 (9). 284 Bulmer, J. J., 302 (63), 330 Burgess, R. D., 308 (117), 333 Buricelli, 157 Burke, A. W., 156, 222 Burks, A. W., 367 (IS), 377 Bush, R. R., 129, 220 Bushnell, D. L., 324 (287), 343 Businaro, V. L., 326 (370), 348 Busing, W. R., 271 (32), 276, 285 Buslik, A. J., 302 (62), 330 Butler, M., 295 (20),297 (31), 369 (20), 324 (31), 327, 328
C Cadwell, W. R., 311 (126), 313 (151, 156, 157, 159), 316 (126, 151), 317 (156, 157), 318 (126, 151, 157, 159), 333,335 Caianiello, E. R., 119, 198, 219 Cairns, J. L., 313 (155), 335 Calame, G. P., 302 (59), 329 Callaghan, J. B., 301 (53), 308 (108). 326 (348), 329, 332, 346 Cameron, S . , 142, 221 Campise, A. V., 309 (193), 323 (260), 337, 341 Candelore, N. R., 302 (64), 330 Canfield, E. H., 301 (49), 309 (176). 329, 336 Cantwell, R. M., 321 (222, 223, 224), 325 (332), 339, 345 Capla, V. P., 24, 105 Carlson, B. G . , 304 (76), 319, 322 (244, 247), 323 (243), 330, 340 Carmichael, B. M. 323 (264), 342 Cam, J. W., 111, 24 (4), 105 Carson, D., 324 (302), 344 Carter, H. P., 325 (327). 345 Case, K. M., 299 (41), 328 Cassidy, L. M., 326 (352), 346 Cegelski, W., 312 (137), 313 (137), 334 Cesari, L., 306, 331 Chambers, F. W., 259, 284 Chandrasekher, S., 321 (232), 340 Charkviani, C., 124 (24), 219 Chernoff, H., 244 (66), 255 Chernyavskii, V. C., 54, 107 Cherry, C., 146, 215, 216, 221 Chezem, C. G . , 326 (367), 347 Chichianaze, C., 219 Chichinadze, 124 Chorafas, D. N., 24, 105 Christiansen, G. K., 311 (134), 312 (134), 319. 334 Christman, R. P., 311 (128), 333 Church, A., 215, 226 Churchman, H. W., 200, 223 Clark, M., Jr., 304 (70, 74), 305 (81), 306 (91), 330, 331 Clarke, R., 141, 198 (71), 221 Clarke, R. A., 240, 254 Clarke, W. G . , 326 (356), 347
AUTHOR INDEX Cleary, J. D., 324 (285), 343 Clemental, E., 326 (384), 348 Cochran, W., 258, 266, 270 (5), 272 (5), 276, 282, 284, 286 Colbeth, E . A., 308 (114), 332 Cole, A. G., 309 (178), 336 Cole, H., 259, 284 Collar, A. R., 280, 287 Collins, E. T., 309 (177), 315 (177), 319 (177), 336 Colston, B. W., 309 ( 1 9 4 ) ,337 Conrady, A. E., 234 (7, S), 236 (as), 252, 253 Cook, J. M., 295 (20), 309 (20), 326 (362), 327, 347 Cooper, R. S., 301 (58), 329 Corey, R. B., 277 (50), 286 Coulson, J. E., 210 (181), 225 Coulter, C. L., 273 (371, 275, 278 (41), 279 (41), 281 (57), 286, 287 Coveyou, R., 324 (299), 325 (327), 343, 345 Cowan, J. D., 216 (197), 226 Cox, A., 233, 252 Cox, E. G., 273 (36), 286 Craik, K.,J. W., 205, 224 Crane, H. D., 138, 220, 221 Crockett, J. B., 244 (66), 255 Cruickshank, D. W. J., 258 (2, 9), 269, 271 (2, 9), 272, 273 (9, 34, 36), 276, 277 (46), 284, 285, 286 h i k , K., 101, 105 Culpepper, G., 321 (221), 339 Culpepper, L. M., 309 (174), 321 (219), 336, 339 Curry, H. B., 214, 216, 226, 244 (65), 255 Curtis, A. R., 307 (96), 331 Cuthill, E., 307 (97), 311 (97, 125), 312 (136, 138), 317 (97, 206), 318 (97, 136), 332, 333, 334, 338 D
Daitch, P. B., 333 Daugavet, 0 . K., 58, 105 Davies, D. R . , 271 (29), 285 Davis, J. A., 320 (218a), 321 (219, 220, 226, 227, 228), 339 Davis, R. M., 312 (136), 318 (136), 334
Davison, B., 297 (34), 298 (34), 300 (34), 301 (34), 303 (34), 304 (34), 305 (34), 306 (34), 320 (34), 321 (34), 328 Davisson, C. M., 324 (295), 343 Dawson, C. W., 304 (72), 321,322 (236, 236, 238, 239), 323 (235), 330, 340 Dechand, C. O., 313 (153), 323 (263b), 335, 342 de Dufour, E., 322 (242), 340 Deonigi, D., 326 (376, 377), 348 Dersch, W. C., 132 (45), 220 Dight, D. G . , 326 (346), 346 Doggett, J., 324 (294), 343 Donaldson, P., 207, 225 Donohue, J., 277 ( 5 0 ) , 281 (6l), 286, 287 Dorsey, J. P., 313 (151), 316 ( E l ) , 318 (151), 321 (228), 335, 339 Dougherty, E. E., 309 (176), 336 Douglas, J., 318 (214), 339 Douglis, A., 299 (42), 328 Drawbaugh, D. W., 325 (311, 312), 344 Dryden, C. E., 324 (302), 344 Duane, B. H., 322 (254), 341 Dudek, J. S., 301 (47), 329 Duncan, W. J., 280, 287 Dyer, J., 301 (56), 329 E
Edgar, K. R., 322 (240), 340 Edwards, A. L., 326 (344), 346 Edwards, W., 211, 226 Ehrlich, R., 290 (4), 302 (4), 303 (67), 307 (4), 311 (4), 322 (4), 326, 330 Eich, W. J., 313 (155), 335 Eichorn, E. L., 271 (28), 276, 277, 285 Eldert, C., 251, 255 Ershov, A. P., 25 (12), 30, 33, 34, 40, 42 (11, 12), 46, 48 ( l l ) , 49, 51, 52 (16), 68 (15), 105, 106 Eschbach, E., 326 (376, 377), 348 Estes, W. K., 129, 219 Eyer, J., 240 F
Fairey, J. G., 308 (lOS), 313 (161), 316 (151), 318 (151), 332, 335 Farley, B., 141, 198, 221
Feder, D. P., 236, 239, 240, 243, 244 (67, 68), 245, 253, 254, 255 Federighi, F. D., 301 (54), 302 (59, 60, 61), 309 (191), 310 (191), 329, 330, 337 Fedoseev, V. A., 66, 67 (18), 68 (18), 106 Fein, E., 313 (153), 335 Ferster, C. B., 205 (154), 224 Fialkowski, K., 7 1 (17), 80 (17), 106 Fiegenbaum, T., 187, 198, 223 Fillmore, F. L., 324 (281), 342 Finkelstein, N., 238, 253 Fischer, G. J., 326 (373, 382), 348 Fischer, P. G., 309 (180), 337 Fisher, C. R., 326 (341), 346 Fisher, D. L., 307 (loo), 332 Fisher, R. S., 216, 226 Flanagan, C. A., 308 ( l l l ) ,332 Flanders, D. A., 316 (202), 338 Flatt, H. P., 307 (92), 308 (121), 309 (179, 181, 182), 318 (92), 323, 331, 333, 337, 341 Flavell, J. H., 224 Fletcher, J. L., 312 (149), 315 (149), 317 (149), 318 (149), 335 Foell, W. K., 324 (289), 343 Fogel, L. A., 200, 224 Ford, P. W., 238, 253 Forsythe, G. E., 315 (199), 338 Foulkes, J. D., 149, 221 Fowler,T. B., 311 (132, 133), 312 (133, 147, 148), 313 (162), 318 (133), 319, 326 (337, 338, 379), 334, 335, 339, 345, 346, 348 Francescon, S., 322 (250), 341 Francis, N. C . 322 (240), 340 Frank, L., 124 (30), 219 Franklin, J., 309 (173), 336 Frazer, R. A., 280, 287 Frederick, D. H., 308 (113), 332 Friedrich, C. M., 326 (345, 347, 349, 350), 346 Frolov, G. D., 68 (37a), 107 Fyer, J., 254 G
Gabor, D. A., 151, 221, 222 Galambos, I, 206, 225 Galanter, E., 176 ( l l l ) , 223
Gallagher, W. J., 311 (127), 333 Galler, B. A., 368 (28, 29), 377 Gandini, A., 326 (383), 348 Gannon, L. J., 295 (19), 301 (19), 327 Garabedian, H. L., 308 (123), 333 Gardner, M., 226 Garg, S. B., 312 (144), 334 Gast, R. C., 302 (64), 304 (75a), 305 (79), 322 (237), 323 (237), 324 (284), 330, 331, 340, 343 Gates, L., 317 (205), 338 Gelbard, E. M., 290 (4), 302 (4), 307 (a), 309 (172), 311 (4), 320, 321 (219, 220, 221, 225, 226, 227, 228), 322 (4), 325 (331), 326, 336, 339, 345 George, F. H., 201, 224 Gerrard, R. W., 207, 225 Ginitie, G. M., 207 (173), 225 Girard, A., 245, 255 Gitter, L., 322 (251), 341 Givens, N., 309 (169), 336 Glaser, R., 210 (180), 225 Glasser, S., 308 (113), 332 Glasstone, S., 291 (13), 327 Glushkov, V. M., 52, 68 (19), 106 Goertzel, G., 323 (279), 325 (279), 342 Goldacre, J., 157, 222 Goldberg, E. 324 (288), 343 Goldsmith, M., 321 (222), 324 (283). 339, 343 Goldsmith, S., 326 (376, 377), 348 Golub, G. H., 317, 338 Gonzalez, A. A., 315 (197), 338 Goodenough, 168 Goodjohn, A. J., 323 (263a), 342 Goodman, R., 377 Goodnow, J. J., 222 Gormley, M. F., 313 (153), 323 (263b), 335, 342 Gorn, S., 204, 217, 224 Gosden, J . A. 24 (I&), 106 Gottfried, H. S . , 239, 254 Graham, R., 368 (28), 377 Graves, H. W., Jr., 313 (152, 154), 335 Greebler, P., 311 (129), 316 (129), 333 Green, W. B., 309 (186), 337 Grecne, P. H., 144, 199, 221 Gregory, R. L., 205, 224 Greniewski, M., 70 (21, 22, 23), 80 (22, 23), 88 (20, 21, 22, 23), 106
Griffin, D. S., 326 (347), 346 Grimesey, R. A., 295 (19), 301 (19), 324 (289), 327, 343 Grosch, H. R. J., 238, 253 Guilinger, W., 325 (328), 345 Gunther, G., 200, 224
Hicks, D., 301 (51), 302 (51), 307 (51, 101), 313 (101), 329, 332 Higgins, 0. C., 240, 254 Hill, J., 326 (372), 348 Hinman, G. W., 322 (252, 252a), 341 Hirshfeld, F. L., 278 ( 5 2 ) , 286 Hochstrasser, U., 295 (24), 327 Hodgkin, D. C . , 264, 285 H Hodgson, L. I., 277, 278, 286 Habetler, G. J . , 290 (4), 299 (43), 302 Hoff, F. W., 326 (336), 345 (4), 307 (4), 309 (170), 311 (4), 312 Hoffman, T. A., 309 (180), 337 (146a), 319 (146a), 322 (4), 326, 328, Hogberg, T., 324 (293), 343 Holmes, D. K., 291 (12), 303 (12), 327 334, 336 Hageman, L. A., 313 (160), 315 (160), Honeck, H. C., 301 (40), 322 (254a), 328,341 318, 319 (160), 322 (241), 335, 338, Hook, S., 110 (2), 218 340 Haire, P. F., 124 (22), 219 Hooper, C. W., 279 ( 5 8 ) , 281 (58), 287 Halstead, M. H., 352 (12), 368 (12, 27), Hopkins, H. H., 236, 242, 253, 254 Hopkins, R. E., 240, 249, 250 (77), 376, 377 Hamilton, L. D., 279 (58),281 (58),287 251, 254, 255 Horn, P., 263, 285 Hamilton, W. R., 234, 235, 252, 253 Hammersley, J. M., 323 (277), 324 Hovland, C. I., 204, 205, 224 (277), 325 (277, 309), 342, 344 Howard, R. A., 218, 226 Hubel, H. D., 207, 225 Hannun, W. H., 308 ( l l l ) ,332 Hansen, K. F., 306 (91), 331 Hughes, E. W., 269, 276, 277, 285 Harker, D., 262, 285 Hulbert, L. E., 326 (351), 346 Harlow, H. F., 205, 224 Hull, C. L., 205, 224 Hunt, E. B., 176, 205, 223, 224 Hannan, L. D., 130 (38), 220 Harouless, G., 124, 219 Hunter, C. H., 309 (190), 337 Harrah, D., 215, 226 Hurwitz, H., Jr., 303 (66, 67), 330 Harriman, J. M., 307 (loo), 332 Huskey, H. D., 352 (13), 353 (14). 368 Hartmann, J., 238, 254 (13, 27), 376, 377 Husner, B., 374 (31), 377 Hasnain, S. A., 326 (371), 348 Hassitt, A., 309 (142), 312 (142, 163, Hyden, H., 206, 225 163a), 315 (142), 316 (163), 319, 334, 335 I Hauptman, H., 282, 287 Hawkins, D. G., 239, 254 Iliffe, J. K., 33, 106 Hebb, D. O . , 205, 224 Illingworth, J. M., 326 (372), 348 Heestand, J., 326 (381), 348 Inselberg, A., 132, 220 Heffley, R. A., 309 (190), 337 Irons, E. T., 368 (20), 377 Held, K., 325 (321), 345 Iverson, K. E., 377 Hellens, R. L., 295 (16), 327 Heller, J., 317 (208), 327, 338 1 Hemphill, J. P., 295 (18),301 (18), 327 Hennessy, W. P., 250 (76), 255 Jackson, M. H., 312 (136), 318 (136), 334 Henry, A. F., 289, 300, 308 (120), 326, Jacobi, W. M., 307 (95), 311 (95), 331 328, 333 Herzberger, M., 234, 238, 239, 252, Jacobson, R. A., 278 (52), 286 253, 254 James, R. W., 258 (7), 270 (7), 284
Ketchum, M.A.,311 (130),333 Kilmuir, J., 207, 225 Kimes, T.F., 326 (357),347 Kindler, E.,42, 101 (30),103, 106 Kingslake, R.,233, 252 Kistler, V. E., 309 (196),338 Kitov, A. I., 30 (31),32 (32),33, 106 Klahr, C.,325 (321),345 Klann, P. G., 313 (153),323 (263b), 335, 342 Kloucek, J., 88, 106 Kniedler, M. J., 325 (316, 319, 320, 322), 344, 345 Kobler, A. L., 212 (Ha),226 Kochen, M.,168, 222 Kofink, W.,321, 339 Kolar, 0.C . , 326 (365),347 Kopp, H.J., 306, 331 Korn, G.A.,353 (14),376 Korobov, N.M., 106 K Korolyuk, V. S., 107 Kammerer, W., 101 (26),106 Kovalev, N., 66 (36),107 Kozhukhin,G.I.,49,51(16),52(16), 106 Kaestner, P.C . , 301 (55),329 Kahn, H.,323 (271), 324 (271), 325 Kozmidiadi, V. A.,54, 107 (271),342 Krinitskii, N.A.,32 (32),33, 68 (37a), 106, 107 Kalaba, R., 306 (87),331 Kristiansen, G. K., 336 Kalm&r,L., 99, 106 Kalos, M.H., 323 (278,279),324 (278, Kruse, F. H., 258 (3),271 (3),276 (3), 284 279, 303), 325 (278,308). 342, 344 Krylov, V. I., 315 (198),338 Kamper, J., 264 (19),285 Kamynin, S. S., 24 (27),29 (27), 30 Kuehn, H.G., 325 (328,330), 345 Kurochkin, V. M., 52, 68 (38),107 (27),34 (27),68 (15,27), 106 Kantorovich, L. V., 315 (198),338 Kapany, N. S., 239, 254 L Kaplan, S., 307, 308 (120, 122), 311 La Bauve, R. J., 240, 254 (122),332, 333 Lacy, P., 323 (266),342 Karle, I. L.,282 (67),287 Laikin, M.,238, 253 Karle, J., 282 (67),287 Lallier, K., 326 (354),347 Karr, H.W., 374 (31),377 Lamberts, R.L., 240, 254 Katz, R.C . , 124, 219 Langridge, R.,279 (58),281 (58),287 Kazi, A. H., 323 (265),342 Lapidus, A., 324 (290,292), 343 Keim, R.E., 239, 254 Lavrov, S. S., 48, 62, 107 Keldysh, M.V., 68 (28),106 Lawton, T.J . , 307 (95),311 (95),331 Keller, H.B., 323, 327, 341 Layno, S. B., 325 (307a),344 Kellman, S., 324 (288),343 Lazarus, R. B., 326 (362),347 Kellogg, R. B., 318 (215),339 Kendrew, J. C . , 258 (l),274 (40),275 Le Bauve, R. J., 323 (264),342 Ledley, R. S., 367 (19),377 (41),278 (41),279 (41),284, 286 Lee, C. E., 298 (36).304 (36),319, 322 Kepes, J. J . , 313 (158),335 (247),323, 328, 340 Kerber, A.,234, 252 Janz, R. F., 313 (152),335 Jarvis, P.H., 321 (220,227),326 (348), 339, 346 Jasper, H. H., 207, 225 Jaye, S., 326 (379),348 Jeans, J. H., 304, 330 Jennings, B., 324 (285),343 Jensen, L. H., 267, 285 Jewett, J. P., 311 (130),312 (lag),315 (149),317 (149),318 (149),333, 335 Joanou, G. D., 301 (47,55), 309 (188), 323 (263a,265), 329, 337, 342 John, 206 Johnson, D. L., 212 (184),226 Johnston, R.R., 325 (324),345 Jones, D.H., 311 (128),333 Jordan, T.,325 (316,319, 320), 344 Jordan, W.B., 326 (358),347 Judge, F.,333
Lehman, J. P., 326 (363), 347 Leimdorfer, M., 324 (298), 343 Lemke, B., 322 (245, 249), 340, 341 Lenihan, S. R., 309 (177, 187, 188), 315 (177), 319 (177), 336, 337 Lerner, A. Ya, 132, 220 Leshan, E. J., 307 (102),309 (173, 188), 325 (102), 332, 336, 337 Leslie, D. C., 301 (57), 329 Lettvin, J. Y., 208, 225 Letvin, 207 Levenberg, K., 246 (80), 251, 255 Levine, M. M., 324 (282), 343 Levy, H. A., 271 (32), 276, 285 Lewellan, J., 323 (266), 342 Lewins, J., 308 ( l l e ) , 333 Lewis, B. N., 212 (188, 189, 190, 191), 226 Liedel, A. L., 326 (386), 348 Lindsey, J., 264 (19), 285 Linfoot, E. H., 239, 240, 254 Lingenfelter, R., 309 (140), 312 (140), 334 Lipscomb, W. N., 278 (52), 286 Lipson, H., 258 (8), 266, 270 ( 5 ) , 272 (5), 282, 284 Loebenstein, H. M. 247 (72), 255 Loechler, J., 325 (333), 345 Loefgren, L., 115, 120, 156, 218, 219 Long, R. W., 295 (16), 327 Lorentz, 205 Lorenz, K. Z., 224 Love, R., 352 (13), 368 (13), 376 Lovell,F.M.,258(9), 271(9),273(9),284 Lu, C., 281 (61), 287 Luce, D., 129, 220 Luce, R. D., 218, 226 Lucy, F. A., 240, 254 Lukasiewicz, J., 367 (17), 377 Lukaszewicz, L., 69 (44), 74 (45), 77 (44), 80 (44, 45), 107 Lukhovitskaya, E. S., 24 (46), 37, 107 Lull, Ramoun, 21 1, Lumsdaine, A. A., 210 (180), 225 Luzzati, V., 263, 285 Lyapunov, A. A., 24, 25, 32 (41), 57 (41), 68 (28, 41), 102, 106, 107 Lyubimskii, E. Z., 24 (27, 42), 29 (27), 30 (27), 34 (27), 36 (42), 48, 52, 68 (15, 27), 106, 107
M
McArthur, R., 368 (27), 377 McCarthy, C-A., 246 (178), 250 (77, 78), 255 McCarty, D. S., 309 (174), 336 McCracken, D. D., 352 (7), 368 (21), 376, 377 McCulloch, W. S., 134, 193, 207, 208, 220, 225 MacDonald, J., 325 (333), 345 McGaugh, J. D., 313 (154), 335 McGee, W. C., 350 ( l ) ,352 ( l ) ,368 ( l ) , 3 76 McGoff, D., 301 (48), 329 McGratten, R. J., 326 (352), 346 Machell, W., 312 (137), 313 (137), 334 Mackay, D. M., 146, 221, 222, 223 Mackay, M., 264 (19), 285 MacKay, 140, 152, 165, 185 Mackworth, N. H., 205, 224 Mader, C. L., 326 (364), 347 Magoun, H. W., 225 Mandel, J. T . , 321 (228), 322 (241), 339, 340 Marchuk, G. I., 306 (go), 331 Markov, A. A., 32, 107, 176, 214, 223 Markowitz, H. M., 374 (31), 377 Marlowe, 0. J., 303 (93, 99), 308 (122), 309 (174, 175), 311 (99, 122), 317 (99). 318 (99), 331, 332, 333, 336 Maron, H. E., 137, 202, 220 Marsh, R. E., 273 (38), 286 Marshak, R. E., 303 (66), 330 Marshall, A. W., 323 (269), 342 Martino, M. A., 299 (43), 325 (305), 328, 344 Martynyuk, V. V., 48 (48), 107 Marvin, D. A., 279 ( 5 8 ) , 281 ( 5 8 ) , 287 Marzocco, F., 174, 223 Mathes, W., 324 (301), 343 Matturana, H. R., 207, 208, 225 Maynard, C. W., 325 (314), 344 Mazurkiewicz, A., 74 (45), 80 (45, 49), 107 Meanor, S. H., 307 (95), 308 (108), 311 (95), 331, 332 Meghreblian, R. V., 291 (12), 303 (12), 327 Meiron, J., 247, 248, 249, 255
Meneghatti, D., 323 (262), 341 Mesarovic, M. D., 116, 146, 218 Meyer, J. E., 307 (107), 308 (108), 326 (355, 359), 332, 347 Miller, G. A., 176, 223 Miller, R. I., 326 (360), 347 Miller, 124 Milner, P. M., 205, 224 Mingle, J. 0 . 3 0 4 (73). 320(218), 330,339 Minsky, M., 174 (log), 177, 222, 223 Minton, G. H.. 313 (155), 324 (285), 335, 343 Mirabel, J. A., 326 (346), 346 Mironov, G. A. 68 (37a), 107 Mitchell, H., 321 (228), 339 Mitchell, J. A,. 313 (158). 335 Mittelstaedt, H., 185, 223 Moineraeu, P . , 312 (167), 313 (167), 336 Moldauer, P. A., 301 (45), 329 Moorhead, T. P., 312 (166), 336 Morozov, Yu. I, 48 (67), 108 Mostcllcr, F., 129, 220 Mount, B. H., 295 (16), 327 Mountford, L. A,, 309 (196), 338 Mowrey, A. L., J r . , 307 (106), 326 (371), 332, 348 Muellen, I?. E., 295 (19), 301 (19), 327 Murcott, N., 239, 254 Murray, It. L., 307 (1061, 326 (371), 332, 348
N Nahavandi, A. N., 326 (368, 369), 347 Nakatsu, K., 279 ( 5 5 ) , 287 Napttlkov, A , , 207, 215 (165), 225 Nather, V., 295, 296 (28, 29), 327 Naur, P., 352 (5), 353 (5), 368 (5, 23), 376, 377 Newell, A , , 173, 176, 177 (116), 180, 18.5, 187, 222, 223, 352 (8), 376 Nicolson, It. B., 301 ( S O ) , 329 Noda, T., 318 (212), 319 (212), 338 Nordman, C. E., 279 ( 5 5 ) , 287 NovBkov&, M., 107 Novikoff, A., 132, 220 Nunn, M., 246, 255 0 Okaya, Y., 259, 284 Okrent, D., 326 (362), 347
Olhocft, J. E., 324 (300), 343 Olsen, T. M., 307 (94), 308 (113, 114, 115), 331, 332, 333 Ombrellaro, P. A., 302 (61), 330 Ondis, H. B., 325 (331), 345 Ordway, F., 258 (4), 284 Osborn, R. K., 324 (299, 300), 343 Oxley, S., 240, 254 Ozerova, E. F., 58, 105
P Pace, A. V., 308 (108), 332 Palmer, R. G., 312 (144), 334 Papert, S., 141, 221 Parker, J. B., 323 (272), 324 (272). 325 (272), 342 Parker, K., 301 (44), 328 Parrette, J. R., 307 (95), 311 (95), 331 Parter, S. V., 317 (207), 338 Pask, G., 119 (16, 17), 139, 151 (60), 155 (84), 157, 210 (179), 212 (185, 186, 187, 189, 190, 192),216 (16), 219, 221, 222, 225, 226 Patterson, A. L., 262, 285 Pavlov, I. P., 207, 225 Pearson, J., 320 (218u), 321 (219, 226, 227), 339 Pendelbury, E. D., 323 (231), 341 Penny, S. K., 325 (310), 344 Pepinsky, R., 258, 275 (44), 284, 286 Perkins, S. T., 322 (248), 341 Peterson, R. E., 312 (141), 334 Peterson, W. D. 326 (355), 347 Petri, C. A., 200, 224 Pettibone, J., 301 (49), 329 Petzval, J., 235, 253 Pfeifer, C. J., 308 (109, 110), 332 Pfeiffer, R. A., 325 (317), 344 Phillips, D. C . , 259 (ll),275 (41), 278 (41), 279 (41). 284, 286 Piaget, J.,205, 224 Pickering, W., 307 (96a), 331 Pickworth, J., 264 (19), 285 Pilling, D. E., 258 (9), 271 (9), 273 (9), 284 Pitts, W., 134, 171, 193, 220 Podlovchenko, R. I., 33, 107 Polya, J., 168, 222 Pomraning, G. C . , 303 (69), 304 (70,74), 305 (81), 330, 331
Poncelet, C. G., 313 (154), 335 Pozzi, G. P., 326 (370), 348 Preiser, S., 322 (242), 340 Pribam, K., 176 ( l l l ) , 223 Pringle, J. W. S., 163, 222 Prosen, R. J., 258 (3), 264 (19), 271 (3), 276 (3), 284, 285 Pruvost, N., 326 (365),,347 Pull, I. C . , 325 (307), 344 Pun, L., 116, 218 Putnam, G., 322 (253), 323 (253), 341 Pyle, R. S., 326 (360, 361), 347
Rosenblatt, F., 135, 142, 220 Rossmann, M. G., 278 (52), 281, 286, 287 Rotenberg, A., 324 (290, 292), 343 Ruane, T. F., 311 (131), 333 Russell, B., 216, 226 Ruth, B. H., 315 (197), 338
S
Saalbach, C. P., 309 (174), 336 Samelson, K., 367 (18), 377 Samuel, 187 Sandberg, R. O., 326 (366), 347 R Sangren, W. C . , 289 (1, 2), 295, 296 Rabinowitz, G., 322 (242), 340 (28, 29, 30), 326, 327 Radkowsky, A., 294, 295 (14), 296, Satkus, D., 326 (362), 347 Sauer, A., 321, 339 313 (l68), 327, 336 Sawicki, S., 80 (62), 107 Raiffa, H., 218, 226 Rashevsky, N., 156, 222 Sayre, D., 282, 287 Schade, J. P., 207, 225 Raso, D. J., 324 (297), 343 Redheffer, R., 306 (86), 331 Schaefer, E. A., 308 (113), 332 Redmond, R. F., 326 (351), 346 Schechter, S., 295 (26), 327 Reihing, J. V., Jr., 326 (356), 347 Scher, J. M., 110 ( l ) , 218 Reilly, E. D., Jr., 312 (149), 315 (149), Schiff, D., 305 (78), 330 Schmidt, E., 323 (255), 341 317 (149), 318 (149), 335 Reno, T. J., Jr., 301 (54), 329 Scholl, D. A,, 207, 225 Schot, J. W., 307 (97), 311 (97), 312 Replogle, J., 309 (192), 337 (138), 317 (97), 318 (97), 332, 334 Reynolds, W. C . , 326 (34I), 346 Rhoades, W. A., 309 (185), 337 Schwartz, J. I., 368 (30), 377 Schwarzschild, K., 236, 253 Rhodes, H. H., 326 (339), 346 Richtmyer, R. D., 294 (15), 295 (17), Scott, R. L., 323 (280), 325 (280), 348 321 (233), 323 (270), 324 (270, 286, Sechvinsky, V., 215 (165), 225 290), 325 (270, 315), 327, 340, 342, Sedlbk, J., 92, 95 (52), 107 343, 344 Seeds, W. E., 279 (58), 281 (58),287 Segal, B. M., 312 (143), 334 Rief, H., 325 (325, 326), 345 Rigler, A. K., 326 (348), 346 Seidel, J., 235, 253 Risti, H. A,, 324 (285), 343 Selengut, D. S., 304 (71), 306 (84). 307, 330, 331, 332 Roberts, W. J., 324 (296), 343 Robertson, J. H., 263, 264 (19), 285 Selfridge, 0. D., 174, 176, 177, 196, 222, 223 Robertson, J. M., 270 (26), 285 Robinson, G., 277 (49), 286 Sesonke, A., 291 (13), 327 Rollett, J. S., 271 (29), 277, 278, 285, Setchvinsky, 207 Shain, R. B., 309 (186), 337 286 Rooney, K. L., 324 (296), 343 Shannon, C. E., 116, 215, 219 Roos, B. W., 296 (30), 327 Shanstrom, R., 326 (374), 348 Rose, R. P., 326 (361), 347 Shapiro, D., 322 (253), 323 (253), 341 Roseberry, R. J., 311 (131), 333 Shaw, C. J., 352 (10, l l ) , 368 (lo), 376 Rosen, R., 120, 155, 219, 222 Shaw, J. C . , 173, 176 (115), 222, 223 Rosen, S . , 251, 255 Sheheen, T. W., 322 (248), 341
Shimizu, A., 306 (85, 891, 331 Shoemaker, C . B., 264 (19), 265 (20), 285 Shoemaker, D. P., 265 (20), 275 (42), 276 (42), 277 (50), 281 (el), 283 (42, 43), 285, 286, 287 Shoemaker, V., 277 (50), 286 Shortley, G., 316 (202), 338 Shreider, Yu. A., 54, 57, 68 (53, 54), 107 Shtarkman, V. S., 24 ( 5 5 ) , 37, 43 (55), 107 Shudde, R., 301 (56), 329 Shufler, S. L., 302 (65), 330 Shura-Bum, M. R., 24 (27), 29 (27), 30 (27), 34 (27), 38, 48 (64), 52, 68 (27, 28), 106, 108 Shure, K., 323 (267), 342 Shurygin, V. A.. 60, 62 (56), 107 Sim, G. A., 275 (45),286 Simon, H. A., 173, 176 (115), 198, 222, 223 Singleton, J. A., 142, 221 Skinner, B. F., 205, 224 Sly, W. G., 275, 283 (43), 286 Smiley, J. W., 302 (63), 330 Smith, J. A. S., 273 (36), 286 Smith, R. B., 308 (108), 332 Smith, T., 231, 252 Sobolev, S. L., 66, 107 Solanes, M., 312 (167), 313 (167), 336 Solomonoff, R. 193, 223 Spanier, J., 323 (273, 274), 324 (273, 274), 325 (273, 274, 318, 328, 329, 330, 331), 342, 344, 345 Sparks, R. A., 258 (3), 269), 271 (3), 276 (3), 277 (46), 278, 279 (56) 282 (66), 284, 285, 286, 287 Spencer, G. H., 249 (75), 250 (76), 255 Spencer, M., 280 (59), 287 Sphar, C. D., 308 (112), 332 Stark, R. H., 236, 253, 295 (21), 313 (21), 316 (21), 327 Stavroudis, 0. N., 239, 240, 254 Steele, L. R., 324 (302), 344 Steinberg, H. A., 325 (313, 321, 323), 344, 345 Steinbuch, K., 124, 219 Stewart, R. M., 140, 221 Stiefel, E., 316 (203), 338
Stognii, A. A,, 52 (68, 591, 53, 54 (691, 95,107 Stone, S. P. 309 (140, 176, 177). 312 (139, 140), 315 (177), 318 (139), 319 (177), 334, 336 Stone, W. W., 325 (305, 317), 344 Stratton, W. R., 326 (367), 347 Struch, H. P., 326 (336), 345 Stuart, R. N., 309 (176), 336 Suarez, P., 301 (52),309 (172), 329,336 Suggs, M. C., 309 (175), 336 Sullivan, J. G . , 325 (327), 345 Svoboda, A., 101, 107 Swianiewicz, J., 42 ( 6 l ) , 71 (17), 80 (17, 61, 62), 106, 107 Synge, J. L., 235, 253 Szac, M., 322 (251), 341 Szorc, P., 80 (63), 108 T
Tadakatsu, A., 306 (89), 331 Tarjan, R.. 124, 219 Taussky, O., 325, 344 Taylor, C. A., 258 (a), 284 Taylor, W. K., 132, 174 (110), 220, 223 Teaford, H. L., 307 (98), 309 (98), 313 (98).318 (98), 332 Tedesco, A., 326 (385), 348 Teitel, R. J . , 309 (195), 338 Terasawa, S., 318 (212), 319 (212), 338 Thomas, A. W., 326 (335), 345 Thomas, D. H., 308 (123), 333 Thomas, G. M., 124, 219 Thompson, D. W., 326 (341), 346 Tinbergen, N., 205, 224 Tobias, M., 311 (132, 133), 312 (133, 147, 148), 313 (162), 318 (133), 319, 334,335, 339 Toda, M., 157, 222 Todd, J., 325, 344 Todt, F., 309 (189), 337 Toivanen, T., 308 (124), 333 Tompson, D. W., 322 (248), 341 Tong, S., 324 (289), 343 Toppel, B. J., 326 (373, 382), 348 Trantham, F. M., Jr., 312 (146), 318 (146), 334 Travis, L., 174, 223 Trifonov, N. P., 38, 48 (64), 108
Trueblood, K.N., 258 (3),263, 164 (19), 271 (3, 301, 273 (37),276 (3), 284, 285, 286 Truter, M.R., 258 (9),271 (9),273 (D), 284 Tuecke, J. E., 325 (317),344 Turski, W., 70 (22, 23), 80 (22, 23), 88 (22,23, 65,66),106, 108 Tyror, J. G . , 307 (96).331 U
Uhr, L., 195, 197,223 Ullman, L. R., 193,223 Underhill, L. H., 298 (37),302 (37), 322 (37),323 (261),328, 341 Urbanus, F. R., 308 (110),332 Uthe, P.M., Jr., 326 (342),346 Uttley, A. M.,124, 137, 138, 194 (28), 207, 219, 220
V Vand, V., 275 (44),286 Van Norton, R., 295 (17),327 Vapnik, V. N., 132 (42),220 Var, R. E., 326 (342),346 Varga, R. S., 291 (lo),295 (22),312 (22),313 (151,161), 315 (10, 161), 316 (10,22, 151,200, 201),317 (206, 209), 318 (151,213, 215), 326, 327, 335, 338, 339 Vargofcak, D., 309 (184,185),319 (184) 337 Verbinski, V. V., 323 (268),342 Vinograd, C.,226 Visner, S., 323 (263b),342 Vlcek, J.,88, 106, 107 Volinez, G., 249, 255 Volk, E.R., 312 (143),326 (337),334, 345 Voloshin, Yu. M., 24 ( 6 8 ) , 49, 51 (16), 52 (l6),106, 108 von Bertalanffy, L., 116, 218 Vondy, D.R., 319 (216),339 von Foerster, H., 110 (3,4), 116, 132, 135, 218, 220 von Neumann, J., 156, 222 von Reichardt, W., 207, 225 Vossler, C.,195, 197, 223 Vyezalov, L. H., 48 (67),108
W
Wachendorf, F., 241, 254 Wachspress, E. L., 308 (116, 117),312 (l45,146a),313 (150),315 (145,150), 316 (145),318 (145,150),319 (146e), 333, 334, 335,339 Wade, J. W., 326 (380),348 Walbran, V. A., 309 (171),336 Walter, W. G., 157, 207, 222 Waiters, R.,250 (77),255 Warren, D.W., 367 (16),377 Warrington, J. A.,326 (386),348 Waser, J., 279 (54),286 Wasow, W. R., 315 (199).338 Watson, H.C . , 275 (41),278 (41),279 (41), 286 Wattanabe, S., 129,219 Weaver, W. E., 116 (14),215 (14),219 Weinberg, A. M.,291 (ll), 297 (36), 298 (36),303 (ll),327, 328 Weiner, N., 116, 163, 193, 199, 218 Weinstein, W., 231, 238, 252, 254 Welford, A. T., 205, 224 Wells, M.B.,326 (362),347 Wenstrup, F. D., 309 (180),337 Wetherell, E.,324 (292),343 White, J. G . , 264 (lg), 285 Whitehead, A. N., 216, 226 Whittaker, E.T.,277 (49),286 Wick, G.C . , 321 (231),340 Widrow, B.,134, 220 Wiesel, T.N.,207, 225 Wigner, E. P., 291 (ll),297 (32),303 (ll),327, 328 Wikner, N. F., 323 (263a),342 Wilf, H.S., 323 (278),325 (278),342 Wilkins, J. E., Jr., 303 (68),330 Wilkins, M. H. F., 279, 281 (58),287 Williams, H.T . , Jr., 313 (155),335 Williams, J. S., Jr., 326 (359),347 Williams, R.E., 124,219 Willis, G. D., 134, 142, 220 Wilson, H.R., 279 (68), 281 (58),287 Wing, G.M.,306 (Ski), 331 Winograd,216 Wirth, N., 352 (13),368 (13),376 Wolf, B.,312 (137),313 (137),334 Wolf, D.E.,308 (113),332 Wolf, W. H., 312 (141).334
Wolfe, A., 295 (17), 327 Wolfe, R . N., 240, 254 Woodcock, B., 222 Woodger, M., 368 (24), 377 Woolfson, M. M., 282 (66), 287 Worlton, J., 322 (247), 340 Wos, L. 326 (381), 348 Wright, J . B., 367 (16), 377 Wrigley, H. E., 307 (96),331 Wunderlich, L . H., 326 (353), 346 Wynne, C. G., 245,246, 255 Y
Yabushita, S., 305 (go), 331 Yanenko, N. N., 60, 62 (56), 107, 108
Yanov, Yu. I., 24 (72), 25 (71), 30, 32, 108 Yiftah, S.,322 (251), 341 Yngve, V. H., 352 (9),375 (9), 376 Young, D. M., 316, 318 (213, 215), 338 Young, J. Z., 208, 225 Z
Zerby, C. D., 325 (310), 344 Ziering, S., 305 (78), 330 Zondek, B., 317 (205), 338 Zweifel, P. F., 299 (41), 301 (46, 60), 328, 329
Subject Index A ABC (American Broadcasting Company), 15, 17-21 Abstraction, 198 Abstractive sorting program, 187 Abstractive mechanisms, 197 ACE,350 Action function, 133 Adaline, 134 Adaptation, 110, 166 maturational, 133 Adaptive, 118, 146 computing machines, 130 machines, 212 threshold systems, 135 AEC Computing and Applied Mathematics Center, 295 ALGOL,24, 25, 46, 48-52, 66, 70, 72, 77, 83, 101, 350, 352, 368-376 Algorithm, 214 inductive inference, 193 Algorithmic languages, 368-374 Alternating direction iterations, 318 American Broadcasting Company see ABC American Nuclear Society, 290, 296 Amplitude see Structure Analog computers (in crystallography), 258 Apportionment, 4 APT, 352 Argonne Code Center, 296, 325 Arithmetical formulas, operators, 29, 35, 36, 74, 96, 103 Artificial intelligence, 110, 112, 165 Artificial neurones, 130 Assembly languages, programs, 8ee Symbolic Associative memories, 360 ATLAS,284 Attribute, 215 AUTHOR,260
Automata, 110 adaptive, 121 localized, 110, 114, 118, 144, 156 probabilistic, 124 unlocalized, 114, 144, 165, 156 variable, 121 Automatic Computing Engine (ACE),360 AVIDAC,294 Awareness heuristic, 192
B Backus normal form, 353 Behavior inferential, 137 macroscopic, 141 Bendix-CDC-GZO,374 BESK,296 BESM,34, 40, 42 Bibliography of nuclear reactor codes, 294 Binary scaling, 73 Block-diagonal least squares refinement, 276 Boolean expressions, functions, 74, 76, 142 Brain, 111, 113, 156, 165 Broadcasting media, 3 C Calcium thymidylate, structure of, 263 Cards, punched, 5 Causal descriptions, 146 CBS see Columbia Broadcasting System CDC 1604 computer, 297 C-E-I-R, 15, 16 Census Bureau, 3, 4 Chain expressions, operators, programs, 96-100 Characteristic functions, 84-88 Charting, automatic, 15 Chebyshev polynomials, 305, 316 Chebyshev extrapolation, 319 Chemical restraints 8ee Restraints
Circuit, decision, 161 Classification of nuclear reactor codes, 291-297
COBOL,71, 88, 101, 352 Columbia Broadcasting System, 1, 17-21 COMIT, 375 Communications on election night, 5, 15 Compilers, 367 8ee aEao Symbolic assembly programs, Translators Concept learning, 206 Confidence, 16 limits, 12 Congressional districts, 3, 4 Congressional elections, 13 8ee also Representatives Conjugate gradient method, 245 Conjunctive concepts, 168 Consistency, 117 Contour detector, 133 Control hierachies, 144, 146, 153, 179, 194 Control operators. 27 Control, unstratified, 204 Conventional numbers, 35, 43 CPC bee IBM Criticality calculations, 316, 320 Cybernetic analysis, 110
Difference learning, 183, 186 Difference patterns, 188 Differential Fourier synthesis refinement, 266, 269, 275, 276
Differential equations, programs for, 92-95
Differentiation of polynomials, 61 Difficulties, 200 Diffraction effects (in optical design), 230, 240
Diffractometers, 269 Diffusion equation, 308-315 nee alao Multigroup, Finite difference methoda Diffusion theory, 302. 309-319 Digital probe, 32 Discrete ordinate approximation, discrete 8,approximation, 304, 321, 322 Displacement 8ee Mean-square displacement Display, 14, 15 District of Columbia, 6 Division of Reactor Mathematics and Computation, Amer. Nuclear Soc., 296 DNA, structure of, 279-281 Double P , method, 305, 321
D
E
Damped leaat-squares method, 246, 251 Daspan links, 5 Data structure (in algorithmic languages),
Economic characteristics (of states, precincts), 3, 8, 14, 16 Economic data, processing of, 88-92 Economic information, 89 Educable machines, 209 Eikonal. 236 Electoral college, electoral vote, 6, 12, 16,
374
Datatron, 296 Decision circuit, 161 Democratic Party, 2 Demooratic characteristics, 3, 8 Dendrites, 140 Descriptions causal, 146 teleological, 1 4 6 1 4 8 Descriptive language, 114 Design parameters, optical, 228, 242, 244, 246
aEao Lens parameters Detector, contour, 133 Device, single-layer recognition, 134 Diagonal least-squares refinement, 276 Difference Fourier synthesis refinement, 8ee
268-268
16
Electron density, 260-262 EMAL,69, 88 Encumbrance set, 63 Entropy, 116 EPAM,197 EPOS, 101 Errors, in prediction, 14 Ershov’s algorithm, 46-48 Ethnic characteristics (of states, precincts), 3, 14, 16 Evolution, 111, 163, 173 medium for, 112 Evolutionsry system, 150
SUBJECT INDEX Exchange, telephone, 111 Experiments, hierarchy of, 147 Expressions (in algorithmic languages), 355, 369
External selective mechanism, 126
F f-number, 233, 239 Fermat’s principle, 234 Few-group approximation, 310 Fibers, 130 passivated, 140 Filter, learning, 151 Finite difference methods for diffusion equation, 308 Finite life span, 115 Fission reaction, 291 Florida, 2 Focal length, 233, 237, 245, 248 Food, 167 store, 167 networks, 161 Formal representation, 119 Forming operators, 27-29 FORTRAN, 290,294’307, 351,352,368-375 Fourier series calculation, Fourier synthesis, 261, 264-266, 273-275 m e alao Weighting, Difference, Differential Future transfers, 363, 367 G Gauss elimination method, 316 Gauss-Seidel iteration method, 316 Gaussian quadrature, 321, 322 General problem-solving program, 173 Generating function, 12 Geographical characteristics (of precincts), 14, 16
Geometric aberrations, 230 Geometrical optics, 230, 240 Goals, hierarchy of, 146, 153 Goal-directed, 115, 180 G.P.S., 173, 177, 181 Gubernatorial elections, 4, 13
H Harker peaks, 262-264 Harvard Mark IV computer, 242
Heuristics, 113, 173 awareness, 192 Hierarchy of abstraction, 194 of control, 143, 144, 146, 153, 179, 194 of experiments, 147 of goals, 146, 163 Hierarchical structure, 112 Homeostat, 123 House m e Representatives Hypothesis testing, 175 I
IBM computers type 604: 238 types 650, 701, 702: 294 type 704: 275, 283, 290, 296, 297 types 705, 709: 296 type 1620: 275 type 7090: 15, 17-21,274, 275,281, 284, 296, 322
type 7094: 297 CPC, 294 NORC,294, 296, 307 STRETCH, 284 Identification of alphabetic inscriptions, 84 Identifiers (in algorithmic languages), 354, 369
Ignorance, observer, 151 Illinois, 2 Indefinite life span, 116 Index of organization, 116 Inductive inference algorithm, 193 Inferential behavior, 137 Information assembly, 89 Inner iterations, 319 Intellect, 165 Intelligence, artificial, 110, 112, 166 Internal environment, 111 International Atomic Energy Agency. 290 Interval. refractory, 141 IPL, 352
J Jacobi polynomials, 305
JOVIAL, 352, 368-374 K Kennedy, John F., 16-21 Key precincts, m e Precincts
SUBJECT INDEX L Lability, 166 Language descriptive, 114 mixed, 204 object, 204 Languages, 147 algorithmic, 368-374 level of, 147 LARCsee UNIVAC Lattice, symmetric, 189 Laue equations, 260 Learning, 110, 118, 174, 182 concept, 205 difference, 183, 185 filter, 151 models, statistical, 129 Least-squares method, 239, 246, 248 Least-squares refinement, 266, 269, 270, 276-281 Legendre polynomials, 205, 320, 321 Lens bending, 234, 242, 250 Lens design (optical), 228-230, 233-250 Lens parameters, 234, 236 see also Design parameters Lethargy, 297 Level of Languages, 147 Levels of organization, 163 LGP, 275 Life span, finite, indefinite, 115 Line relaxation method, 317 Linguistic system, 64 List prograniming, 194 Localized automata, 114, 118, 144, 156 Logical conditions, 25, 27, 29, 30, 33, 37 Logical scale, 32, 48, 73
M Macro-instructions, 350, 351, 363 Macroscopic behavior, 141 MAD,368-374 Maine, 2 Manchester University computer, 238 Marginal rays, 231 Matching process, 194 Material transformations of programming schemes, 31, 33 Mathematical model for election forecasting, 5-9
Matrix approximations, least-squares, 277, 278 Maturational adaptation, 133 Mean-square displacements, 272 Mechanical restraints see Restraints Mechanisms, computing, 110 abstractive, 197 external selective, 126 Medium, 111 for evolution, 112 Memistors, 134 Memorization, 198 MERCURY,296 Meridian ray, 231, 232 Merit function, 241-244, 247-249 Metalanguage, 146, 167 Mississippi, 21 Mixed Language, 204 Model for clection forecasting, 5-9 Model, self-organizing system, 126 Model sampling see Monte Carlo Models, statistical learning, 129 Monitor (ofTV election coverage), 16, 16 Monte Carlo, 309, 323-326 Multichannel method see Synthesis techniques Multigroup approixmation, 301, 303, 310 Multigroup diffusion equations, 303, 309 in one space dimension, 314, 315 i n 2 space dimensions, 312, 313, 315 in 3 space dimensions, 311, 315 Multilevel preparation of programs, 31, 52 Multiplication factor (in fission reaction), 292 N
NAREC,294 NBC (National Broadcasting Company), 15, 17-21 NELIAC,350, 352, 368-374 Networks, food, 161 Neuristor, passivated, 140 Neuristor transmission line, 138 Neurones, 141, 217 artificial, 130 Neutron flux, 297 Neutron transport equation, 292, 297-306 Neutron transport problem, 306-326 News servicos, wire, 3 Newton’s method, 260
Nixon, Richard M., 16-21 Nonnumerical autocode, 60 Nonstandard operators, 29, 41 NORC gee IBM Nuclear Codes Group, 296
0 Object language, 146 Observer ignorance, 151 Odds, 16 ODRA,70 Operators, 25, 27-29, 41 Operator schemes, 62 Optical methods (in crystallography), 258 Optical path length, 235, 241 Optical scanning devices, 15 Optimization (of codes), 42-48, 96 Optimum gradient method, 244 Optimum relaxation factor, 318 Organization, index of, 116 levels of, 163 self-, 110, 116, 126 Outer iteration, 316, 320 Overrelaxation, 316-3 18 P
P , approximation, 304, 320, 321, 323 Pandemonium, 176 system, 196 Pandora’s box system, 149 Paper tape, punched, 5, 15 Parameter change operators, 27, 29 Parameter introduce operators, 27, 392 Paraxial rays, 236, 242 Passivated fibers, 140 neuristor, 140 Path difference errors, 241 Patterns, difference, 188 Pattern-recognizing, 132, 194, 195 Patterson synthesis, 262, 263, 281, 282 Perceptrons, 135 Phase angles, phase problem, 260, 261, 282 Philco 5-2000 computer, 297 Polish strings, 367 Political parties, 13 see aZso Democrats, Republicans Polls (of public opinion), 6 Popular vote, 16 PP Bee Programming programs
PP-1, 34 PP-2, 34-38 PPS, 40-42 PPBESM, 40 Precincts, 2 key, 7, 8 characteristics of, 14 Predicates, 25, 26, 29, 20, 37, 41 Presidential election, 4 Primitive-recursive functions, 65 Principal ray, 231, 248, 249 Probabilistic automation, machine, 124 Probability of election victory, 10, 11 Problem solving, 112 Processes, 177 matching, 194 Processors, 350 see also Symbolic assembly programs, Translators Product of operators, 25 Program structure, 374 Programming list, 194 Programming programs, “Programs which program”, 33-42,52,53, 67.68, 100 partial, 60 Projection, 2 Proof making, 193 Proteins, structure of, 279 Pseudo-random numbers .9ee Random sampling Pseudo-symmetry, 264 Punched cards, 5
0 Quasi-manual coding, 70 R
Radiation Shielding Information Center, 297 Random access memory, 16 Random sampling, random variables, generation of, 324, 325 Ray tracing, 231, 232, 241 RCA, 16 RCA-501 computer, 16, 17-21 Reactor and nuclear physics, codes for, 294 Reactor kinetics, 295 Reactor survey, codes for, 295 Readdressing function, 64 Readressing operators, 27-29, 41
SUBJECT INDEX Receptor, 133 Recognition, 174 device, single-layer, 134 Recursive, 180 Redistricting, 3 Redundancy, 11 6 Refinement procedures (in crystallography), 266-270 see aZ8o Difference, Differential, Fourier, Least-squares refinement Refactory interval, 141 Region of existence, 63 Registration (of voters), 13 Reinforcement, 217 variable, 136 Relaxation, 316-318 Relevance, 11 7 Relocatability, 351, 360 Repeat and check operator, 41 Repetitive calculations, 31 Representation, formal, 119 Representatives, House of, 4, 13 election for, 4 Republican Party, 2 Resolving power (of optical system), 230, 239, 240, 250 Restoring operators, 27-29, 41 Restraints, i n least squares, 278, 279 Rigid body vibration, 272 Rotation (in Patterson function), 281 Rotation matrix, 280 Rotational symmetry of optical systems, 232 S
S-FAC,258 S,,method, 322, 323 SAPO, 92 Sayre’s equation, 282 SEAC,238, 239, 294, 296 Search, 174 Seidel aberration, 236, 249, 260 8ee a h Third order aberrration Selective mechanism, external, 126 Selective Sequence Electronic Calculator (SSEC), 238 Self-organization, 110 Self-organizing, 116 model system, 126 Senatorial elections, 4, 13, 14
Shielding calculations, codes for, 296 Shtarkman’s algorithm, 43-46 Siberian dialect, Siberian language, 49-61 SIMBCRIPT, 374-376 Single-channel method 8ee Synthesis techniques Single-layer recognition device, 134 Skew rays, 231, 232 Slab geometry, 306, 321 SLAMB, 246, 247 SMALQOL, 368, 374, 376 Snell’s law, 231 Social characteristics (of states, precincts), 3, 14, 16 Source iteration, 316 Space group, 260 Span of life, finite, indefinite, 116 Spherical harmonics, 304, 320 Spot diagrams, 238-240 Stability, 123 Standard subprograms, standard subschemes, 38 Standard Lyapunov operator, 29 State graph, 166 Statements (in algorithmic languages), 353, 356, 357, 369-373 Statistical learning models, 129 Steepest descent, 242-244, 247, 248 Steering parameters, 26 Store, food, 167 STRELA,34, 38, 40, 42, 43, 62 STRETCH 8ee IBM computers Structure amplitude, 260 Structure factors, 260-262 calculation of, 270-273 Superposition (of Patterson functions), 281 Survival, 123, 163 Swings (in elections). 7, 8 Switching logical conditions, operatore for, 27, 29 Symbiosis, 114 Symbolic addresses, 72, 360 assembly programs 351, 360-363 coding, 361 language, 361, 362, 363, 367 numbers, 72 Symmetric lattice, 189 Syntax, 364
SUBJECT INDEX Synthesis techniques (for neutron transport problems), 306, 308 Systems adaptive threshold, 136 evolutionary, 150 pandemonium, 196 Pandora’s box, 149 T Tally (of votes), 14 Tapes, teletype, see Paper tapes Teleological descriptions, 146-148 Telephone exchange, 111 Temperature parameters, 270 Tesselation, 156 Testing, hypothesis, 176 Thick lenses, 234 Thin lenses, 233, 234 Third-order aberrations, 236, 241, 251 see also Seidel aberrations Third order lens design, 234-237 Three-line relaxation, 318 Threshold, 134 systems, adaptive, 136 Time zones, 3 Topology, use in programming, 62-66 Tote unit, 176 Tranceivers, 6 Transfer operators, 27-29 Translation, translators of algebraic formulas, 350 of algebraic languages, 351, 363-368 of natural languages, 376 Transmission line, neuristor, 138 Transport equation, transport theory, 319-323 eee aleo Neutron transport Trend, 7
Trial structure, 258 Turnout (at elections), 12, 13 Two-line method, 317 U
Ultrastable, 123 Unit, tote UNIVAC,1, 16, 17-21 -I, 290, 295 -1103, 296 LARC,296 Unlocalized automata, 114, 144, 166, 166 Unstratified control, 204 URAL,34, 69, 70, 80, 81
V Variable, reinforcement, 136 Variables, in algorithmic languages, 366, 373, 374 Variance (of election estimate), 8-10 Variance reduction techniques, 324 Vibration, 270 Bee also Rigid body vibration Vitamin B,,,structure of, 264, 265 Voting machines, 2
W Wave front aberration, 241, 242 WEIZAC,247 Weighting, 275, 277 Wire news services, 3 X X-RAC, 268 XYZ computers, 69, 71, 74 Z
ZAM computers, 69, 71-74