Parallel to Perception: Some Notes on the Problem of Machine-Generated Art
HAROLD COHEN University of California at San Diego

Reprint from Computer Studies IV-3/4, 1973
In a very large number of applications the computer is used for its ability to perform a set of predetermined transformations upon a set of data, and this kind of use has become standard in 'computer art', where the data is some original provided by the artist. If the aim is to model human art-making behavior, rather than merely to use the machine as a tool in this quite traditional sense, such a definition of the machine's functions is inadequate. Human art-making behavior is characterized by the artist's awareness of the work in progress, and programs to model such behavior will need to exhibit a similar awareness. Thus, 'behavioral functions' are defined here as functions which require feedback from the results of their actions as a determinant to their subsequent actions. Programs designed upon this specification will also require appropriate schema for the description of the work in progress. The feedback systems employed in intelligent behavior might be pictured as the asking of questions about the perceptual world whose answers will be relevant to decision-making. For the machine, 'awareness' of the work is totally defined by this question-and-answer structure, and in this sense is equivalent to the human perceptual system. It is not clear what descriptions of the work will serve for a reasonable simulation of human art-making behavior, or what questions will need to be asked. They will not necessarily reflect the 'facts' of the human system, but it seems likely that the machine's feedback system as a whole will need to possess a comparable adaptiveness to permit of the fluently changing pattern of decision-making which characterizes the practice of art.

If a photographer takes a picture, we do not say that the picture has been made by the camera. If, on the other hand, a man writes a chess-playing program for a computer, and then loses to it, we do not consider it unreasonable to say that he has been beaten by the computer.
Both the camera and the computer may be regarded as tools, but it is clear that the range of functions of the computer is of a different order to that of the camera. Tools serve generally to extend or to delimit various human functions, but of all the many tools invented by man only the computer has the power to perform functions which parallel those of the mind itself, and its autonomy is thus not entirely illusory. The man actually has been beaten by the machine, and if the program was structured appropriately its performance might be considerably better, in fact, than it was when he first loaded the program. If we acknowledge the machine's autonomy in this kind of situation, would it not seem reasonable to consider the possibility of autonomous art-making behavior, not in the trivial sense that it can control the movements of
a pen, but in the sense that it can invent those movements? Would it be possible, for example, for the machine to produce a long series of drawings rather than a single drawing, different from each other in much the same way that the artist's would be different, unpredictable as his would be unpredictable, and changing in time as his might change? If the answer to this question is that it would, then it would seem to follow that some of the machine's functions will need to parallel, at least in a primitive way, some aspects of the human perceptual process. The drawing will be the real world for the computer, just as it is a part of the real world for the artist: and just as the artist will deal with the drawing in terms of gestalts rather than in terms of raw data, so the machine will need to formulate characterizations of the current state of the drawing, rather than treating it merely as an agglomeration of marks and non-marks. I should say that I consider the possibility of this kind of behavior to be a real one, and not merely speculative, for the reason that my own work with the computer has
gone some part of the way towards realizing it, as I will try to show; far enough to suggest that the rest is attainable. But I should define my position more carefully, for the simulation of human perceptual processes arises as a result of this work, rather than as a motive for it. Whatever other territories may appear to be invaded, I believe that my behavior in programming the machine to simulate human art-making behavior is, in itself, primarily art-making behavior, and I have proceeded by attempting to deduce from the requirements of the venture as a whole what perception-like abilities may be appropriate. The plausibility of the resultant structure must thus rest upon the success of the whole system in satisfying its purpose, rather than upon whether it appears to provide a satisfactory model of perception. If the whole system can autonomously generate art — autonomously, that is, in the obviously qualified sense used above — then we will know something more about ways in which art may be made, and conceivably something about the way in which human beings make it; but not necessarily about the specific mechanisms upon which human art-making rests.

The purpose of this essay, then, is to say something about the nature of the characterizations, or representations, of the work in progress which the machine will need to build; and about the constraints under which they are formulated. Unfortunately, so much confusion now exists in the interface areas between art and computers, thanks to the strange manifestation popularly known as 'computer art', that some clarification will be needed before I can get to my subject. Evidently the power of image-making retains something of its primitive magic even in a society as familiar with images as our own; and, like the camera, the computer seems to exert a democratizing influence, making this power widely available, where it was at the disposal, previously, only of an elite with the skills and abilities to exercise it.
Image-making is in the province of anyone with the price of an Instamatic and a roll of film; anyone with access to a computer and a little programming experience. The programmer starts with carefully digitized Snoopy drawings, progresses to rotating polygons, and by the time he gets to polynomial functions he is ready for the annual Calcomp Computer-Art Contest. For most people outside of art, probably, art is directed primarily at the production of beautiful objects and interesting images; and who is to argue that a complicated and intricate Lissajous figure is less beautiful than an Ellsworth Kelly painting or a Jackson Pollock; or that a machine simulation of a Mondrian is less interesting than the original it plagiarizes? To talk of beauty or of
interest is to talk of taste, and matters of taste cannot be argued with much profit. The fact is that art is not, and never has been, concerned primarily with the making of beautiful or interesting patterns. The real power, the real magic, which remains still in the hands of the elite, rests not in the making of images, but in the conjuring of meaning. And I use the word meaning in a sense broad enough to cover not only the semantic content of the image itself, but all that is involved in the making of the image. The particular kind of usage the computer has received in almost all 'computer art' offers some clue as to why 'computer art' can barely claim consideration as art at all. With a few notable exceptions the machine has been used as a picture-processor, which is to say that it is programmed to perform a number of transformations upon material previously defined by the artist. In this role it has something in common with other processes used traditionally by the artist, and yet it has failed to support the dynamic interplay we normally expect between a process, the art-making intentions which give rise to its use, and the formal results of that use. The cause of this failure may be the relative inflexibility of the processes available, but I am inclined to believe that it is dictated by the whole structure rather than by inadequate implementation of the structure. It should go without saying that it is beyond the power of a process to invest an image with significance where none existed before; that if you cannot draw without a computer — and by drawing I mean the conjuring of meanings through marks, not just the making of marks — it seems unlikely that you can draw with one. At all events, it is clear that the use of the computer as a tool in the sense that a camera is a tool represents the antithesis of autonomy, and is thus not my subject here. 
All the same it will be worth examining the notion of picture processing as a starting point in order to see how other possibilities relate to it. Diagrammatically (Fig. 1) we might think of the processor as a black box, with a slot at the top through which original material is fed, and a slot in the bottom through which the processed material exits. The range of possible processes, or transformations, is fixed for any given configuration of the machine, but they may be selected and concatenated by the user, who also has a measure of quantitative control over their application — if he chooses to transform an image by rotating it, then he can specify how much it is to be rotated, for example. The actual number of processes which can be programmed is large, but, as one might almost anticipate, the few examples of exceptional quality which have occurred have tended to make use of
quite minimal processing functions, and instead of burying the original image, lay emphasis precisely on the metamorphosis itself. One such example would be Charles Csuri's manipulations of the Leonardo Vitruvian Man (Fig. 2), in which the figure retains its identity even after being distorted, rubber-sheet fashion, in a number of unlikely ways. The manipulations are effective for reasons of content rather than form, however, and are even part of that content, since the Vitruvian Man is a symbol for those notions about the proportions of the human body which run through neo-Pythagorean cosmology for some hundreds of years of European thought, and it is precisely the proportions of the figure which Csuri has chosen to manipulate.

Fig. 2. Charles Csuri, "Circle to square transformations, based on Leonardo's Vitruvian Man".

From a processing point of view, the drawing (Fig. 3) by Kenneth Knowlton and Leon Harmon does no more than to replace small square patches within the original photograph with an alphabetic character, chosen from a range of different type faces, each with a proportion of white character to black space equivalent to the gray level in that patch.

Fig. 3. Kenneth Knowlton and Leon Harmon, "Processed photograph – Vietnamese child."

The processing function is a simple, one-step affair, and again, the interest lies behind the image itself, since Knowlton and Harmon are concerned primarily here with the nature of visual information: not only WHAT is read, but HOW it is read. Even so, the choice of original material is evidently by no means entirely neutral, since the text which is used for transfiguring the picture of the Vietnamese child is in fact the Declaration of Human Rights. Processes may range from the simplest geometrical transformations to highly complex systems in which images of objects are not only rebuilt from the collection of three-dimensional coordinates which represent them,
but can be shown moving around in correct perspective, complete with the shading and shadows caused by any specified lighting conditions. In practice, the machinery for operating these transformational systems may vary enormously, and the wholly electronic nature of the more modern devices allows them to operate at speeds which give the illusion of a direct interaction between the user and the process. Rather than looking at the processed image drawn on paper, then resetting the parameters for the processing functions and starting again, the image can be displayed on a screen, and will change as fast as the knobs are twiddled (Fig. 4).

Fig. 4

(In fact, operated in this way, the processor inevitably takes on some curious affinities to musical instruments: one seems to be playing it, rather than playing the image with it, and the problems relating to the differences between improvisation and composition, unresolved outside of music, which have already appeared in the use of the video-synthesizer must arise here also.) As far as the manipulation of the image is concerned, however, the speeding up of the system might induce a change of attitude, but does not represent a change of structure. What all these things have in common is their diagrammatic representation — the original going in at the top, the processed image coming out at the bottom, selective and quantitative control only over the process — and also the fact that at no point does the machine need to read back what it has done. By definition processing is a deterministic affair, and for any single run its functions are predetermined and invariant. Feedback from the result to the functions themselves has no part in this process. On the other hand feedback is clearly a part of the human art-making process, or indeed of any intelligent process, and if the only feedback possible within the computer environment is via the human user, then the computer is a tool in no essential way different from any other tool: and it is evidently capable, up to this point at least, of handling material of much less complexity, and in much cruder ways, than most of the artist's more fertile, and more traditional, tools.

Suppose, now, that we wished to modify this schema in some structural way, hopefully to arrive at something corresponding, in itself, more closely to human art-making behavior. What modifications would be possible, and what could we deduce in relation to them? We have already seen that increasing the number and range of the processes — adding more knobs and switches to the control panel, as it were — would make no fundamental difference: just as increasing the speed of the system would make no fundamental difference. Closing off the slot in the bottom of the box would simply render the system inoperative, in that there would be no result. If the slot in the top were closed off, could the system provide itself with original material upon which to function? The first answer would seem to be that it could. In fact, of course, the machine works on a description of the original rather than on the original itself, and although I have tended to write as though the description were always given as a set of points — that is to say, as a digitized version of an original — there are other ways of describing pictures. For that matter, sets of points could be included in the program which defines the processor rather than entering them after the program had been set up, although doing so would be merely a device, and would severely limit the versatility of the system. Other kinds of description, like the use of mathematical equations to describe curves, probably could not be entered conveniently after the program was set up, and would more properly form part of the program itself. Here, too, versatility would be seriously limited, since there is a relatively low limit on the number of kinds of curves which one might realistically hope to describe by means of mathematical functions.
In any case, what becomes clear is that the question more usefully to be asked is not whether the machine could function with its input slot closed, but whether its program could actually GENERATE material, as opposed to being given it as it needs it on the one hand, or being given it in advance on the other. Note that, although it is factually true that a mathematical function can generate a set of points, I have treated it as a storage device rather than as a generator, precisely equivalent to the list of points it will generate. I have done so for the reason that a curve is fully described by its equations, just as it is fully described by the set of points of which it is composed. But suppose we were to find some way of writing a program that required no preliminary input, no 'original', that did not make use
of mathematical functions in place of input; that nonetheless succeeded in generating a graphic result; would it not then be true that the program as a whole fully described the image? That it was, in effect, exactly equivalent in that respect to the mathematical function? The answer I will give to this question is that programs can be written which do not fully describe the images they generate in the same sense that a mathematical function does. But we should examine the implications behind this answer with some care, since it appears to involve the question of whether a machine might be capable of non-deterministic behavior. I have some doubt whether any definitive answer can be given to this question: whatever more rigorous definitions of the term 'non-deterministic' might be available in other disciplines, it seems to me that here it relates to what we think human behavior is like at least as much as it does to what we think machine behavior is like. Thus it seems to become a problem of definition rather than a problem of identification, and my own question was proposed as a more meaningful alternative to it. And in answering this question more fully, I will try to demonstrate the possibility of what I will term behavioral functions, which differ from mathematical functions in that they require feedback from the image — the image in progress, that is — and contain the necessary feedback mechanisms within their own structures (Fig. 5). It would seem reasonable to say of such functions that they do not fully describe the images they generate in the sense that mathematical functions do.
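Cohen's essay long predates modern scripting languages, so any code is necessarily anachronistic; still, the distinction he draws might be sketched, under my own assumptions, like this. A mathematical function fully determines its output from its arguments alone, while a 'behavioral function' must read back the work in progress before making each new mark:

```python
import random

def mathematical_function(n):
    """Fully describes its output: the same n always yields the same points,
    so the function is exactly equivalent to the list it generates."""
    return [(x, x * x) for x in range(n)]

def behavioral_function(n, canvas):
    """A sketch of a 'behavioral function': each candidate mark is accepted
    or rejected on the basis of feedback from the marks already made
    (here, a hypothetical rule against crowding existing marks)."""
    for _ in range(n):
        x, y = random.uniform(0, 10), random.uniform(0, 10)
        # Feedback step: interrogate the current state of the work.
        too_close = any(abs(x - px) + abs(y - py) < 1.0 for px, py in canvas)
        if not too_close:
            canvas.append((x, y))
    return canvas
```

The first function could be replaced by its stored output with no loss; the second cannot, since the marks it makes depend on the history of marks it has already made.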
Before going on to describe programs of this sort, and to talk about the nature of the feedback, and of feedback interpretation, I should deal more thoroughly with the initial premise; the notion, that is, of a machine which can provide itself with its own original material. This innocent-sounding suggestion reveals itself to be even more troublesome than the idea of non-deterministic behavior, or perhaps simply a more troublesome formulation of the same idea: since what appears to be implied is nothing less than the machine initiating its own performance, and initiating formal material. Once again, the question is heavily colored by our beliefs concerning the nature of human behavior. But to what extent could we reasonably maintain that the human mind initiates? Concepts are formed on the basis of prior concepts, decisions are made on the basis of feedback from the environment and from the results of previous decisions. The probability is that, if one could identify the starting point for an artist's whole life's work, one would find a set of concepts completely formulated if not completely digested, given to him and not initiated by him. We habitually speak of the artist 'beginning to find himself' at some date much later than this starting point: the artist himself will tend rather to speak of his life and his work as a continuous self-finding process. Thus the question of starting points and starting material is misleading in relation to the machine's performance, not because the machine could or could not initiate material, but because the idea of the machine being loaded with a program, running the program, and stopping, forms a discrete unit which has no real parallel in human behavior. What we would need to imagine to establish a reasonable parallel is a machine equipped with an archival memory, running a self-modifying program not once, but hundreds or even thousands of times, modifying future performance on the basis of past performance (Fig. 6). In this state, the nature of the initial input might
be of no more importance to the final outcome than the name and style of an artist's first teacher. I do not believe that the existence of such a machine is around the nearest corner: and there is no doubt that before we get to it, and to other machines which, like it, would profoundly challenge man's thinking about his own identity, there will be emotional roadblocks of significant proportions to be taken down. But, of course, it has already been demonstrated that machines can learn, given appropriate criteria for performance: and conceivably the idea that no such criteria can exist in art will prove to be simply one of the roadblocks.

In practice, it is not possible to run a program from scratch without providing initial material. You cannot tell the machine, "draw some circles"; you have to tell it how many circles you want drawn, how big you want them, and where they are to be placed, as well as specifying a general program for drawing circles. But it is possible to have the machine itself decide these things, and the programmer can make use of the machine's random number generator for this purpose. You tell the machine "draw some circles — anything from ten to thirty will do. I want them not less than an inch in diameter, and not bigger than three inches" ... and so on. We should not be too impressed with this ability: it is no more indicative of intelligence in the machine to make decisions randomly than it is for a human being to make his decisions by tossing coins. Intelligent human beings make their decisions this way only when the outcome does not matter, and what is at stake here is the programmer's tacit declaration that his program will function to give satisfactory results regardless of whether it has fifteen circles or twenty-five to work on.
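The circle-drawing instruction above might be rendered as the following sketch. The ranges are Cohen's; the function name and the 30-by-20-inch sheet are my own hypothetical framing:

```python
import random

def circle_parameters():
    """Let the machine decide the open parameters by random choice,
    within ranges the programmer has declared acceptable."""
    count = random.randint(10, 30)  # "anything from ten to thirty will do"
    circles = []
    for _ in range(count):
        diameter = random.uniform(1.0, 3.0)  # not less than 1", not bigger than 3"
        center = (random.uniform(0.0, 30.0),  # placement on an assumed 30" x 20" sheet
                  random.uniform(0.0, 20.0))
        circles.append((center, diameter))
    return circles
```

The programmer's tacit declaration is visible in the ranges themselves: any outcome they permit is, by construction, acceptable to the program.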
What we might anticipate from the hypothetical learning machine is that parameters would be initially set as random choices over very wide ranges, and that the machine would itself narrow those ranges down to the point where specific values could produce specific results. Let us turn now, finally, to the question of feedback, and of what kind of programs one might imagine could be built up given appropriate feedback structures. Any complex, non-organic system must make use of feedback structures to keep it in a stable state, just as any organism does. In the computer there will be such mechanisms functioning at electronic level, but these are not the ones under consideration since they are operating regardless of what the machine is doing, just as the body uses feedback to control its temperature regardless of what the mind is thinking. Similarly, organisms have feedback structures to control their physical movements, and while
something of the sort may be built into a computer program, they do not come high enough in the scale of activities for organisms or for mechanisms to be considered intelligent. The program which generated the drawings in Fig. 12, for example, has at its lowest level a sub-program which draws lines between pairs of points which have been determined higher up in the program. The sub-program uses a sort of homing strategy: it wants to wander freely as if it had no destination, but at each step it corrects its path, so that it arrives at its destination nevertheless, without overshooting and without needing to spiral in as a moth does around a candle flame. While the feedback structure is more sophisticated than the moth's, equivalent perhaps to those we might employ in driving a car, it is of essentially the same order, and the structures exhibited higher up in this program are of a different kind. I will return to these in a moment.

I use the term 'feedback' in the most general sense, to denote, within a system, the passage of information back TO a function FROM the result of the operation of that function, such that the operation tends subsequently to be modified. In intelligent systems, we might thus characterize feedback, in a rough ad hoc sort of way, as the asking of questions relevant to continuing operation, and might even describe the complexity of the feedback system in terms of the number of questions the continuing operation requires. This is not to say, however, that the complexity of the result necessarily depends upon the complexity of the system. One of the most amazing examples of an apparently simple system yielding complex results is the 'Game of Life', which has received enough attention recently not to require further description. In this matrix-manipulating program, the asking of the same single question for each cell in the matrix — how many of its neighboring cells are live and how many are dead?
— is enough to provide for the generation of a rich set of patterns often possessing remarkable characteristics. If I exclude the 'Game of Life' from the class of systems I am discussing, it is because, although it appears to be asking its question of the result of an operation, the operation itself never actually changes at all, and the result of one application of the operation simply becomes the input, the original, for the next application of the same operation (Fig. 7).

Fig. 7

In other words, the system should properly be considered as an iterative processor of ingenious design, in which the complexity of the result stems from the iteration rather than from the process itself. In practice, it is quite difficult to avoid the appearance of this processing structure in computing, since the whole methodology of programming is built upon the notion of iteration. But to say that processing structures are embedded in a program is not the same as saying that the program itself functions as a processor. The distinction becomes important if we pursue the idea that feedback complexity may be measured by the number of questions which need to be asked in order to determine subsequent operation, for clearly the total number of questions to be asked within the whole program to provide a single unit of information will depend upon a number of issues, not least of which is the availability of that single piece of information. We might imagine an artist having a piece of work made by telephone, updating his mental image of the piece by asking questions. "How many lines are there in the drawing now?" he might need to know: and he has no interest in whether the person on the other end will need to go and count them yet again, or whether he has been smart enough to keep an updated record. For our purposes we would say that only one question has been asked. Similarly (Fig. 8), the computer program could be considered as a two-part affair, in which the upper part — the 'artist', as it were — accesses the work in progress by interrogating the lower part about it. Our measure of feedback complexity is then given by the number of loops between the two levels, and is not concerned with how complex are the functions occurring within the lower part, many of which will certainly appear as processing functions in the sense we have already discussed. In many ways, then, the upper part might be thought of as using the lower part as the human user uses a processor.
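The two-part structure just described might be sketched as follows. The class names and the toy decision rule are my own invention, intended only to show the shape of the question-and-answer loop between the two levels:

```python
class LowerPart:
    """Keeps the representation of the work in progress and answers
    questions about it. Because it maintains a running record, the
    upper part never cares how the answer is produced."""
    def __init__(self):
        self.lines = []
        self.line_count = 0  # updated record: the answer is ready when asked

    def add_line(self, start, end):
        self.lines.append((start, end))
        self.line_count += 1

    def how_many_lines(self):
        return self.line_count  # one question, one answer


class UpperPart:
    """The 'artist': decides what to do next on the basis of the answers
    it receives, not by inspecting the drawing directly."""
    def __init__(self, lower):
        self.lower = lower

    def step(self):
        n = self.lower.how_many_lines()  # one feedback loop between the levels
        if n < 5:                        # a toy decision rule, purely illustrative
            self.lower.add_line((n, 0), (n, 10))
```

On the essay's terms, feedback complexity here would be measured by the number of such question-and-answer loops, not by however much work goes on inside `LowerPart`.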
Fig. 8
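The homing strategy of the line-drawing sub-program mentioned earlier might be sketched along these lines. The blending scheme is my own guess at such a mechanism, not Cohen's actual algorithm:

```python
import math
import random

def homing_line(start, end, wander=0.5, max_steps=200):
    """Draw a line that wanders as if it had no destination, but corrects
    its path at each step so that it arrives there nevertheless, without
    overshooting and without spiraling in."""
    x, y = start
    path = [(x, y)]
    for _ in range(max_steps):
        dx, dy = end[0] - x, end[1] - y
        dist = math.hypot(dx, dy)
        if dist < 1.0:
            path.append(end)  # close enough: home directly on the destination
            break
        # Blend a random wandering direction with the homing direction,
        # then take a unit-length step along the blend. With wander < 1
        # every step still makes some progress toward the destination.
        theta = random.uniform(0.0, 2.0 * math.pi)
        sx = dx / dist + wander * math.cos(theta)
        sy = dy / dist + wander * math.sin(theta)
        norm = math.hypot(sx, sy)
        x, y = x + sx / norm, y + sy / norm
        path.append((x, y))
    return path
```

Keeping `wander` below 1 guarantees arrival; raising it toward 1 makes the line meander more freely before the correction pulls it home.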
If I were to write a program which packed equal-sized circles into a hexagonal array, no feedback at all would be required, since I could calculate in advance where all the centers would need to be, and would know without looking that there would be no overlapping.

Fig. 9

A program which sought to pack the same circles over and around an irregular projection (Fig. 9) would be a different matter, however, since the space available for each new circle could not be known until the previous circle had been
drawn. Similarly, a program which caused a number of loosely distributed circles to grow irregularly, amoeba-like (Fig. 10), until all the available space had been absorbed, would need to ask, for each new development of each developing shape: what is the current state of its boundary? What shapes lie in the direction of the proposed development, and what is the state of THEIR boundaries? Has this developing shape reached those existing boundaries yet? The end-state of the whole development (Fig. 11), which has been managed entirely by the program, has thus evolved from a unique set of events, and the large set of drawings which actually resulted from a long run of this program exhibited wide variety without any change in the operating parameters.

Fig. 11. Harold Cohen, "Labeled Map" (1969), 102" x 192", oil on canvas.

A more recent program of mine ran continuously for a month during a museum exhibition, and, again, required no human intervention in making over three hundred drawings (Fig. 12) beyond changing the sheets of paper on the drawing machine and refilling the pens.

Fig. 12A, 12B, 12C, 12D, 12E

The feedback in the current version of this program is more complex than in the previous example, but not by much, since to draw each new line the program needs only to know which of many possible destinations may be reached without crossing an existing line, and, of them, which is the nearest and which the furthest away. But the program is structured in such a way that more particularized decisions may now be reached on the basis of more extended information requiring more complex feedback — which of the possible destinations will result in the straightest lines? Which is closest to the center of the picture? Which is in the densest part of the picture, and so on: with even greater variety of output than we have at the moment.

But this brings me now to the central issue in this enquiry. I suggested before that feedback complexity might be measured by the number of questions which needed to be asked about the current state of the drawing, not by how difficult it would be to answer them. Consider the question: is the pen currently inside a closed shape? (Fig. 13).

Fig. 13

If the lower part of the program had been keeping up a running index of closed shapes, updating it every time a new closure was made, the question might be answered immediately. If not, it might not be able to give an answer at all within any reasonable time. Obviously one would not want to write a program which
asked unanswerable questions: or to put the matter the other way around, one would certainly want to be sure that the lower part was in a position to answer the questions one knew the upper part would need to ask; that the required information was either explicitly available, or easily derived from what was available. What may not be equally clear is that as far as the machine is concerned its awareness of the picture exists solely and exclusively in terms of this information, and it is by no means mandatory that this information be visual, in any sense which might seem to apply to human perceptual behavior, and which makes the recognition of closed shapes a trivial human problem. I am not in a position to judge what the relationship actually is in human perception between the outside world and the internal representation: my experience in teaching students to draw suggests that the internal representation of the visual world is certainly not exclusively in visual terms, and indeed that visual information may be a good deal harder to retain than information of other sorts. As far as the machine is concerned, the internal representation similarly need not be equivalent to a complete view of the picture — such as might be given by a television scan of the picture itself, or by a fine matrix in which each cell records the presence or absence of a mark in the drawing — and for many purposes such a representation would yield up the required information very poorly. Actually, they might be better regarded as transcripts than as representations. The need to model the machine's internal representation in terms of the upper program's special preoccupations is not merely a matter of efficiency, since once it is established it places an absolute limit on what the upper program will be allowed to do. There will always be a line beyond which the upper program will
not be able to go, questions in answer to which it will be told — sorry, we don't know anything about that. There is, of course, no mode of internal representation of the work in progress which could be described, meaningfully, as 'natural' to the machine, and no single universal mode to satisfy all possible requirements. Presumably the same could be said of human internal representations of the real world, since we do find it necessary from time to time to build new models, or at least to modify old ones, pushing back the line and finding ways of asking new questions. Whatever else it is, art is primarily a model-making activity. Thus unless we were limiting the aim to simulating the work of a particular artist at a particular time (a human process known as plagiarism, not as art) it would be obviously simplistic to think in terms of the machine's upper program needing to ask the same questions about the work in progress that the human artist would ask, and of the lower program building representations of the work very much like those the human artist would build. The nature of art is not to be characterized in terms of specific sets of questions and representations, since these will be, by definition, in flux for any given artist and even perhaps peculiar to a single artist only. The interface between the questions and the representations, permitting fluid change in both, might reasonably be thought to possess more general properties, since art does change, at least within our own Western tradition. Thus I would conclude that the machine's autonomy rests upon developing total systems, in which the feedback structures linking the decision-making processes above with the characterizations below are sufficiently flexible, or adaptive, to support the changes which must occur in both; not upon pinning down particular characteristics of human perception, or particular formal aspects of human art.
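To make the earlier point about answerable questions concrete: a minimal sketch of a lower program's running index of closed shapes, assuming each shape is recorded as a polygon at the moment it closes, with the pen question answered by a standard ray-casting test. The names here are mine, not part of any actual program of Cohen's:

```python
# A lower-level program might keep a running index of closed shapes,
# updated each time a new closure is made, so that the question
# "is the pen currently inside a closed shape?" can be answered
# immediately rather than recomputed from the whole picture.

def point_in_polygon(x, y, polygon):
    """Ray-casting test: count crossings of a horizontal ray
    from (x, y) with the polygon's edges."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

class ShapeIndex:
    def __init__(self):
        self.closed_shapes = []

    def register_closure(self, polygon):
        """Called by the lower program each time a closure is made."""
        self.closed_shapes.append(polygon)

    def pen_inside_closed_shape(self, x, y):
        return any(point_in_polygon(x, y, p) for p in self.closed_shapes)
```

With such an index in place the upper program's question costs one lookup per recorded shape; without it, the answer might not be derivable in any reasonable time.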
ON PURPOSE
AN ENQUIRY INTO THE POSSIBLE ROLES OF THE COMPUTER IN ART
HAROLD COHEN
This is not another article about ‘computer art'. The development of the computer has brought with it a cultural revolution of massive proportions, a revolution no less massive for being almost silent. We are living now in its early stages, and it would be difficult to predict (certainly well outside the scope of this article) what changes will be effected within the next two or three decades. I think it is clear, however, that well within that period, subject to such issues as public education, the computer will have come to be regarded as a fundamental tool by almost every conceivable profession.¹ The artists may be among them. That will be the case, obviously, only if it shows itself to have something of a non-trivial nature to offer to the artist; if it can forward his purposes in some significant way. There is little in 'computer art' to justify such an assumption. On the other hand I have come to believe, through my own work with the machine, that there may be more fundamental notions of purpose, and a more fundamental view of what the machine can accomplish, than we have seen so far; and this article is intended as a speculative enquiry into that proposition. Speculation is cheap, of course, as the popular media have shown. If you fantasize any given set of capabilities for the computer, without regard to whether the real machine actually possesses them, then you can have it achieving world domination or painting pictures, falling in love or becoming paranoid; anything you wish. I would hope to offer something a little more rigorous, if rather less romantic. Thus I propose to proceed by describing the machine's basic structure and functions, and by giving a simple account of programs of instructions which it can handle with those functions. It should not prove necessary to make any speculation which cannot be stated in terms of these. All the same, the undertaking is not without its difficulties.
There is no doubt that the machine can forward artists' purposes. It has forwarded a reasonable range of specific purposes already - some have been trivial, some have not - and there is no reason why that range should not be extended. But the significance of the question would seem to point to the notion of Purpose rather than purposes, implying, if not a hierarchical structure with Ultimate Purpose sitting on top as its informing principle, certainly a structure of some sort which relates all of an artist's individual purposes. The chain of interrogation: Why did you paint this picture blue? Why did you paint this picture? Why do you paint? is thus a good deal less innocent than it might seem at first glance. I suspect that the notion of Ultimate Purpose enjoys little currency today: but then it must follow that Purpose is not to be arrived at by backtracking up a hierarchical structure from the things that an artist does, much less from the objects he makes. The problem is
Figure 1 The Hewlett-Packard 2100A computer is a small, fast, general-purpose machine characteristic of the 'minis' now on the market
rather to propose a structure which can be seen, as a whole, to account for the things the artist does. The notion of Purpose might then reasonably be thought to characterize that structure, as a whole. In what terms, then, would it be possible to maintain that the use of the computer might ‘advance the artist's Purpose'? Any claim based upon the evidence that 'art’ has been produced would need to be examined with some care, and in the absence of any firm agreement as to what is acceptable as art we would probably want to see, at least, that the 'art' had some very fundamental characteristics in common with what we ordinarily view as art. This could not be done only on the basis of its physical characteristics: merely looking like an existing art object would not do. We would rather want to see it demonstrated that the machine behavior which resulted in the 'art’ had fundamental characteristics in common with what we know of art-making behavior. This is already coming close to a more speculative position: that the use of the machine might be considered to advance the artist's Purpose if, following the earlier argument, it could be seen that this use might itself generate, or at least update, an appropriate notion of structure. In either of these cases, it must be clear that my definitions have much in common with the curious way in which we ordinarily make our definitions of art. We would probably agree, simply on the evidence that we see around us today, that the artist considers one of his functions to be the redefinition of the notion of art². Or we might say that the artist uses art in some way to redefine, i.e. to modify, himself. But since he is the agency which is responsible for the art process which effects the modification, we could restate this: the artist who uses art to modify the artist who uses art to modify.... These are recursive³ structures.
I think it will become evident in due course that my definition of Purpose is recursive also; and the balance of this article may suggest that it has, in fact, been generated by my use of the machine. For the moment, though, I propose to adopt the earlier position, and to argue that the machine behavior shares some very fundamental
characteristics with what we normally regard as art-making behavior. Let us now look at the computer itself, and then examine what some of these characteristics may be. There is an increasing diversity in computer design today. At one end of the spectrum machines are getting smaller, at the other end they are getting much, much bigger; at both ends they are becoming much faster. Yet it remains reasonable to talk about 'the machine' because, big or small, fast or slow, all computers do much the same things, and consist, diagrammatically at least, of the same parts. Part of it, usually called the Input/Output Unit, takes care of its communication with the world outside itself. Part, as you probably know, is used for storage – it is the computer’s ‘memory.’
like alphabetic characters, colors or instructions. The computer is a general-purpose symbol-manipulating machine, and it is capable of dealing with any problem which can be given a symbolic representation. If its accelerating use in our society rests upon its remarkable versatility, then its versatility rests in part upon the fact that a very large number of problems - much larger than you might suspect - do indeed lend themselves to symbolic, even numerical, representation. The on-off switch might not seem too promising as a device for counting, since it can only record 'zero' - off, or 'one' - on. But a race of creatures with two hundred and seventy-nine fingers might consider our own ten-position-switch system pretty limiting also. We still need
Figure 2
The 'operations room' of the whole machine, appropriately enough called the Central Processing Unit (CPU), is concerned with the processing of the stuff the machine handles, and with shifting this stuff around inside the machine. If you think of 'memory' as a very long string of numbered boxes, or cells, then the CPU looks after the business of storing things in the cells, labeling the cells, keeping up an index of where all the labels in use are to be found, retrieving the contents of cells with particular labels, and so on. What are these 'things', this 'stuff’ the machine handles? There are different ways of answering this question, and their relationship demonstrates one of the most significant features of the computer. Physically speaking, what the machine handles is pulses of electrical current, which are triggered by switches, and in turn trigger other switches. But the configurations into which the switches are set actually represent numbers, and the numbers represent... well, just about anything that can be represented numerically: quantities, dimensions and values obviously, but also anything which can be given a numerical code,
to add a second switch to get up to 99, a third to get up to 999, and so on. Whatever 'base' you use for counting, how high you can count depends upon how many switches - each with the 'base' number of positions - you put together. When the 'base' is two, you will need a large number of switches to get very far, but each of them need only have two positions - on or off: obviously an ideal situation for counting electrically. (Fig. 3.) If you were to take a somewhat less metaphorical look at those little cells in the computer's memory, you would see that each one was in fact a string of switches. Most small modern computers have adopted sixteen as a standard, though not all: and you can figure out that this sixteen-switch cell - or 'sixteen-bit word', to use the jargon - will be able to hold any number up to 2^16 - 1. In a very rough sense, the size of the machine is measured by how many of these words it has in its memory, and its speed by how long it takes to retrieve one. There would probably be between four and thirty-two thousand sixteen-bit words in a small machine: up to a quarter of a million sixty-four-bit words in a big one. (Fig. 4.)
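The counting arithmetic above is easily verified; a small illustrative sketch:

```python
# With n switches, each having 'base' positions, you can count
# from 0 up to base**n - 1.

def highest_count(base, switches):
    return base ** switches - 1

print(highest_count(10, 2))   # two ten-position switches: up to 99
print(highest_count(10, 3))   # three: up to 999
print(highest_count(2, 16))   # a sixteen-bit word: up to 65535

# The same largest number written out switch by switch:
print(format(65535, '016b'))  # sixteen 'on' switches
```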
The Central Processing Unit is responsible for moving these words around, and for performing certain operations upon them. Ingeniously, it knows from the words themselves what it is to do, since several bits of each word are actually reserved for instruction codes. Thus part 'A' of a word might tell the CPU, 'put the number shown in part "B" into memory'; or, 'get the number which is in the cell in memory specified by the number in part "B" '; or, 'add the number in part "B" to the number you are now holding, and put the result back in memory'. A machine might recognize and act upon as many as fifty or sixty such instructions, but in fact most of them will be concatenations of simpler instructions, like 'add', 'subtract,' 'multiply,' 'divide,' 'compare,' 'move this into memory,' 'move this out of memory.' The user sees nothing of all this going on. Sitting in the outside world, the set of instructions he composes for the machine will almost certainly be written in a 'higher level' language, like Fortran or Algol, and before the machine can execute that program of
Figure 3 ‘Binary' counting is illustrated here by hand, using each successive finger in its 'on' or 'off’ positions to count successive powers of two. The total is given in each case by adding the 'on' fingers together.
Figure 4 This memory module, taken from the Hewlett-Packard 2100A computer, illustrates the development of miniaturization in recent technology. The module holds 8,000 sixteen-bit words - 128,000 switches in all. The switches are minute doughnut-shaped ferrite 'cores' strung on wires. Courtesy Hewlett-Packard
instructions, it must first run a program of its own to translate it into its own numerical code. A single line of code - a 'statement' - in any higher-level language will normally break down into a large number of machine instructions, and these are executed electronically, literally by switching electrical currents, with consequent speeds measured in millionths of a second per instruction. Yet the computer's phenomenal speed is probably less significant in accounting for its versatility than the fact that it can break down any user's program into the same instruction set. While the machine is running a user's program it can't do anything else, so that you might say the machine is identified by the program. But it can take on a new identity in the time it takes to clear one program from memory and load a new one, and in a single day a moderately sized computer installation may run a thousand different programs. A thousand different tasks, a thousand 'different' machines. The man-machine relationship I am describing here is a very curious one, and not quite like any other I can think of. Nor is it possible to deal meaningfully with questions relating to what the machine can do except in terms of that relationship. It is true that the machine can do nothing not determined by the user's program; that the program literally gives the machine its identity. But it is true also that once it has been given that identity, it functions as independently and as autonomously as if it had been built to perform that task and no other. Whatever is being done, it is being done by the machine. When we talk of the computer doing something, it is implied that it is doing it, or controlling the doing of it, in the outside world.
For the computer this outside world consists of any or all of a large number of special purpose devices to which it may be connected through its Input/Output Unit, varying widely in their functions from typing or punching cards to monitoring heartbeats or controlling flow-valves. Some of these 'peripheral' devices serve the computer in the very direct sense that they provide communication channels to the user, allowing him both to get his program into the machine and receive its response to it. The ubiquitous teletype, and its many more sophisticated modern equivalents, serve both needs: combinations of punched-card reader and line-printer, or paper-tape reader and punch, do the same. Several peripherals function as extra memory for the machine, but then memory simply means storage, and a deck of punched cards, or a punched paper tape, is as much a storage medium as is magnetic tape or the more recently developed magnetic disc. Once a program has been entered via the teletype or the card reader, the computer can permanently record it in any of these media, and reload it from them when required to do so. Obviously, these media can be used also for storing large quantities of information.
drawing machine (Fig. 5a, b). Assume that the computer has already been loaded with the program by means of which it will be able to interpret my own instructions. (My instructions here will not be phrased in any existing 'higher level' language but in a fictitious one designed to make clear what is being done. In fact I will describe programs diagrammatically, by means of what are known as 'flow-charts', rather than in the line-by-line form required by every language.) Let’s see if the machine works:
now that this program has been loaded, I type 'RUN' on the teletype, and the machine responds... 3. The program has taken around 1/50,000 of a second to run - the teletype, being mechanical, takes much longer to operate, of course - and we know that the machine can figure out that 1+2=3. Let’s try something a bit more complicated:
Above and right, figure 5a & b
In general, you might say that the computer may receive messages from any device which is capable of putting an electrical voltage on a line, and may control any device which can be switched by a change in voltage generated by the computer. The user today has a host of peripherals at his disposal, covering a wide range of sophisticated abilities: perhaps for that very reason it is important to recognize that the use of more sophisticated peripherals does not necessarily imply more sophisticated use of the computer. If you wanted to make an animated sequence, say of a cube revolving in space, then a television-like device which could display individual frames at the rate of thirty per second would have much to commend it over a mechanical device like a plotter, whose pen only moves at five or six inches per second as it draws the frames one by one. As far as the computer is concerned, however, the task is to generate a series of views of a cube rotating in space, and it will use literally the same program to do so regardless of what device it is addressing. The point would seem obvious enough not to need underlining, were it not that many writers appear to hold the view that the failure of 'computer-art' to achieve images of notable stature can be ascribed to the lack of peripherals appropriate to the artist's needs! Incongruously, the kind of peripheral upon the basis of which some of these writers project rosy futures for 'computer-art' doesn't relate to new needs, but to old ones. All will be well when the artist can communicate to the computer with a paint-brush.⁴ Failure to produce significant images arises from lack of understanding, not from lack of machines. The truth is that it has been, and remains, extremely difficult for any artist to find out what he would need to know, either to use the computer, or even to overcome his certainty
that he couldn't possibly do it for himself. He will almost inevitably find himself confronted by professionals who are more than anxious to help him, but that might be a large part of his problem. ‘What will the machine do?' he asks. ‘Well,’ he is told, 'it will do A, B, C, or D. You just choose which you want and we will program it for you!’ The specialist is well-intentioned, and it seems unreasonable to blame him if he is less than well-informed about what the artist wants. Surprisingly, he will probably assume the artist to be incapable of learning to program, or at least unwilling to do so. Less surprisingly, he will probably hold the notion that art is principally involved with the production of 'exciting' images, and that he will best serve the artist's needs if he can enable him to produce a large number of widely differing images, all 'exciting.' How would it be to try to write poetry by employing a specialist in rhyme-forms? Each time you get to the end of a line you call him up to ask what word he thinks would best convey what you have in mind. The process sounds rather more promising than trying to produce art by getting a specialist to write computer programs on your behalf. If we are to get past 'computer-art', as I am sure we shall, to art made with the help of computers, it will need to be on the basis of a massive change of mental-set on the part of the artist.⁵ Suppose, now, that I have a computer whose abilities are like those I have described. Suppose also that it is connected to a teletype and to a
this time, when I have loaded the program and type 'RUN', the machine will get the 1 it has just put in the cell labeled COST, square it, store the result in BOX3, and then print out that result. But then, instead of stopping, it will add 1 to the 1 already in COST, and go through the whole cycle again, printing out 4 this time, and then 9, 16, 25, and so on until it has completed the ten re-iterations called for. This is pretty simplistic, of course, involving a lot of unnecessary PUTting and GETting into and out of memory. If the machine's language were a little more sophisticated, we could have written the program:
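A sketch of that re-iterative computation in a modern language; COST and BOX3 are the cell names from the text, and the list stands in for printing on the teletype:

```python
# Reserve a 'cell', then cycle ten times: square the number in COST,
# store the result in BOX3, record it, and add 1 to COST.
COST = 1
squares = []
for _ in range(10):
    BOX3 = COST * COST      # square what is in COST, store in BOX3
    squares.append(BOX3)    # stands in for printing on the teletype
    COST = COST + 1         # add 1 to COST and go round again

print(squares)              # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```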
with exactly the same result. Note how powerful a device it is that instead of saying first 'print the square of 1', then 'print the square of 2', then 'print the square of 3', we need only say, 'print the square of whatever is in the cell labeled COST', repeating the same instruction every time. All that changes is the contents of the cell COST. This notion of referring to a number by the name on its cell is fundamental to programming, and in fact it is something we do all the time ourselves. Saying that a carpet is ten feet long and seven feet wide is essentially like saying:
where the cell PEN will hold 1 as a code for 'pen down' and 0 as a code for 'pen up'. We might also have generalized a step further, and said:
will produce this curve.
because now we might want to write the sort of reiterative program we looked at earlier, to draw a whole series of points. In writing such a program we will now use a shorter notation for PUT, so that instead of writing PUT 5 in HOZ, we would write HOZ!5. and we could obviously build this into a program for finding the area of the carpet by adding
the important thing here is the level of generality, since the program will now work for whatever values we put in the cells labeled LENGTH and WIDTH. We should be able to get the drawing machine to draw something now. You will probably remember the idea that you can describe the position of any point on a sheet of paper by two distances, or coordinates: how far the point is horizontally from the left hand edge and how far it is vertically from the bottom. Suppose we were to reserve two cells labeled HOZ and VERT for storing the two coordinates for any point to which we wanted the pen to go. If the pen is sitting in the bottom left hand corner, and our program says:
the computer will recognize from the command MOVE that it must send its instructions to the drawing machine, not to the teletype, and will thus send out the commands required to make the pen move to the center of the bed. The only problem with this program is that it didn't specify whether the pen was to be down or up. The program should probably have read:
Figure 7
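The gist of that program, rendered speculatively in code: PUT, MOVE, HOZ, VERT and PEN are Cohen's fictitious names, while the coordinate values and the recording of pen movements are stand-ins of my own:

```python
# Toy drawing machine: three named cells and a MOVE command.
state = {'HOZ': 0, 'VERT': 0, 'PEN': 0}
trace = []   # stands in for the plotter acting on the commands

def PUT(cell, value):
    state[cell] = value

def MOVE():
    # Send the pen to (HOZ, VERT); PEN = 1 draws, PEN = 0 travels.
    trace.append((state['HOZ'], state['VERT'], state['PEN']))

PUT('PEN', 0)       # travel with the pen up
PUT('HOZ', 200)     # center of a 400-unit-wide bed (my numbers)
PUT('VERT', 150)
MOVE()

print(trace)        # [(200, 150, 0)]
```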
The thing is that any pair of statements which relate the horizontal coordinate to the vertical in a coherent way will produce some sort of curve, and it's quite easy at this point to start popping in all kinds of trigonometrical functions and stand back to see what happens. This one was written by a passing computer-science student - I hesitate to say 'invented', since it is almost entirely a matter of chance whether it will produce anything pretty, which I think it does.
Figure 8
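A sketch of that kind of trigonometrical play; the particular functions here are arbitrary inventions of mine, not the student's:

```python
import math

# Any pair of statements relating HOZ to VERT in a coherent way
# produces some sort of curve; these functions are chosen at whim.
points = []
for i in range(200):
    t = 2 * math.pi * i / 200
    HOZ = 150 + 100 * math.sin(3 * t)
    VERT = 150 + 100 * math.cos(5 * t)
    points.append((HOZ, VERT))

print(len(points))   # 200 coordinate pairs for the pen to visit in turn
```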
Figure 6
Not a very exciting drawing, but it does illustrate a lot of principles. You might be surprised by the statement HOZ!HOZ+5 but of course this isn't algebra, and it isn't an equation. It means, simply, 'take what was in the cell labeled HOZ, add 5 to it and put it back in the same cell'. The pen has drawn a series of ten short line segments which in this case make up a straight line: and has then lifted and gone back to the bottom left hand corner. The same general form will draw lines which are not straight, if we can simply think of a way of generating the appropriate pairs of coordinates. For example:
No doubt the introduction of this sort of technique for drawing curves into 'computer art' owes much to the mathematics-oriented programmer, who would tend to view a curve essentially as the graph of a mathematical function. But not all curves can be handled in this somewhat simplistic way, and artists wishing to handle more complex curves have been obliged mostly to use an entirely different approach, if anything even more simplistic. Since it is possible to describe any point by its HOZ-VERT pair, it follows that any drawing can be approximated by a set of points, each of which can be treated in the same way, so that the whole drawing can be described in purely numerical terms. Imagine then that you have already done a drawing, that you have reduced it to a string of points, and that you have typed the HOZ and VERT values of each point together with its PEN code, on a series of punched cards (or, of course, any other storage medium for which the computer has the appropriate peripheral). A program like this
would then simply read the first card to find the first point in the drawing (PEN would presumably be 0 until it gets there), move the pen to that point, read the next card for the next point, and so on until it has done all the points. The machine has duplicated your drawing from your numerical description.
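A sketch of such a replay program, with the card deck standing in as a list of (HOZ, VERT, PEN) triples and the drawing machine as a simple recorder of line segments; both stand-ins are mine:

```python
# Each 'card' holds a point's coordinates and a pen code:
# PEN = 1 means draw to the point, PEN = 0 means travel with pen up.
cards = [
    (10, 10, 0),   # move, pen up, to the drawing's first point
    (60, 10, 1),   # then draw three sides of a rectangle
    (60, 40, 1),
    (10, 40, 1),
    (10, 10, 1),   # and close it
]

segments = []      # stands in for ink on the plotter bed
pen_at = (0, 0)    # the pen starts in the bottom left hand corner

for hoz, vert, pen in cards:
    if pen == 1:
        segments.append((pen_at, (hoz, vert)))
    pen_at = (hoz, vert)

print(len(segments))   # 4 drawn line segments
```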
Figure 9
It may not be clear why anyone would want to use such elaborate means to reproduce a drawing he has already made. The answer is that quite a lot of things can be done to the drawing by suitable programs. Not only can it be reduced, enlarged, shifted, rotated, squashed up, pulled out (Fig. 9): it can also be transformed as if it were drawn on a sheet of rubber which was then stretched in various irregular ways. None of these operations, or transformations as they are called, is difficult to program, and since they can be applied to any set of points whether
generated from mathematical equations or read in from cards, they have tended to become the stock-in-trade of 'computer art'. Indeed, it would be difficult to see how any computer animation involving drawn images could proceed without such transformations. For our purposes, however, the question to be asked is whether the notion of a picture processor, operating upon some previously generated image, corresponds in any useful way to what we know of human art-making behavior. I think the answer has to be that it does not. To achieve that correspondence, the machine would need to generate the image, not merely to process it. Intuitively, it seems obvious that the human process involves characteristics which are quite absent from these procedures, and in particular I think we associate with it an elaborate feedback system between the work and the artist; and dependent upon this system are equally elaborate decision-making procedures for determining subsequent 'moves' in the work. Our enquiry might reasonably proceed by examining whether the machine is capable of simulating these characteristics. Before going on, I must explain that the computer possesses one significant ability which was implied by the earlier examples but never explicitly stated. It is able to compare two things, and on the basis of whether some particular relationship holds between them or not, to proceed to one of two different parts of the program. In practice this primitive decision-making device can be built into logical structures of great complexity, with the alternative paths involving large blocks of program, each containing many such conditional statements, or 'branches'. It would be quite difficult to demonstrate a complex example here in any detail. The drawing on the cover of this issue of Studio International was generated by a program of about 500 statements, of which over 50 were concatenated from these simple conditionals, equivalent to about 85 branches.
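(An aside: the transformations catalogued above reduce to a few lines of arithmetic apiece on the drawing's point list. A sketch, with function names of my own:)

```python
import math

# A drawing is just a list of (HOZ, VERT) pairs, so reducing, enlarging,
# shifting and rotating it are all operations on that list.

def shift(points, dx, dy):
    return [(x + dx, y + dy) for x, y in points]

def scale(points, sx, sy):       # sx != sy 'squashes' or 'pulls out'
    return [(x * sx, y * sy) for x, y in points]

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def rubber_sheet(points, warp):  # irregular 'rubber sheet' stretching
    return [warp(x, y) for x, y in points]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(scale(shift(square, 2, 0), 10, 10))
```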
We might look at one part of that program, however, about 50 statements in all, which generates the individual 'freehand' lines in the drawing. Obviously the flow-chart is a much-simplified representation. The argument behind the sub-program runs like this: in any 'sub-phase' of a line's growth it will be swinging to the left or to the right of its main direction ('straight on' being given by swing = 0). This swing may be constant, accelerating or decelerating, and both the rate of swing and the rate of change of swing may be either slow or rapid. Overall, the line must not swing beyond a certain pre-set angle from its main direction. A single full phase will consist of two sub-phases normally swinging in opposite directions. Both of these, and the phase itself, may vary in length, and normally the starting direction for each full phase does not depend on that of the previous one. If the line swings beyond its angular limit, however, all the
Figure 10
factors controlling the current phase are immediately reset and a new full phase is initiated, starting off in the opposite direction. It should be noted also that the line has some definite destination and corrects continuously in order to get to it. The program would look something like this:
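In place of the flow-chart, a speculative sketch: it follows the argument of the sub-program as described (sub-phases, rates of swing, the angular limit, continuous correction toward a destination), but every name, range and constant in it is my own guess, not Cohen's code:

```python
import math, random

def freehand_line(start, destination, max_swing=math.radians(40), rng=None):
    """Grow a 'freehand' line toward a definite destination.
    Each sub-phase swings to one side of the main direction, and the
    rate of swing may itself change (constant, accelerating or
    decelerating). If the swing passes the pre-set angular limit, the
    current sub-plan is abandoned and a new one starts off the other way."""
    rng = rng or random.Random(0)
    x, y = start
    points = [(x, y)]
    side = 1
    while math.hypot(destination[0] - x, destination[1] - y) > 2.0:
        # start of a sub-phase: correct the main direction toward the goal
        heading = math.atan2(destination[1] - y, destination[0] - x)
        swing = 0.0
        rate = rng.uniform(0.01, 0.08) * side
        change = rng.uniform(-0.01, 0.01)    # rate of change of swing
        for _ in range(rng.randint(5, 15)):  # sub-phases vary in length
            swing += rate
            rate += change
            if abs(swing) > max_swing:       # 'emergency': beyond the limit
                break
            x += math.cos(heading + swing)
            y += math.sin(heading + swing)
            points.append((x, y))
            if math.hypot(destination[0] - x, destination[1] - y) <= 2.0:
                return points                # the line has reached its goal
        side = -side                         # next sub-phase swings back
    return points
```

The choices of new factors for each sub-phase are random, but only within ranges; how those ranges are set governs whether the line comes out erratic or 'controlled'.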
This structure evidently possesses a feedback system not unlike the kind we employ in driving a car. There is an overall plan - to reach a destination - which breaks down into a succession of sub-plans, which are in turn responsible for generating a series of single movements. But if an 'emergency' is signaled, the current sub-plan is abandoned, and a new one set up. The quality of the line is directly related to the way in which the factors for each new sub-phase are reset: if the length of each sub-phase varies enormously, or if the rate of change of swing varies greatly from one to the next, the line will tend to be quite erratic. If the angular limits are set quite small - by the over-all plan -
then the line as a whole will be more ‘controlled.’ How does the program ‘decide’ on new factors for each new sub-phase? The ranges permissible for each factor are precisely determined in relation to what the range was last time, indicating another level of feedback. Within that range, the machine makes a random choice. There seems to be so much popular misunderstanding about the nature of randomness that a word might be said on the subject before going further. Contrary to popular belief, there is no way of asking the machine to draw 'at random', and if you try to specify what you mean by drawing 'at random' you will quickly see that what you have in mind is a highly organized and consistent behavioral pattern, in which some decisions are unimportant provided they are within a specified range of possibilities. This is characteristic of directed human behavior: if you plan to rent a car, you will probably be concerned that it should be safe, that its size and power will be appropriate to your needs. You probably won't care too much what color it is, and in being prepared to take whatever comes you are making a 'random choice' of color: although you probably know it isn't likely to be iridescent pink, matte black, or chromium plated. The same might be said - though with much narrower limits - of the painter who tells his assistant to 'paint it red'; or indeed the painter who uses dirty brushes to mix his paint. They are all examples of making a random choice within specified (or assumed) limits. In fact the computer generates random numbers between zero and one, which must then be scaled up to limits specified by the user's program. You might consider that, in human terms, these limits will be narrow where precise definition is required, wide where it is not.
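The scaling described, from the machine's raw random numbers between zero and one up to the limits specified by the program, is one line of arithmetic:

```python
import random

def random_choice(low, high, rng):
    """Scale a raw random number in [0, 1) into the program's limits:
    narrow where precise definition is required, wide where it is not."""
    return low + rng.random() * (high - low)

rng = random.Random(42)
# an unimportant decision, held only within a range of possibilities:
hue = random_choice(0, 360, rng)        # any color at all, more or less
length = random_choice(9.5, 10.5, rng)  # but length held to tight limits

print(0 <= hue < 360, 9.5 <= length < 10.5)   # True True
```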
For the computer, the existence of limiting ranges rather than specified values will result in the possibility of an infinite number of family-related images being produced rather than a single image made over and over again. There might be some difficulty in demonstrating the case to be otherwise for the artist. While it would seem obvious that any complex purposeful behavior must make use of feedback systems, there is no suggestion that such systems alone can account adequately for the behavior. Moreover the ability to satisfy some given purpose, as the 'freehand' line generator does in homing on its destination, accounts for only slightly more. The formulation of the purpose is something else: and we would expect to find in human art-making behavior not only a whole spectrum of purpose-fulfilling activities, but also a spectrum of purpose-formulating activities. If I am to pursue my enquiry, I must now try to demonstrate the possibility of such a structure occurring in machine behavior, although the strategies employed within the structure may or may not correspond to the strategies the artist might
employ. Certainly no such claim will be made for the program I am about to describe. This program is one of a series in which the principal strategy is devised in relation to an 'environment' which the program sets up for itself. An example would be one in which the program first designs, and then runs, a maze: the resultant drawing being simply the path generated by the machine in performing the second part. In the present program, the environment is a rectangular grid of small cells, into which are distributed sets of digits (Fig. 11). The strategy adopted in the second part involves starting at a '1', from there seeking to draw a line to a '2', then to a '3' and so on. The digits are considered as a continuous set, '10' being followed by '1', so that but for three things the program would continue indefinitely. The first is that no digit may be used as a destination more than once, and since a digit is also cancelled if a line goes through its cell, the number of destinations steadily reduces, and the program terminates. The second is that a destination will not be selected if getting to it involves crossing an existing line, so that finding a destination becomes more difficult as the drawing proceeds. And the third is that there are certain 'preferences' operating in choosing between those destinations recognized by the machine as viable. As a consequence of these constraints, the machine will eventually find itself unable to continue to the next digit, and it will then back up to the previous digit on its path and attempt to go on again from there. The drawing will be complete when the back-up procedure has taken it all the way back to the original '1'. Now it is possible, by manipulating the factors controlling the machine's 'preferences' - I will say more about those in a moment - and by appropriately setting various other factors, to produce a very wide range of characteristics in the drawings produced. 
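The digit-chasing strategy just described can be sketched as a small backtracking search. This is my reconstruction, not Cohen's code; `viable` and `connect` stand in for the real line-crossing test and the plotter:

```python
def draw(start, destinations, viable, connect):
    """destinations maps each digit 1-10 to the points carrying it.
    viable(p, q) tests whether a line p-q crosses nothing existing;
    connect(p, q) records (draws) the line."""
    path = [(start, 1)]                  # (point, digit reached)
    while path:
        point, digit = path[-1]
        nxt = digit % 10 + 1             # '10' is followed by '1'
        candidates = [q for q in destinations.get(nxt, [])
                      if viable(point, q)]
        if candidates:
            q = candidates[0]            # 'preferences' would rank these
            destinations[nxt].remove(q)  # no destination is ever reused
            connect(point, q)
            path.append((q, nxt))
        else:
            path.pop()                   # back up to the previous digit
    # the drawing is complete once backed up past the original '1'
```

The sketch omits the cancellation of digits whose cells a line passes through, but the termination argument is the same: destinations only ever dwindle.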
For example, the number of cells in the grid and the number and
type of distributions of the digits are both critical factors in determining the complexity of the drawing. Under studio conditions I have varied these factors myself, but for a recent exhibition I wrote an executive program which took over that task: and under its control the machine produced almost three hundred drawings during the four weeks it was in the museum. These varied from a few squiggly lines to quite complex drawings, from a single large image to anything up to twelve small ones on a page; and they required no human
Figures 11, 12
Figures 13 a, b, c, d, e
participation beyond changing the paper and refilling and changing pens.6 (Fig. 13.) What I have described as being controlled by the executive is, in a very general sense, the purpose-formulating mechanism for the 'freehand' line generator, the structure that determines where the lines are to be drawn. You might say that I am the purpose-formulating mechanism for the program as a whole, but the executive program makes my own part in the process rather more remote, if no less significant. In fact, I doubt whether the main program will be changed much at this point, since what is at stake for me is not what it does, but what determines what it does. I am referring to the 'preferences' mentioned earlier. As it reaches each destination, the machine has to choose between anything up to twenty-five next destinations, depending upon the state of the drawing. In the present state of the program, its preference is for destinations within certain distance limits, but it is easy to see how it might 'prefer' long lines to short ones, a destination near the center of the picture, or in highly active parts of the picture: or it might 'prefer' the one involving minimum change of direction; or the reverse of any of these. Obviously the character of the resultant drawings would vary enormously as the machine exercised one 'preference' rather than another, but in fact I am suggesting something more complex than simply switching 'preferences'. Suppose, rather, that the machine exercises its whole range of 'preferences' by scoring each possible destination for its ability to satisfy each preference, and taking the destination with the highest total score as its choice. It might then choose the destination which was relatively far away, didn't involve too much deviation from the current direction, and was in an area of high activity quite close to the center of the drawing. I think this would be a much closer simulation of the way in which human preference-structures are exercised. 
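This scored-preference scheme is straightforward to sketch: every candidate destination is scored against every preference, and the highest weighted total wins. The preference functions and weights below are invented examples, not those of the actual program; adjusting the weights is exactly what a Samuel-style learner would do:

```python
def choose(candidates, preferences, weights):
    """Score every candidate against every preference; the candidate
    with the highest weighted total score is chosen."""
    def total(c):
        return sum(w * p(c) for w, p in zip(weights, preferences))
    return max(candidates, key=total)

# Invented example preferences over (distance, activity) candidates:
prefer_far = lambda c: c["distance"] / 100.0
prefer_busy = lambda c: c["activity"]
```

With all weights equal, a destination that is merely good on several counts can beat one that is excellent on a single count, which is the point of scoring rather than switching preferences.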
Let us go one step further, and suppose the machine to be capable of weighting its scores for its different 'preferences', and of modifying these weightings itself. This possibility is by no means speculative: readers familiar with the development of the field of Artificial Intelligence will recognize its similarity to Samuel's now classic program for a checkers-playing machine (1959). They will recall also that the program enabled the machine to learn to play, by having it play against itself, one part always adopting the best strategy found to date, the other varying the weightings of the 'preferences' which determined that strategy until it found a better one, and so on. In a short time it was able to win consistently against any human player. We might recognize a significant difference between applying a learning program of this sort to successful game playing and doing so to successful art-making. Of course the difficulty is ours, not the machine's: since we ourselves would be in some doubt as
to the nature of the criteria towards the satisfaction of which the machine might aim. Art is not a deterministic game like checkers, to be won or lost by the 'player'; and though we acknowledge, empirically, that some artists are 'better' than others, that some artists do improve, the problem of formulating general criteria for improvement may be no different in relation to the machine than it is for the teacher in relation to the art student. It is probably reasonable to assume that there do exist criteria at levels even more remote from the work than any I mentioned: in which case we should be able to formulate them and the machine should be able to satisfy them. But there remains the suspicion that satisfactory performance in art is not to be measured solely by the satisfaction of explicit criteria, and would still not be so no matter how far back one pushed. As to those explicit criteria: there would seem little reason to deny that the machine behaves purposefully at every level described. Yet no level defines its own purpose. The learning level would - but for the difficulties mentioned above - advise the preference-structure as to the best way of defining the manner in which the executive commands the main program to select the points between which the 'freehand' line generator is to draw lines. One might say even that the purpose of each level is to formulate the purpose for the next level. It is true, of course, that the machine's organization is that way because it has been set up that way, but in considering the nature of explicit criteria in human art-making behavior we might reasonably adopt the machine's organization as a model, and say that these criteria relate to the formulation of new levels of purpose in satisfaction of prior purposes. This can be maintained without any suggestion that the machine can move higher and higher up the ladder until it is finally in possession of the artist's Purpose. 
On the contrary, it seems to me that pushing back along the chain of command - either for the machine or for oneself - is less like climbing a ladder than it is like trying to find the largest number between zero and one: there is always another midway between the present position and the 'destination'. It should be evident, then, that I do not consider 'serving the artist's Purpose' to be equivalent to 'taking over the artist's Purpose', or identify the machine with the artist. I identify the artist with the whole Purpose-structure, the machine with the processes which are defined by the structure and in turn help to redefine it. Since under other circumstances these processes too would be played out by the artist, I am also identifying playing-out with the computer with playing-out without the computer. For the machine to serve his Purpose the artist will need to use it as he uses himself. There is no reason to anticipate that the use will be more or less trivial than the use he makes of himself, but every reason to suppose that the
structure will change in ways which are presently indefinable. The step by step account of the computer's functions and its programs was intended, of course, to try to demonstrate that the machine can be used in this way. The original question - whether the machine can serve the artist's Purpose - is more redundant than unanswerable, and is in any case not to be confused with asking whether artists might see a need to use it. It is characteristic of our culture both that we search out things to satisfy current needs, and also that we restate our needs in terms of the new things we have found. Nor is it necessarily immediately clear what wide cultural needs those things might eventually serve. The notion of universal literacy did not follow immediately upon the development of moveable type, but it did follow that development, not demand it. Up to this point the computer has existed for the artist only as a somewhat frightening, but essentially trivial toy. When it becomes clear to him that the computer is, in fact, an abstract machine of great power, a general purpose tool capable of delimiting his mind as other machines delimit him physically, then its use will be inevitable. (Photos for this article by Becky Cohen.) 1. I wish I had more space here to develop and justify what may seem to be extravagant views. Readers wishing to pursue the issue for themselves will find these views to be almost timid compared to the current rate of growth and technological development within the industry. There is extravagance indeed! Of an estimated 80,000 computers now operating in the US alone, 13,000 were installed in 1972 by a single manufacturer. Spending on small computers is projected by a leading magazine to rise to $600,000,000 a year in the US by 1975. 2. But not necessarily for other times and other cultures. 3. 
Recursion is a powerful mathematical concept which is difficult to describe in non-mathematical terms: indeed, the above examples are as good as any I have been able to find. If you think of a mathematical function as being a structure which operates upon something provided to it, then a recursive function is one which provides itself with the 'something' by its previous operation. Since the 'something' will be different for each operation, this is not to be confused with circular structures: e.g. art is something produced by an artist, an artist is someone who makes art. Also, the idea of the boy holding the bag of popcorn on which there is a picture of the boy holding the bag of popcorn on which... actually represents a hierarchical structure rather than a recursive one. 4. I am not making this up. See 'Idols of Computer Art' by Robert E. Mueller, Art in America, May, 1972: and my own reply in 'Commentary' in the following issue. 5. Under grant number A72-1-288 from the National Endowment for the Arts, Washington, DC, I am currently investigating the feasibility of setting up a Center for Advanced Computing in the Arts. One might speculate that, among other things, such a center might enable artists to use the machine for their own purposes, rather than presenting them with a cookery book of possibilities. 6. 'Machine Generated Images', La Jolla Museum, California, October-November 1972. The drawings reproduced here are taken from the show. The machine was able to make drawings in several colors, but the museum staff had some difficulty in following its instructions for mounting the appropriate pens. In the event, it was limited to asking for the correct size pen to be mounted.
THE MATERIAL OF SYMBOLS ****************************
First Annual Symposium on Symbols and Symbol Processes
UNIVERSITY OF NEVADA LAS VEGAS
Harold Cohen Center for Art/Science Studies UNIVERSITY OF CALIFORNIA AT SAN DIEGO August 1976
Karin and Sherry are seven-year-old twin sisters. They are both in the habit of carrying large bags of colored pens and pencils with them wherever they go, and at every possible opportunity they sit down on the floor and start to make drawings. Those illustrated here (1,2) are entirely typical of their work. Their output is certainly well above average in quantity, but the drawings themselves are in no major respect atypical of the sort of drawings which most Western children might make at some period of their development. On the same afternoon that these drawings were made, I proposed a game to them: I would cover sheets of paper with dots, and they would make their drawings by joining up the dots (3,4). They both took to the game with obvious enjoyment, but also with an unexpected attention to the structural constraints imposed upon their performance by these new rules, which they promptly investigated. One of them wanted to know whether it was permitted to leave some of the dots unused. The other asked whether she was allowed to use the dots as eyes, if she was drawing a face, and in due course she contrived to use the dots also as Christmas tree decorations, snowflakes, sunbeams, and a number of other unspecified objects which she said were falling from the sky (3). In both cases — and I do not believe that this is part of their normal procedure — each drawing was followed by a long verbal account of the subject matter. After an hour or so the game ended, and they returned to their habitual mode. There are a number of formal differences in the drawings which result from the two modes which might be dealt with at some length. Their normal practice, for example, is to use the plane of the paper to represent some sort of coherent spatial unity, corresponding very roughly to what we might call a "view". 
In some of the dotted drawings this practice gives way to a more elemental approach, in which the plane of the paper is used in a manner largely neutral with respect to the images, and the images are disposed upon it without regard to any concept of "natural" ordering in the real world. (By the way, these results are quite consistent with the results of a more extensive set of similar experiments with a drawing class at UC San Diego.
fig 2
fig 1
fig 3 fig 4
The students there were up to forty years older than Karin and Sherry, and their habitual modes involved different conventions to those of the children; but they were certainly no less conventional.) But the more immediately noticeable differences between the two modes relate less to the formal aspects of the drawings than to the level of imagination and inventiveness which Sherry and Karin exercise in making them. When Karin decides to sign her name in a manner appropriate to the game (4), she is making a witty comment about the nature of drawing at a level of insightfulness we might not expect from a seven-year-old. If we compare the bird in one of her dot-drawings (5) with the drawing of a duck made just half an hour earlier, we are struck by the fact that she is evidently capable of rather acute observation, although it required the setting up of unfamiliar, and presumably challenging, circumstances to allow her to exercise that capability. What becomes clear, in fact, is that there is a significant difference between an image of a bird, and an image of an image of a bird. The earlier drawing is less a duck than it is a toy duck, less the result of observing what the real world is like than it is the result of learning what drawings — of the world — are supposed to look like. It is conventional in the precise sense that its conventions are the common property of that sub-culture we call children, where their stability is maintained both by the children's desire to conform and by the adult desire that they should.
*****************************
fig 5
I incline strongly to the view that we all spend our lives -- not merely our childhood -- trying to effect an acceptable and workable compromise between the internal demands for the satisfaction of our individual psychic needs, and the demands made upon us by the culture within which we live, for the sake of the stability, if not necessarily the ultimate well-being, of the culture itself. This is not to say that the things we do, like drawing, singing, talking,
do not grow from the most fundamental patterns of the mind but that the cultural rules imposed for their exercise may lead to behavioral patterns quite at variance with these deeper ones.
fig 6
Most children are able to build their early images without difficulty with marks which result directly from simple physical movements, just as the African sculptor has no difficulty satisfying his representational needs with conceptually simple forms requiring simple manufacturing skills (6). The notion of representation which held sway in Europe for nearly five hundred years, on the other hand, requires the student to spend a minimum of three years persuading his eye to see what it is supposed to see, and disciplining his hand to move as it is supposed to move. These movements are arbitrary with respect to the individual, since they have to be determined by events in the world — the random play of light and shade on objects -- which have nothing to do with the way his shoulder and wrist are articulated. The reconciliation which the artist in this tradition is obliged to make is a striking example of the sort of compromise I am referring to. We do not pay for our membership of the culture on a one-day-on, one-day-off basis. All our behavior is acculturated to some degree, and any attempt to isolate a discrete behavioral mode which we might think of as "natural" would be fruitless. Yet we might still find in the underlying structures of behavior aspects which are evidently not fashioned by the constraints of any particular culture, and this would be as close as we might come to a notion of "naturalness". It will be the tracking down of these aspects with which I will be concerned, knowing very well that their separation from other aspects is a theoretical one. Much of our mental activity seems to involve complex schema
of entities standing for other entities, and we would probably agree that the externalizing and manipulation of images, as such, grows directly from basic mind functions. But that area of symbol-manipulation which is directed towards communication between individuals and between groups must obviously involve highly acculturated performance. For a symbolic structure to stand any reasonable chance of being unambiguously understood, its maker must both have clear knowledge of the expectations which the reader will bring to its reading, and be prepared to accept the constraints imposed by those expectations. Communication is possible within a culture only because of existing agreements as to what entity is to stand for what entity, and how it is to be presented to be recognized as doing so. At an even more basic level, this implies also that all the involved parties know about the same entities: which may be true, more or less, within the same culture, but is unlikely to be true from one culture to another. These would seem not to be very promising conditions for the exercise of imagination, inventiveness, and all those other virtues we associate with the making of art, or, indeed, for our understanding of art produced by any culture other than our own. But I think we have to conclude that art never has been devoted primarily to the cultural function of communication, and indeed it may never have been thought that it did before our own time. The more historic view within our own culture pictures the artist in communion with variously-conceived extra-human sources of inspiration and wisdom, explicitly acknowledging the fact that if he speaks on behalf of the community, he does not speak with its voice or in terms which will necessarily be understood. Art history deals with the problem of tracking and identifying the transformations which continuously modify the significance of symbols within the changing cultural continuum. 
But there are other problems of a more fundamental kind which fall outside the scope of orthodox iconology. Any art theory which begins with a view of the artist as serving primarily the cultural need to formulate and transmit explicit meanings inevitably ends up viewing the whole system as a sort of noisy telephone network, in which
the receiver strives constantly to reconstruct the original message. Yet the cultural mismatch between artist and viewer must then be a major source of noise in the system, and we account for the discrepancies between what the artist "has in mind" and what the viewer thinks he understands, by the notion of "interpretation". We do not necessarily have any evidence beyond our own "interpretations", however, as to what, if anything, the artist had in mind in the first place. This emphasis upon the specifically cultural use of symbols has left us without any account of the underlying structures of image-generating behavior more convincing than the Divine Muse, and some contemporary variant of that theme usually passes for explanation. I am always a little shocked to recall that it is only about fifty years since Paul Klee declared that it is a sin against the Creative Spirit for the artist to work when not inspired. After nearly thirty years spent in making art, in the company of other artists, I am prepared to declare that the artist has no hot-line to the infinite, and no uniquely delineated mind functions. What he does, he does with the same general-purpose equipment that everybody has, and if his use of it is in any respect unusual, that very fact points to the need for a model of image-generating behavior which concentrates specifically upon behavioral mechanisms rather than upon products. In particular, I believe we will need to adopt a view of the artist as indulging in the generation of what I will call image-rich material as a self-satisfying procedure primarily, and only secondarily involved in the manipulation of culturally stabilized symbols: performing that secondary function, moreover, in a manner more in keeping with the essentially self-seeking character of the primary one.
*****************************
You will see that I am back to the fundamental dichotomy between the internal psychic, and the external cultural determinants to an individual's behavior. The two are not available for examination in isolation of each other, for the rather obvious reason that human beings live in cultures. As far as image-generating behavior is concerned, however, it seems reasonable to speculate that image-rich material arises from the innately human domain, for the reason that the cultural determinants which act upon the individual tend, by definition, towards conventionalizing; towards the rigid binding of symbol to stabilized meaning. To reflect broader experience more accurately, we have to look, not for symbols which are unambiguously understood within their own culture — however powerfully they may function there — but for material which can flow between cultures, and which is constantly re-used to mesh with new and diverse meanings as it does so. What we think of as our culture is no more than a moment in time, a cross-section of a continuum. All but an infinitesimally small part of all the symbols and symbol-potent material which reaches us comes to us from other points in time and from other more or less remote cultural states. In some cases, what we find ourselves responding to comes from cultures so remote that we simply have to acknowledge that we cannot possibly know what its original significance was. I am thinking particularly of the petroglyphs which are to be found throughout Nevada and California (7). We know nothing to speak of concerning the people who made them or what they made them for, or even how long ago they were made. We cannot seriously pretend even to misunderstand their original significance, and what speculation exists is based upon evidence quite extrinsic to the marks themselves. 
Yet the generations of anthropologists who have added their speculations to an increasing but unrevealing literature bear witness to the power of the glyphs: the power, not to communicate explicit meanings within the culture within which they arose, but to trigger and direct our own innate propensities for attaching significance to events. To account for the pressure which these marks are capable of exerting over so total a cultural void, would we not
fig 7
have to assume that their power derives from the essentially human determinants to their making? that it reflects patterns of behavior so deep-rooted in the human organism as to be considered as constant for all human beings regardless of their particular patterns of acculturation? The question would be entirely speculative, not to say gratuitous, if we could proceed only by the analysis of existing examples, for the reason that what is present for analysis is the object, not the behavior which generated it. Any plausible conclusion would be exactly as good as any other plausible conclusion in the absence of any possible verification. I will not claim that my own work offers definitive verification of any conclusion, but I will claim it as an attempt, at least, to deal with behavior rather than with
objects. Analysis of a range of objects, from the Californian petroglyphs at one extreme to my own drawing at the other, has served mainly to suggest a sort of minimum configuration of deep-level behavioral mechanisms, which have then been used as the basis of a computer program capable of generating and dealing with graphic material. In other words, the choice of mechanisms was largely intuitive and arbitrary. I suspected that I would be on reasonably safe ground if I limited myself, at the outset at least, to what I assumed to be perceptual primitives, and I selected three: the ability to differentiate between figure and ground, to differentiate between open forms and closed forms, and to differentiate between insideness and outsideness. Since the choice was arbitrary, I did not think it needed further justification at that stage of a questionable undertaking. Yet I thought that justification could actually be found: both in the fact that young children evidently differentiate between closed forms like circles and triangles, and open forms like crosses, well before they are able to differentiate between circles and triangles: and also because of the persistence, throughout the long human history of mark-making, of motifs like mazes. It seemed to me that much of what we grace with the name "primitive" actually demonstrates a sophisticated awareness of the nature of the perceptual open/closed duality, for the fascination of the maze — the image, I mean, rather than the physical maze — must surely rest on the difficulty of knowing at a single glance whether it is open or closed. The point of the strategy — the building of a computer program — was not to see whether the presence of these behavioral primitives would add a sense of authenticity to the output. It was to see whether the program could generate image-rich material in a controlled context where it would be clear that the effect was not the result of something else. 
That would certainly not have been the case if I had tried to limit myself to any particular set of behavioral primitives, and I have taken some care to see that I do not influence the running of the program. As it has been designed, it operates without any human assistance
or intervention. There is no way to interfere with it while it is running, and no convenient way to change its parameters before the start of any drawing. Much more important, it has no data at its disposal: no lexicon of previously-described forms which it could pull out, run through a variety of transformations, and assemble into a picture. As a matter of fact, it has no transformations available to it, either.
An argument could be made, of course, that the whole program constitutes a process description of its output, although it would then have to be seen as the description common to an endless array of different drawings, since the program never produces the same drawing twice. But the significance of the lack of data is a more complex one. There is no difficulty about writing a computer program which generates different drawings endlessly, depending at least upon what "different" is understood to mean. Here the question was whether the degree and kind of differentness would correspond to the variety we might expect from a human
Fig 9
image-maker. Would the individual drawings, generated in the absence of any knowledge of the world and its objects, nevertheless function as though they were made by a human image-maker, in the sense that they might appear to be making reference to the world and its objects? The answer seems to be affirmative, at least to the degree that most people evidently have some difficulty in believing that the drawings (8-10) were not made by a human artist: an artist, moreover, with a distinct sense of humor and a marked tendency towards narrative. As the prime mover of these drawings — I still have some difficulty regarding myself as their maker in any conventional sense — I find myself in a curious position involving a not-too-serious parody on the notion of divine inspiration. It takes about two weeks after seeing one of the drawings for the first time for me to lose my awareness of it as machine output. I can hardly regard it as my own, because I have no recollection of having participated physically in its making, and it seems to have come to me from
fig 10
another time and place. We might see this as a comment on the persistence of myths, perhaps. But if romance dies hard, the facts are left to be accounted for. If we find elements in these drawings reminiscent of African masks and comets, figures suggestive of turtles and submarines (10), it is a fact that the elements and figures which evoke those objects were made by the program. It is also a fact that the program knows nothing of African masks, comets, turtles or submarines.
*****************************
Explaining how these effects come about in the absence of any specific intentionality is difficult, primarily because they cannot be identified with the action of individual parts of the program. There is, I mean, nothing like a submarine subroutine. In form, the program is a production
fig 11
system; and, like other such systems, this one accomplishes two things. It describes the conditions which may arise in the world of the program — in this case the developing drawing — and it lists the acceptable responses to particular combinations of these conditions. The left part of a production tests for the patterns, the particular combinations of conditions which characterize the state of the world at any moment. The right part of a production changes the state of the world, since all the acceptable responses act upon the world directly or indirectly. The new combinations of conditions will then be trapped by other productions; and the process continues, in this event-driven fashion, from the initial empty state until one of several world states elicits the response that the drawing is done. The left part of a production is able to recognize that a form is closed rather than open, just as the right part is able to produce a closed form, or effect closure upon an open one. A complete production might recognize that part
fig 12
of the field of the drawing is occupied by a closed form with another closed form inside it; and that it is surrounded by similar closed forms, all of which have been shaded in one way or another (11). And it might respond — for example — by shading the figure, leaving the inner one as a hole in the middle. But references to closure, to space-filling, and to repetition occur throughout the production system in both the left and the right parts. They constitute, not a set of rules so much as a set of protocols, the complex intertwining of which gives the entire program its particular identity. They are best considered as characterizing the program's world rather than as controlling how the program is to behave within that world; as characterizing — if I risk anthropomorphizing a little too far — what the program understands its world to be like.
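The production-system mechanics described here can be sketched in miniature. This is a hypothetical illustration only — the world representation, the predicate and action names, and the particular tests and responses are all invented for the sketch, not taken from Cohen's actual program. Each production pairs a test on the state of the developing drawing (the "left part") with a response that changes that state (the "right part"), and the system runs event-driven from the empty state until some state elicits the response that the drawing is done:

```python
import random

def make_world():
    """The 'world' is the developing drawing: here, just a list of figures."""
    return {"figures": [], "done": False}

# Left parts: predicates that test combinations of conditions in the world.
def has_unshaded_closed_figure(world):
    return any(f["closed"] and not f["shaded"] for f in world["figures"])

def world_is_busy_enough(world):
    return len(world["figures"]) >= 6

# Right parts: responses that act upon the world directly.
def start_a_figure(world):
    world["figures"].append({"closed": random.random() < 0.5, "shaded": False})

def shade_a_figure(world):
    for f in world["figures"]:
        if f["closed"] and not f["shaded"]:
            f["shaded"] = True
            break

def declare_done(world):
    world["done"] = True

# The production system: (test, response) pairs. The first production whose
# test matches the current state fires; the new state is then trapped by
# other productions, and the process continues until the drawing is done.
PRODUCTIONS = [
    (world_is_busy_enough, declare_done),
    (has_unshaded_closed_figure, shade_a_figure),
    (lambda w: True, start_a_figure),   # default: keep drawing
]

def run(world):
    while not world["done"]:
        for test, respond in PRODUCTIONS:
            if test(world):
                respond(world)
                break
    return world
```

Note that, as in the text, nothing here controls the drawing from above: the behavior emerges from the interaction of the productions with the state they themselves create.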
Space-filling and repetition are two of several protocols which have been added to the program since the outset, most of them simply extending upon the initial ones. I mean that
shading is a way of underlining the closedness of a closed figure, and the program now knows a number of ways in which that can be done (12). A recent extension to the figure-ground protocol requires the program to respect the territorial integrity of previously drawn figures. This one results in some of the more unpredictable and evocative configurations; though it is never easy, even watching the drawings being done, to keep track of what is causing what. Adding a single new protocol to a program is more like adding a whole new conceptual complex to a human's world model than it is like adding a new behavioral rule, and it should not be surprising that the complexity of the drawings increases rapidly for each added protocol. This seems to suggest that the program structure is appropriate to the requirement of variety which I noted earlier, since it seems unlikely that human output increases in variety only at the cost of extremely large rule-sets. I have not yet had sufficient time working with a reasonably well-developed program to reach detailed conclusions on the nature of that variety, and on how the enmeshing of the different protocols produces it. But it does seem clear that it is the enmeshing, not the individual protocols, which is responsible. Note, for example, that although one drawing may exhibit more sophisticated space-filling — shading — abilities than another, it will not have the same evocative force as a "simpler" drawing which exercises both open and closed protocols (cf 10, 12). In fact, I think there is evidence to suggest that in the presence of closed forms, open forms take on a distinctly differentiated function, providing a kind of semantic connective tissue for the semantically dominant — more obviously object-like — closed forms. It is certainly the case that the spatial relatedness of the figures significantly affects their individual reading.
****************************
There is one further aspect to the program, having to do with task-oriented behavior rather than with perceptual behavior, which I should touch on briefly. It controls the way in which the program goes about the actual production, and the physical articulation, of the simulated freehand line from which the drawings are built. I quickly came to the general conclusion, when I first became involved in computing, that human drawings are potentially interesting to human beings at least in large part because they have been made by other human beings; and that for a machine to inspire a similar kind of interest in its products it would have to make its drawings in the same sort of way that humans produce theirs. Of course, everything I have been talking about has been an effort to elucidate what that "same sort of way" might be, but I am thinking now specifically about the lowest-level business of driving a pencil from one place to another. What seemed certain to me, and still does, is that freehand drawing involves an elaborate feedback mechanism, a continuous matching of current state against desired end state and a continuous correction of deviation, essentially like the mechanisms we use to thread a needle, or drink a glass of water, or drive a car. Most of the time the feedback is required — and the artist can claim no exemptions in this regard — by the unpredictability of the equipment we use, whether that unpredictability is caused by arthritis or worn bearings, lack of muscular coordination or sloppy steering. We do not optimize in freehand drawing, and it never seemed to me that the dynamic qualities of drawing would be captured by spline interpolations. Indeed, it never seemed to me that those qualities would be reproducible by trying to mimic appearance at all. Imagine the problem of driving your car off a main road, where you are facing in one direction, into a narrow driveway at an arbitrary angle to it.
Unless you proceed by planning your whole course in advance and then closing your eyes and stepping on the gas, you will probably be doing very much what the program does. Given the task of getting from one place, facing in one direction, to another place and facing in another direction, it never knows how
to accomplish the entire task, but "imagines" a series of temporary destinations, each of which will bring it a little closer to approaching its goal from the specified direction (13).

fig 13

A degree of randomizing is provided as an analogue for arthritic joints, and as it never had any precisely defined path to follow anyway it corrects for accumulating discrepancies only when they become big enough to jeopardize its chances of ever reaching its final destination. It never knows in advance what will constitute a complete path, and it never fails to complete its path. This part of the program is non-trivial, and certainly not optimal, involving as it does a complex series of decisions for every one of the small line segments which go into the building of a line. But I believe the simulation is a good one, and I have found it possible, moreover, to modify the character of the line — the artist's "handwriting" — by the manipulation of such thoroughly practical factors as the rate at which sampling is done, the suddenness with which correction is applied, and the frequency with which the program sets up new "imagined" destinations along its path.
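The navigation scheme described above might be sketched roughly as follows. All names and parameter values here are illustrative assumptions, not the program's actual ones: the sketch steps toward the goal in small segments, adds a random tremor to every segment (the "arthritic joints"), and corrects the heading only when the accumulated discrepancy exceeds a threshold — it never plans the whole path in advance, yet it always arrives:

```python
import math
import random

def _angdiff(a, b):
    """Smallest signed difference between two angles, in radians."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi

def freehand_line(start, goal, step=1.0, tremor=0.2, tolerance=0.5):
    """Approximate a freehand line from start to goal as a list of points.

    tremor    -- random angular jitter added to every small segment (radians)
    tolerance -- heading error allowed before a correction is applied (radians)
    """
    x, y = start
    points = [(x, y)]
    heading = math.atan2(goal[1] - y, goal[0] - x)
    while math.hypot(goal[0] - x, goal[1] - y) > step and len(points) < 10000:
        ideal = math.atan2(goal[1] - y, goal[0] - x)
        # Correct only when the accumulated discrepancy has grown large
        # enough to jeopardize reaching the destination.
        if abs(_angdiff(ideal, heading)) > tolerance:
            heading = ideal
        # Random tremor on every small segment: the wobble of the hand.
        heading += random.uniform(-tremor, tremor)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
        points.append((x, y))
    points.append(goal)
    return points
```

Varying `step` (the sampling rate), `tolerance` (the suddenness of correction), and the spacing of intermediate destinations corresponds to the "thoroughly practical factors" by which the character of the line — the "handwriting" — can be modified.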
****************************
It seems to me that most of the things one might say about image-building might be said equally about image-reading. The reason for this, I think, is that the element common to both — the propensity for attaching significance to events, for endowing entities with identities — is also an overwhelmingly important one. It is not the unique property of artists, obviously. This is not to say that the identity which the viewer attaches to a complex of marks is exclusively a function of the viewer's propensity, or even that any complex of marks would serve equally well to trigger that propensity. The natural world is full of complex forms, and if we sometimes play with them — clouds, for example — we are well aware that their "meanings" are our own invention. Marks which we recognize as being man-made, on the other hand — and in particular those man-made marks which we see as arising from an intent on the part of the maker to communicate — these we treat in a special way, not merely assigning significance to them but insisting that that significance has been carried by the marks themselves. I believe that in searching images for evidence of their origins the mind is surprisingly literalistic. If a machine program is able to produce image-rich material, it does so by virtue of persuading the viewer that the maker was a human being living in a human world, and that his intent was to communicate something about that world. The assumption of intentionality precedes the "reconstruction" of intent. In this case the simulated perceptual mechanisms give evidence of the underlying humanness of the drawing's manufacture and the drawer's world — though perhaps any other set of reasonably low-level mechanisms would have served equally well — and the constant complex decision-making which actually takes place, and which is clearly evident in the articulation of the line, confirms the viewer's belief in the artist's intentionality.
This conclusion is not adequate to account for the more highly particularized readings which seem to attach to the drawings — notably the humor and the sense of narrative —
and I do not know at this stage how they are to be accounted for.
**************************** **************************** ****************************
It became evident from the questions and the private discussions which followed this paper that my use of the label "protocol" had done more to confuse than to elucidate the conceptual unit to which I had applied it. Reviewing that usage, it becomes apparent that my understanding of what the program is doing — what roles the different elements in its structure play — has shifted with time, and I have been careless enough to carry over to a slowly emerging construct a term inappropriate to it, but unfortunately still more or less appropriate to something else. The underlying confusion has been my own, of course, and I am glad to have been presented with this opportunity to try to resolve it. Given the choice between rewriting the paper and extending it with a post-scripted commentary, I have chosen the latter course. This gives me the chance also to deal in a more measured fashion with one particular question which is evidently quite troubling to a good many people.
***************************
I suggest above that what I call a protocol is best regarded as characterizing what the program understands its world to be like, not as a rule which controls how it is to behave in that world. A rule is expressed within the program by a production. A protocol is not fully expressed by a production. I would not want to change any of this — except the use of the word "protocol" itself — but the problem is that nothing has been said about the structure, or what we might call the dynamics, of the characterization. In the absence of any overview clearly differentiated from the rule-oriented schema to which the characterization must obviously relate, the mere assertion that a protocol is not simply a rule is hardly sufficient to expunge the sense that it is. So be it: let me return "protocol" to the rule-oriented domain whence it came. In its place, and hopefully more fully expressive of the conceptual complex it is meant to carry, I will use the term "epimorph". An epimorph characterizes what the program understands its world to be like, and the machine draws in a human, or quasi-human, fashion because its set of epimorphs is closely modeled on human epimorphs. We might go as far as to say that it exercises a subset of human epimorphs. In dealing with the dynamics of the characterization process, then — and thus in attempting to elucidate what an epimorph is — it may prove more revealing to consider an example of human, rather than machine, performance. Here is one taken from the drawing class mentioned earlier. A brief background account is in order. Two weeks into a deliberately dislocative class — people bring such rigidly formulated notions about drawing to a beginning class! — one struggling student volunteered the view that drawing was, as far as he could tell, "just a question of getting from one point to another".
Always happy to take what is offered, I proposed that in that case they might get into the business of drawing more freely if they didn't have to worry about the points. Each of them could provide an array of dots for someone else, who would then only have to figure out how to get from one to
another. In practice, it required fairly rigorous measures to ensure that these dot arrays did not carry any representational weight of their own to constrain subsequent performance. Eventually we had two sets of drawings, thirty-four in all, pinned up for examination, and before any discussion began I asked the students whether they could write down the rules which they had followed in joining up the dots. They all wrote down the same three rules! — 1. see if you can see an image in the dots, and if so draw a line around it; 2. if you can't see an image, draw closed figures anyway; and 3. if you can't do 1 or 2, fake it. "Faking it", on questioning, turned out to mean using open structures like short straight lines, zigzags, and so on, as space filling. Examination of the drawings themselves showed that there were several other rules of a more surprising kind operating. Consider that any dot in an array might potentially become the junction of an indeterminate number of lines joining it to any number of other dots. Of the simpler cases, the order-two case denotes a dot on a continuous line, the order-one case marks the end of a line, and the null case is a dot which has not been joined up to anything. Karin's "eyes" would be an example of the null case. Since the drawings all contained between a hundred and two hundred dots, we might guess that there would be considerable variety in the numbers of lines joining at these junctions: in fact, we found only three cases of order-four junctions, and only two cases of order-more-than-four junctions, in the entire set of thirty-four drawings. Over 99.5% of the dots had three lines or less attached to them! (A similar situation will be observed in the drawings of both Karin and Sherry, figures 3 and 4.)
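The junction-order bookkeeping behind these counts is easy to state concretely. The sketch below is purely illustrative — the dots and lines are invented, not data from the class — and simply counts, for each dot, how many lines meet at it:

```python
from collections import Counter

def junction_orders(dots, lines):
    """Map each dot to its order: the number of lines meeting at it."""
    order = Counter()
    for a, b in lines:
        order[a] += 1
        order[b] += 1
    # Dots never joined to anything are the "null case", order zero.
    return {d: order.get(d, 0) for d in dots}

# A tiny invented example: a polyline a-b-c-d, with dot "e" left unjoined.
dots = ["a", "b", "c", "d", "e"]
lines = [("a", "b"), ("b", "c"), ("c", "d")]

orders = junction_orders(dots, lines)
# "a" and "d" mark the ends of a line (order one); "b" and "c" lie on a
# continuous line (order two); "e" is the null case (order zero).
```

Run over a full set of drawings, a tally of this kind is all that is needed to expose the constraint the students were unaware of: the scarcity of junctions of order four or more.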
The students were certainly unaware, until it was pointed out to them, that their behavior had been constrained in this way, and were even a little resentful of the suggestion that they had done anything according to rules of any sort. Yet, curiously enough, there were a few cases where "extra" dots had occurred when two lines had been allowed
to cross, and in all these cases the students concerned reported a strong sense of having done something wrong, broken some powerful though unstated rule. The class as a whole evidently recognized an unstated interdiction against crossing lines also, and unanimously agreed that these "extra" dots should not be counted as order-four junctions. Consistent though this behavioral pattern was, it only required attention to be focused upon it for it to change. The discussion which followed the making of these drawings evidently identified "junctionality" as an issue, and although nothing was said about what might constitute acceptable behavior in relation to this issue, the drawings which followed in subsequent weeks all contained a much richer distribution of order-more-than-four junctions; we discovered also that their use involved increasingly complex, but hardly less consistent, rule-sets than we had found at the beginning. We need not go into detail here on the precise nature of these new rules. The point is that they could all be described by a production-like paradigm involving assessment of the current state of the drawing — in relation to junctionality among other things — on the left side, and some action resulting in a change of state — through the manipulation of junctionality among other things — on the right side. The notion of junctionality itself would not be adequately expressed by any one of these productions, however, and it clearly exists on a "higher" hierarchical level than that of the individual productions. It has become one of the issues which the student believes to be significant in relation to the domain of drawing, and thus characterizes what he believes that domain to be like. It is in this sense isomorphic with those other issues of territoriality, openness/closedness, containment and repetition, which I said characterized what the program understood its world to be like.
It will be clear from this account that of these, at least openness/closedness is also an active epimorph for the human: but I am sure that more extensive evidence will be found in a wide variety of material, and in domains not limited to drawing activity.
***************************
One of the questions I was asked — not for the first time, by any means — was: am I proposing that the machine program constitutes a model of human creative behavior? Is it a sort of automated surrogate Harold Cohen? A full answer would go far beyond my present scope — and, indeed, my present abilities — and would involve all those other troubling philosophical questions which the existence of the computer inevitably raises. A short answer would be that human beings live in a real world, and their internal representations of that world include reference to its objects: the current state of the program knows nothing of the real world or of its objects. Human beings learn from experience: the program begins each new drawing without any memory of previous drawings, and with its production system unmodified by having made them. In these and in other respects the machine's performance is not merely less than, but is unlike, human performance. It should be stressed, however, that these are limitations in the current state of this program, and are not to be regarded as intrinsic to programs in general. Most searching questions about the nature of the machine turn out to be questions about the nature of people, and this one is no exception. Before we could venture a more complete answer we would need to consider what we really mean by creative behavior, for if that is to be judged exclusively in terms of the manifest results of its exercise — we know so-and-so is creative because he makes a great many original images — then clearly the machine is extremely creative. Its drawings are probably as good, as original, as any I ever made myself, and I am hopelessly outclassed by it in terms of productivity.
But once we have stripped off these layers of the artist's activity which have to do with marketable objects, with the desire for approval, for fame or for notoriety, with propaganda for this religious belief or that economic system: once, in short, we have stripped off the artist's public and cultural functions, how will we characterize the remaining private, essentially self-serving, functions? What does the artist make images for?
My own view can be stated briefly and without oversimplifying too far. I believe that the artist is engaged, as everybody else is, in building internal representations of his world, and that his behavior is remarkable in only two major respects. The first — and this seems to me to be a feature common to art-making, science-doing, philosophy, mathematics, and most other higher intellectual pursuits — is that the formulating and continuous reformulating of mental models is carried on as a foreground, and as a highly structured, activity: not as a background activity. The second is that he exhibits a high level of preoccupation with the structure of representation as such. In neither of these respects does he require special mental equipment, and indeed I would assume that the cultural value of his activity, the extraordinary regard in which images are held, rests upon the fundamental normality of the mental functions exercised. I mean that the basic structure of all internal model building is the assignment of associative reference: what we might call the "standing-for-ness" principle. We would not be going too far to regard art as an endless explorative game built around the presumably universally human fact that things can stand for other things. The playing out of this game produces images, normally embodied in objects, which may be valued by the culture for any of a number of reasons. For the artist, it is the playing out of the game, and thus the making of the object — rather than the object itself — which is important. If object-making is the means to an end, the end is not the object — art objects are interesting to the degree that they stand for something outside themselves — but the continuous development of new moves in the game. Externalization is a part of the artist's methodology in the building of internal representations of his world: a world which includes representations as a central feature. 
We are now in a position to generate a slightly more complete answer to the original question, and I think we will find that the view which the question proposed — that the program is an artificial artist capable of creative behavior — is both more than and less than adequate. The
program does not develop new game-states: it plays the legal moves in the current game. It says "Let me tell you about my world", but rich though that world may be, the telling does not result in any further enrichment. We thus have no reason to say that the machine has any interest in the one feature I have chosen to regard as fundamental to human art-making — the continuous development of the internal representation of the world. To this degree, it is clearly an inadequate model of human performance: which is not to say that no program could ever provide an adequate one. On the other hand, it does not merely model the playing of legal moves in the game, it actually plays them. To this degree the program is not a model of human performance at all. It carries out a real, and rather extensive, part of the art-making procedure, and its output is in every important respect interchangeable — both culturally and privately — with output which might result from more orthodox art-making procedures. My world changes as a result of the program telling about it, and in the long term the program changes also. I assume from this that I will go on working on the one program indefinitely, without ever feeling the need to abandon it and start on a completely new one. Some caution is in order. I have reached this point in many conversations to be told "Oh, you mean that the computer is just a tool." The answer to this is that the advent of the electronic computer requires a total rethinking of what tools might be, for if the thermostat and the speed governor are exactly equivalent to biological feedback systems, computer programs are potentially exactly equivalent to intellectual feedback systems. We have a long way to go before we fully comprehend the shift in significance of "tools" capable of the independent exercise of reason. I have said several times that the limitations attaching to this program should not be regarded as fundamental limitations in programs.
I do not know what will change, for example, or how they will change, when this program does have some knowledge of the world, and can make decisions about the drawing in terms of that knowledge: or when it can use its memory of past drawings as a determinant in
building new ones. Prediction is a hazardous game, and I will limit myself here to only one. I do not believe that any program will ever produce art unless it was written by an artist — as the words have been defined by this discussion — and its running serves a vital role for that individual in the changing patterns of his internal model building. The Sci-Fi fantasy of putting an artist's "genius" on tape and flooding the world with his work after his death, or of becoming a great composer in the twentieth century by writing a program to generate Bach: these merely reflect the confusion of art with its objects.
*************************** *************************** ***************************
What is an Image?
Harold Cohen University of California at San Diego B-027, UCSD, La Jolla, CA 92093
Image-making, and more particularly art-making, are considered as rule-based activities in which certain fundamental rule-sets are bound to low-level cognitive processes. AARON, a computer program, models some aspects of image-making behavior through the action of these rules, and generates, in consequence, an extremely large set of highly evocative "freehand" drawings. The program is described, and examples of its output given. The theoretical basis for the formulation of the program is discussed in terms of cultural considerations, particularly with respect to our relationship to the images of remote cultures. An art-museum environment implementation involving a special-purpose drawing device is discussed. Some speculation is offered concerning the function of randomizing in creative behavior, and an account given of the use of randomness in the program. The conclusions offered bear upon the nature of meaning as a function of an image-mediated transaction rather than as a function of intentionality. They propose also that the structure of all drawn images derives from the nature of visual cognition.
1. INTRODUCTION

AARON is a computer program designed to model some aspects of human art-making behavior, and to produce as a result "freehand" drawings of a highly evocative kind (figs 1, 2). This paper describes the program, and offers in its conclusions a number of propositions concerning the nature of evocation and the nature of the transaction — the making and reading of images — in which evocation occurs. Perhaps unexpectedly — for the program has no access to visual data — some of these conclusions bear upon the nature of visual representation. This may suggest a view of image-making as a broadly referential activity in which various differentiable modes, including what we call visual representation (note 1), share a significant body of common characteristics. In some respects the methodology used in this work relates to the modeling of "expert systems" (note 2), and it does in fact rely heavily upon my own "expert" knowledge of image-making. But in its motivations it comes closer to research in the computer simulation of cognition. This is one area, I believe, in which the investigator has no choice but to model the human prototype. Art is valuable to human beings by virtue of being made by other human beings, and the question of finding more efficient modes than those which characterize human performance simply does not arise.
My expertise in the area of image-making rests upon many years of professional activity as an artist — a painter, to be precise (note 3) — and it will be clear that my activities as an artist have continued through my last ten years of work in computer-modeling. The motivation for this work has been the desire to understand more about the nature of art-making processes than the making of art itself allows, for under normal circumstances the artist provides a near-perfect example of an obviously-present, but virtually inaccessible body of knowledge. The work has been informal, and qua psychology lacks methodological rigor. It is to be hoped, however, that the body of highly specialized knowledge brought to bear on an elusive problem will be some compensation. AARON is a knowledge-based program, in which knowledge of image-making is represented in rule form. As I have indicated, I have been my own source of specialized knowledge, and I have served also as my own knowledge-engineer. Before embarking on a detailed account of the program's workings, I will describe in general terms what sort of program it is, and what it purports to do. First, what it is not. It is not an "artists' tool". I mean that it is not interactive, it is not designed to implement key decisions made by
the user, and it does not do transformations upon input data. In short, it is not an instrument, in the sense that most computer applications in the arts, and in music particularly, have identified the machine in essentially instrument-like terms.
AARON is not a transformation device. There is no input, no data, upon which transformations could be done: in fact it has no data at all which it does not generate for itself in making its drawings. There is no lexicon of shapes, or parts of shapes, to be put together, assembly-line fashion, into a complete drawing.

It is a complete and functionally independent entity, capable of generating autonomously an endless succession of different drawings (note 4). The program starts each drawing with a clean sheet of paper — no data — and generates everything it needs as it goes along, building up as it proceeds an internal representation of what it is doing, which is then used in determining subsequent developments. It is event-driven, but in the special sense that the program itself generates the events which drive it. It is not a learning program, has no archival memory, is quite simple and not particularly clever. It is able to knock off a pretty good drawing — thousands, in fact — but has no critical judgment that would enable it to declare that one of its drawings was "better" than another. That has never been part of the aim. Whether or not it might be possible to demonstrate that the artist moves towards higher goals, and however he might do so through his work, art-making in general lacks clear internal goal-seeking structures. There is no rational way of determining whether a "move" is good or bad the way one might judge a move in a game of chess, and thus no immediately apparent way to exercise critical judgment in a simulation. This lack of internal goal-orientation carries with it a number of difficulties for anyone attempting to model art-making processes: for one thing, evaluation of the model must necessarily be informal. In the case of AARON, however, there has been extensive testing. Before describing the testing procedure it will be necessary to say with more care — distinguishing here between the program's goals and my own — what AARON is supposed to do.

Task Definition. It is not the intent of the AARON model to turn out drawings which are, in some ill-defined and loosely-understood sense, aesthetically pleasing, though it does in practice turn out pleasing drawings. It is to permit the examination of a particular property of freehand drawing which I will call, in a deliberately general fashion, standing-for-ness.

The Photographic "Norm"

One of the aims of this paper is to give clearer definition to what may seem intuitively obvious about standing-for-ness, but even at the outset the "intuitively obvious" will need to be treated with some caution. In particular, we should recognize that unguarded assumptions about the nature of "visual" imagery are almost certain to be colored by the XXth century's deep preoccupation with photography as the "normal" image-making mode. The view that a drawn image is either: 1. representational (concerned with the appearance of things), or 2. an abstraction (i.e. fundamentally appearance-oriented, but transformed in the interest of other aims), or 3. abstract (i.e. it doesn't stand for anything at all), betrays just this pro-photographic filtering, and is a long way from the historical truth. There is a great wealth of imagistic material which fits none of these paradigms, and it is by no means clear even that a photograph carries its load of standing-for-ness by virtue of recording the varying light intensities of a particular view at a particular moment in time. It is for this reason that image-making will be discussed here as the set of modes which contains visual representation as one of its members. It is also why I used the word "evocative" in the first paragraph rather than "meaningful". My domain of enquiry here is not the way in which particular meanings are transmitted through images and how they are changed in the process, but more generally the nature of image-mediated transactions. What would be the minimum condition under which a set of marks may function as an image? This question characterizes economically the scope of the enquiry, and it also says a good deal about how the word "image" is to be used in this paper, though a more complete definition must wait until the end.
Art-making and Image-making. The reader may detect some reluctance to say firmly that this research deals with art-making rather than with image-making, or vice-versa. The two are presented as continuous. Art-making is almost always a highly sophisticated activity involving the interlocking of complex patterns of belief and experience, while in the most general sense of the term image-making appears to be as "natural" as talking. All the same, art-making is a case of image-making, and part of what AARON suggests is that art-making rests upon cognitive processes which are absolutely normal and perfectly common.
Evaluation. A simulation program models only a small piece of the action, and it requires a context in which to determine whether it functions as one expects that piece to function. AARON is not an artist. It simply takes over some of the functions which normally require an artist to perform them, and thus it requires the whole art-making process to be carried forward as a testing context. The program's output has to be acceptable to a sophisticated audience on the same terms as any other art, implying thereby that it must be seen as original and of high quality, not merely as a pastiche of existing work. A valid testing procedure must therefore contain a sophisticated art-viewing audience, and the informal in situ evaluation of the simulation has been carried out in museum environments: the DOCUMENTA 6 international exhibition in Kassel, Germany, and the prestigious Stedelijk Museum in Amsterdam, the two exhibits together running for almost five months and with a total audience of almost half a million museum-goers. In both of these shows drawings were produced continuously on a Tektronix 4014 display terminal and also with an unconventional hard-copy device (to be described later). A PDP 11/34 ran the program in full view of the gallery visitors (fig 3).

figure 3.

In addition, and at other times, the program's output has been exhibited in a more orthodox mode in museums and galleries in the US and in Europe. It might be correctly anticipated that on those other occasions when drawings have simply been framed and exhibited without any reference to their origins, the question of their origins has never arisen, and they have met with a typical cross-section of museum-goers' responses.

These exhibits were not set up as scientific experiments. Nor could they have been without distorting the expectations of the audience, and thus the significance of any results. No formal records were kept of the hundreds of conversations which took place between the artist and members of the audience. This report is therefore essentially narrative, but offered with some confidence. Even without formal evaluation, it might reasonably be claimed that the program provides a convincing simulation of human performance.

Audience Response. A virtually universal first assumption of the audiences was that the drawings they were watching being made by the machine had actually been made in advance by the "real" artist, and somehow "fed" to the machine. After it had been explained that this was not the case, viewers would talk about the machine as if it were a human artist. There appeared to be a general consensus that the machine exhibited a good-natured and even witty artistic personality, and that its drawings were quite droll (fig 4). Some of the viewers, who knew my work from my pre-computing, European, days claimed that they could "recognize my hand" in the new drawings. This last is particularly interesting, since, while I certainly made use of my own body of knowledge concerning image-making in writing the program, the appearance of my own work never consciously served as a model for what the program was supposed to do. More to the point, while a very small number of people insisted that the drawings were nothing but a bunch of random squiggles, the majority clearly saw them in referential terms. Many would stand for long periods watching, and describing to each other what was being drawn, always in terms of objects in the real world. The drawings seemed to be viewed mostly as landscapes inhabited by "creatures", which would be "recognized" as animals, fish, birds and bugs. Occasionally a viewer would "recognize" a landscape, and once the machine's home was identified as San Francisco, since it had just drawn Twin Peaks.
The next part of this paper is divided into five sections. In the first, a general description of the production system as a whole is given. The following three sections deal with particular parts of the production system: the MOVEMENT CONTROL part, the PLANNING part, and the part which handles the internal representation of the drawings as they proceed. The second of these, on PLANNING, also gives an account of the theoretical basis for the program. The fifth section has something to say about the function of randomness in the program, and also discusses to what extent it might be thought to parallel the use of randomness in human art-making behavior. The third and final part draws conclusions.
****************
2. THE PROGRAM "AARON"
2.1 THE PRODUCTION SYSTEM. The main program (note 5) has about three hundred productions. Many of these are to be regarded as micro-productions, in the sense that each of them handles only a small part — an "action-atom" — of a larger conceptual unit of action. For example, the drawing of a single line, conceptually a single act, actually involves twenty or thirty productions on at least three levels of the system. This fine-grain control over the drawing process contributes both to its generality — most of these action-atoms are invoked by many different situations — and to its flexibility, since it allows a process to be interrupted at any point for further consideration by higher-level processes.

Levels of Organization. The organization of the system is hierarchical, in the sense that the higher levels are responsible for decisions which constrain the domain of action for the lower levels (fig 5). Each level of the system is responsible only for its own domain of decision-making, and there is no conceptual homunculus sitting on the top holding a blueprint and directing the whole operation. No single part knows what the drawing should turn out to be like. There is some practical advantage to this level-wise splitting up of the system, but the program was designed this way primarily for reasons of conceptual clarity, and from a desire to have the program structure itself — as well as the material contained within it — reflect my understanding of what the human image-making process might be like. I believe that the constant shifting of attention to different levels of detail and conceptualization provides this human process with some of its important characteristics. Thus the left part of each production searches for combinations of up to five or six conditions, and each right part may perform an arbitrary number of actions or action-atoms, one of which may involve a jump to another level of the system.
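The condition-action structure described above can be sketched in a modern notation. This is an illustrative reconstruction, not AARON's actual code; all names here (Production, fire_one, the toy LINES rule) are invented for the example.

```python
class Production:
    """One production: a left part (conditions) and a right part (actions)."""
    def __init__(self, conditions, actions):
        self.conditions = conditions  # predicates over the current state
        self.actions = actions        # state-changing "action-atoms"

    def matches(self, state):
        return all(cond(state) for cond in self.conditions)

def fire_one(productions, state):
    """Fire the first production whose left part matches; return it, or None."""
    for p in productions:
        if p.matches(state):
            for act in p.actions:
                act(state)
            return p
    return None

# A toy rule: while fewer than three lines exist at the LINES level,
# record a new line and jump down to the SECTORS level.
rule = Production(
    conditions=[lambda s: s["level"] == "LINES",
                lambda s: s["lines_drawn"] < 3],
    actions=[lambda s: s.update(lines_drawn=s["lines_drawn"] + 1),
             lambda s: s.update(level="SECTORS")],
)

state = {"level": "LINES", "lines_drawn": 0}
fire_one([rule], state)
```

The point of the sketch is the left-part/right-part division: a production fires only when all of its conditions hold, and its action-atoms may themselves transfer control to another level of the system.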
"ARTWORK" The topmost level of the system, the ARTWORK level, is responsible for decisions relating to the organization of the drawing as a whole. It decides how to start, makes some preliminary decisions which may later determine when and how it is to finish, and eventually makes that determination. The program currently has no
archival memory, and begins each drawing as if it has never done one before. (One can easily imagine the addition of a higher level designed to model the changes which the human artist deliberately makes in his work from one piece to the next; this level would presumably be called EXHIBITION.) ARTWORK also handles some of the more important aspects of spatial distribution. It is my belief that the power of an image to convince us that it is a representation of some feature of the visual world rests in large part upon the image's fine-grain structure: the degree to which it seems to reflect patterns in the changes of information density across the field of vision which the cognitive processes themselves impose upon visual experience. Put crudely, this means, for example, that a decision on the part of the reader of an image that one set of marks is a detail of another set of marks rather than standing autonomously, is largely a function of such issues as relative scale and proximity. This function is quite apart from the more obviously affective issue of shape (and hence "semantic") relationship. It is the overall control of the varying density of information in the drawing, rather than the control of inter-figural relationships, which is handled by ARTWORK.

"MAPPING" and "PLANNING" All problems involving the finding and allocation of space for the making of individual elements in the drawing are handled by MAPPING, though its functions are not always hierarchically higher than those of PLANNING, which is responsible for the development of these individual figures. Sometimes PLANNING may decide on a figure and ask MAPPING to provide space, while at other times MAPPING may announce the existence of a space and then PLANNING will decide what to do on the basis of its availability. Sometimes, indeed, MAPPING may override a PLANNING decision by announcing that an appropriate space is not available.
A good example of this occurs when PLANNING decides to do something inside an existing closed figure and MAPPING rules that there isn't enough room, or that what space there is has the wrong shape. MAPPING will be referred to again in relation to the data-structures which constitute the program's internal representation of what it is doing, and PLANNING also as one of the centrally important parts of the program.
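The negotiation between PLANNING and MAPPING might be caricatured as follows. This is a speculative sketch under invented names; AARON's actual space-allocation machinery is far richer than a list of free rectangles.

```python
class Mapping:
    """Toy stand-in for MAPPING: free space as a list of (width, height)
    rectangles. The real program's representation is far richer."""
    def __init__(self, free_regions):
        self.free_regions = list(free_regions)

    def find_space(self, min_w, min_h):
        """Return a suitable region, or None to override PLANNING."""
        for region in self.free_regions:
            if region[0] >= min_w and region[1] >= min_h:
                self.free_regions.remove(region)
                return region
        return None

def plan_figure(mapping, want_w, want_h):
    """PLANNING decides on a figure and asks MAPPING to provide space."""
    space = mapping.find_space(want_w, want_h)
    if space is None:
        return "postponed"  # MAPPING ruled there isn't enough room
    return "figure drawn in %dx%d space" % space

m = Mapping([(4, 3), (10, 8)])
```

In this caricature the veto runs one way only; in the program, as noted above, MAPPING may also take the initiative by announcing a space before PLANNING has asked for one.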
LINES AND SECTORS Below this level the hierarchical structure of the system is fairly straightforward. Each figure is the result of (potentially) several developments, each provided by a return of control to PLANNING. Each of these developments may consist of several lines, and for each of the successive lines of each development of any figure LINES must generate a starting point and an ending point, each having a direction associated with it (fig 6). It also generates a number of parameters on the basis of specifications drawn up in PLANNING which
determine how the line is to be drawn: whether reasonably straight, wiggly, or strongly curved, and, if various overlapping modes are called for (fig 7), how they are to be handled. As I have indicated, lines are not drawn as the result of a single production. When LINES passes control to SECTORS the program does not know exactly where the line will go, since the constraint that it must start and end facing specified directions does not specify a path: there are an indeterminate number of paths which would satisfy the constraint. The program does not choose one; it generates one. SECTORS produces a series of "imagined" partial destinations — signposts, as it were (fig 8) — each designed to bring the line closer to its final end-state. On setting up each of these signposts it passes control to CURVES, whose function is to generate a series of movements of the pen which will make it veer towards, rather than actually reach, the current signpost. Control is passed back to SECTORS when the pen has gone far enough towards the current signpost that it is time to look further ahead, and it is passed back to LINES when the current line has been completed and a new one is demanded by the development still in progress.

2.2 MOVEMENT CONTROL We are now down to the lowest level of the program, and to the way in which the curves which make up the drawing are actually generated. This part is not discontinuous from the rest, of course. The flexibility of the program rests in large part upon the fact that the hierarchy of control extends downwards to the finest-grained decisions: no part of the control structure is considered to be so automatic that it should fall below the interface line. Thus, the story of how the pen gets moved around follows on from the description of how the intermediate signposts are set up.
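The signpost scheme lends itself to a minimal sketch, under assumed names and parameters (signposts, veer, and the gain constant are all inventions for illustration): SECTORS-like code proposes partial destinations, and CURVES-like code turns the pen only part of the way toward the current one, so that the pen veers towards rather than reaches it.

```python
import math

def signposts(start, end, n=4):
    """Partial destinations between start and end (a stand-in for SECTORS;
    AARON's actual spacing rules are not reproduced here)."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n)
            for k in range(1, n + 1)]

def veer(pos, heading, target, gain=0.3, step=1.0):
    """Turn part-way toward the target and advance one step, so the pen
    veers towards, rather than actually reaches, the current signpost."""
    x, y = pos
    desired = math.atan2(target[1] - y, target[0] - x)
    # steer only a fraction of the angular error, like a driver correcting course
    error = (desired - heading + math.pi) % (2 * math.pi) - math.pi
    heading += gain * error
    return (x + step * math.cos(heading), y + step * math.sin(heading)), heading

sps = signposts((0, 0), (10, 0), n=4)
pos, heading = veer((0.0, 0.0), math.pi / 2, sps[0])  # pen initially faces "up"
```

Because the heading absorbs only a fraction of the error at each step, the resulting path is a curve rather than a sequence of straight segments, which is the point of the successive-approximation strategy described below.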
Abstract Displays and Real Devices. In the earlier versions of the program all the development work was done exclusively on a graphic display, and the "pen" was handled as an abstract, dimensionless entity without real-world constraints upon its movements. Conceptually, however, I always thought of the problem of moving the pen from point A facing direction alpha, to point B facing direction beta, as being rather like the task of driving a car off a main road into a narrow driveway set at some known arbitrary angle to it. This is clearly not a dead-reckoning task for the human driver, but one which involves continuous feedback and a successive-approximation strategy.

It seemed quite reasonable, therefore, to be faced at some point with the problem of constructing an actual vehicle which would carry a real pen and make real drawings on real sheets of paper. That situation arose in the Fall of '76 when I was preparing to do the museum exhibitions which I mentioned earlier, and decided that if I wanted to make the drawing process visible to a large number of people simultaneously, I would need to use something a good deal bigger than the usual graphic display with its 20-inch screen.

The Turtle. The answer turned out to be a small two-wheeled turtle (fig 9), each of its wheels independently driven by a stepping motor, so that the turtle could be steered by stepping the two motors at appropriate rates. It is thus capable of drawing arcs of circles whose radius depends upon the ratio of the two stepping rates. Since the two wheels can be driven at the same speed in opposite directions, the turtle can be spun around on the spot and headed off in a straight line, so that this kind of device is capable of simulating a conventional x-y plotter. But it seemed entirely unreasonable to have built a device which could be driven like a car and then use it to simulate a plotter. In consequence the pen-driving procedures already in the program were re-written to generate the stepping rates for the motors directly — to stay as close as possible to the human model's performance — rather than calculating these rates as a function of decisions already made. The advantage here was a conceptual one, with some practical bonus in the fact that the turtle does not spend a large part of its time spinning instead of drawing. It also turned out unexpectedly that the generating algorithm simplified enormously, and the quality of the freehand simulation improved noticeably.

Feedback. The program does not now seek to go to any place — in Cartesian terms — but concerns itself exclusively with steering: thus the turtle's Cartesian position at the end of executing a single command is not known in advance. Nor is this calculation necessary when the turtle is operating in the real world. It was not designed as a precision drawing device, and since it cannot perform by dead-reckoning for long without accumulating errors, the principle of feedback operation was extended down into this real-world part of the program: the device makes use of a sonar navigation system (fig 10) by means of which the program keeps track of where it actually is.
Instead of telling it to "go to x,y", as one would tell a conventional plotter, the program tells it "do the following and then say where you are". A more detailed account of the turtle system, and its effect upon the simulation of freehand drawing dynamics, is given in Appendix 1.
figure 10.
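The relation between the two stepping rates and the radius of the drawn arc can be made concrete with the standard differential-drive formula. This is a textbook reconstruction, not the turtle's own code; arc_radius and track are invented names.

```python
def arc_radius(v_left, v_right, track):
    """Radius of the arc traced by the pen (midpoint between the wheels)
    for given wheel speeds and wheel separation `track`. Equal speeds give
    a straight line (infinite radius); equal and opposite speeds spin the
    turtle on the spot (zero radius)."""
    if v_left == v_right:
        return float("inf")
    return (track / 2.0) * (v_right + v_left) / (v_right - v_left)
```

The two limiting cases in the docstring correspond to the plotter-like behavior described above: straight-line travel and spinning on the spot, with every intermediate ratio of stepping rates giving an arc of some radius in between.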
2.3 "PLANNING" No single level of the program can be described adequately without reference to the other levels with which it interacts: it has already been mentioned, for example, that MAPPING may either precede PLANNING in determining what is to be done next, or it may serve PLANNING by finding a space specified there. Additionally, any development determined in PLANNING may be modified subsequently either as a result of an imminent collision with another figure or because provision exists in the program for "stacking" the current development in order to do something not originally envisaged (fig 11a,b). All the same, the drawing is conceived predominantly as an agglomeration of figures, and to that extent PLANNING, which is responsible for the development of individual figures, is of central importance.

Behavioral Protocols in Image-Making. Of the entire program, it is also the part least obviously related to the effects which it accomplishes. While the formal results of its actions are clear enough — an action calling for the closure of a shape will cause it to close, for example — it is not at all clear why those actions result in the specifically evocative quality which the viewer experiences.
A rule-by-rule account of this effect is not appropriate, because the individual rules do no more than implement conceptual entities — which I will call behavioral protocols — which are the fundamental units from which the program is built. These protocols are never explicitly stated in the program, but their existence is what authorizes the rules. Thus, before describing in detail what is in PLANNING I should give an account of the thinking which preceded the writing of the program, and try to make clear what I mean by a protocol.

Background. It is a matter of fact that by far the greatest part of all the imagery to which we attach the name of "art" comes to us from cultures more or less remote from our own. It is also a matter of fact that within our own culture, and in relation to its recent past, our understanding of imagery rests to a great extent upon prior common understandings, prior cultural agreements, as to what is to stand for what — prior, that is to say, to the viewing of any particular image. It is unlikely that a Renaissance depiction of the Crucifixion ("of Christ" being understood here by means of just such an agreement!) would carry any great
weight of meaning if we were not already familiar both with the story of Christ and with the established conventions for dealing with the various parts of the story. Indeed, we might be quite confused to find a depiction of a beardless, curly-headed youth on the cross unless we happened to possess the abstruse knowledge that Christ was depicted that way — attaching a new set of meanings to the old convention for the representation of Dionysus — until well into the 7th century. In general, we are no longer party to the agreements which make this form acceptable and understandable. We must evidently distinguish between what is understandable without abstruse knowledge — we can, indeed, recognize the figure on the cross as a figure — and what is understandable only by virtue of such knowledge. In the most general sense, all cultural conditions are remote from us, and differ only in the degree of their remoteness. We cannot really comprehend why the Egyptians made sphinxes, what Michelangelo thought the ancient world had in common with Christianity, or how the internal combustion engine was viewed by the Italian Futurists seventy years ago, who wanted to tear down the museums in its honor. What abstruse knowledge we can gain by reading Michelangelo's writings, or the Futurist Manifesto, does not place us into the cultural environment in which the work is embedded. A culture is a continuum, not a static event: its understandings and meanings shift constantly, and their survival may appear without close scrutiny to be largely arbitrary. In the extreme case, we find ourselves surrounded by the work of earlier peoples so utterly remote from us that we cannot pretend to know anything about the people themselves, much less about the meanings and purposes of their surviving images.

figure 11a,b
The Paradox of Insistent Meaningfulness. There is an implicit paradox in the fact that we persist in regarding as meaningful — not on the basis of careful and scholarly detective work, but on a more directly confrontational basis — images whose original meanings we cannot possibly know, including many that bear no explicitly visual resemblance to the things in the world. Presumably this state of affairs arises in part from a fundamental cultural egocentrism — what, we ask, would we have intended by this image and the act of making it? — which is fundamentally distortive. There has also been a particular confusion in this century through the widespread acceptance of what we might call the telecommunications model of our transactions through imagery, particularly since in applying that model no differentiation has been observed between the culture we live in and the cultures of the remote past. In the view of this model, original meanings have been encoded in the image, and the appearance of the image in the world effects the transmission of the meanings. Allowing for noise in the system — the inevitability of which gives rise to the notion, in art theory, of "interpretation" — the reception and decoding of the image makes the original meanings available. However useful the model is as a basis for examining real telecommunication-like situations, in which the intended meanings and their transformations can be known and tracked, it provides a general account of our transactions through images which is quite inadequate. The encoding and decoding of messages requires access to the same code-book by both the image-maker and the image-reader, and that code-book is precisely what is not carried across from one culture to another. I think it is clear also that the paradox of insistent meaningfulness, as we might call it, constitutes the normal condition of image-mediated transactions, not an abnormal condition. It evidently extends below the level at which we can recognize the figure, but not what the figure stands for, since so much of the available imagery is not in any very obvious sense "representational" at all. The paradox is enacted every time we look at a few marks on a scrap of paper and proclaim them to be a face, when we know perfectly well that they are nothing of the sort.

Cognitive Bases for Image Structure.
In short, my tentative hypothesis in starting work on AARON was that all image-making and all image-reading is mediated by cognitive processes of a rather low-level kind, presumably processes by means of which we are able to cope also with the real world. In the absence of common cultural agreements these cognitive processes would still unite image-maker and image-viewer in a single transaction. On this level — but not on the more complex culture-bound level of specific iconological intentionality — the viewer's egocentricity might be justified, since he could correctly identify cognitive processes of a familiar kind in the making of the image. But let me detail this position with some care. I am not proposing that these processes make it possible for us to understand the intended meanings of some remotely-generated image: I am proposing that the intended meanings of the maker play only a relatively small part in the sense of meaningfulness. That sense of meaningfulness is generated for us by the structure of the image rather than by its content. I hope I may be excused for dealing in so abbreviated a fashion with issues which are a good deal less than self-evident. The notion of non-enculturated behavior — and that notion lurks behind the last few paragraphs, obviously — is a suspect one, since all human behavior is enculturated to some degree: but my purpose was not to say what part of human behavior is dependent upon enculturating processes and what is not. It was simply to identify some of the determinants to a general image-structure which could be seen to be common to a wide range of enculturating patterns. The implication seemed strong — and still does — that the minimum condition for generating a sense of meaningfulness did not need to include the assumption of an intent to communicate: that the exercise of an appropriate set of these cognitive processes would itself be sufficient to generate a sense of meaningfulness.
Cognitive Skills. The task then was to define a suitable set. I have no doubt that the options are wide, and that my own choices are not exclusive. I chose at the outset to include: 1. the ability to differentiate between figure and ground, 2. the ability to differentiate between open and closed forms, and 3. the ability to differentiate between insideness and outsideness (note 6).

AARON has developed a good deal from that starting point, and some of its current abilities clearly reflect highly enculturated patterns of behavior. For example, the program is now able to shade figures in a mode distinctly linked to Renaissance and post-Renaissance representational modes: other cultures have not concerned themselves with the fall of light on the surfaces of objects in the same way. Nevertheless, a large part of the program is involved still in demonstrating its awareness of the more primitive differentiations.

Protocols and Rules. Against this background, I use the term protocol to mean the procedural instantiation of a formal awareness. This is clearly a definition which rests upon cognitive, rather than perceptual, modes, since it involves the awareness of relational structures. Thus, for example, the program's ability to differentiate between form and ground makes possible an awareness of the spatial relationships between forms, and generates finally a set of avoidance protocols, the function of which is to prohibit the program from ignoring the existence of one figure in drawing another one. The protocols themselves are not explicitly present in the program, and are manifested only through their enactment by the rules which describe what to do in particular circumstances where the overlapping of figures is threatened.

Figure Development. In keeping with the hierarchical structuring which informs the program as a whole, PLANNING considers a figure to be the result of a number of developments, each determined in part by what has gone before. The program enacts a number of repetition protocols, and a single development in the making of a figure can often involve the repetition of a single action (fig 12), rather than the agglomeration of different actions. The first productions to deal with the first development of any figure decide, on the basis of frequency considerations, that this figure will be closed, that it will be open, or that it will be, for the moment, "uncommitted" — that is, a line or a complex of lines will be drawn, but only at a later stage will it be decided whether or not to close. If the primary decision is for closure,
then PLANNING will decide between a number of options, mostly having to do with size and shape — MAPPING permitting — and with configuration. In some cases it will not actually draw the boundary of a closed form at all, and will leave the definition of the occupied space to await subsequent space-filling moves. If the decision is for a non-closed form, then again a number of options are open. In both cases the available options are stated largely in terms of repetition protocols, the enactment of which determines the formal characteristics of the resulting configuration. These characteristics are not uniquely defining, however, and a number of different formal subgroups may result from a single repetition protocol and its rules. For example, one such protocol, involving a single line in this case, requires the line to move a given distance (more-or-less) and then change direction, continuing this cycle a given number of times. All the figures marked in (fig 13) result from this: the details of implementation in the individual cases are responsive to their unique environmental conditions, and in any case may be changed at any point by the overriding avoidance protocol, which guarantees the territorial integrity of existing figures.

figure 12.
Thus the program will know at the beginning of each development what the current intention is, but will not know what shape will result. A closed form generated by a "go, turn, repeat" cycle may in fact turn out to be extremely long and narrow (fig 14), and a number of second developments associated with a closed-form first development will then be unavailable: there will be a limit, for example, upon what can be drawn inside it, though it may develop in other ways, as this one does.
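The "go a given distance, turn, repeat" protocol lends itself to a compact sketch. The following is illustrative only (go_turn_repeat and its jitter parameter are inventions, not AARON's own procedure); it shows how a single repetition protocol can yield either a rough closed polygon or, with alternating turns, a zigzag.

```python
import math, random

def go_turn_repeat(n, distance, turn, jitter=0.2, seed=0):
    """One repetition protocol: move roughly `distance`, change direction
    by `turn`, and repeat n times. (A sketch; not AARON's own procedure.)"""
    rng = random.Random(seed)
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for _ in range(n):
        d = distance * (1 + rng.uniform(-jitter, jitter))  # "more-or-less"
        x += d * math.cos(heading)
        y += d * math.sin(heading)
        points.append((x, y))
        heading += turn
    return points

# turn = 2*pi/n closes the path into a rough polygon; alternating the sign
# of the turn at each step would give a zigzag instead.
poly = go_turn_repeat(6, distance=10.0, turn=2 * math.pi / 6, jitter=0.0)
```

As in the text, the protocol fixes only the cycle of movement and turning; whether the result reads as a closed form, a zigzag, or something else depends on the parameters chosen and on any overriding avoidance behavior, which this sketch omits.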
Proliferation. Even with constraints of this sort there is a significant proliferation in the number of productions associated with the second development of any figure. A typical first development might be initiated by:

If (this is a first development
    and the last figure was open
    and at least n figures have been done
    and at least q of them were open
    and at least t units of space are now available)
Then This figure will be closed
    specifications for repetition
    specifications for configuration

and, to move on from this point:

If (this is a second development
    and the first was closed
    and its properties were
        a. (size)
        b. (proportions)
        c. (complexity)
        d. (proximity to ...))
Then either 1. divide it ... specifications ...
    or 2. shade it ... specifications ...
    or 3. add a closed form to it ... specifications ...
    or 4. do a closed form inside ... specifications ...
    or 5. do an open form inside ... specifications ...

figure 13.
figure 15.
This is a prototype for an expanding class of productions, each responding to a different combination of properties in the first development. Similarly, continuation will require:

If (this is a third development
    and the first was a closed form ... properties ...
    and the second was a closed form ... properties ...)
Then shade the entire figure:
    specification 1: a boulder with a hole in it
    or specification 2: a flat shape with a hole
    or specification 3: a penumbra.

If (this is a third development
    and the first was closed
    and the second was a series of parallel lines inside it ...
    and the remaining inside space is at least s ...)
Then do another series of lines:
    specification 1: perpendicular to the first ...
    or specification 2: alongside the first ...
    or specification 3: do a closed form in available space ...
(note 7).

The Relationship of Closed Forms and Open Forms.
The same proliferation of options occurs for open-lined structures also, but not to the same degree. One of the interesting things to come out of this program is the fact that open-line structures appear to function quite differently when they are alone in an image than when they appear in the presence of closed forms. There seems to be no doubt that closed forms exert a special authority in an image — perhaps because they appear to refer to objects — and in their presence open-lined structures which in other circumstances might exert similar pressure on the viewer are relegated to a sort of spatial connective-tissue function. A similar context-dependency is manifested when material is presented inside a closed form (fig 15): it is "adopted", and becomes either a detail of the form, or markings upon it. This seems to depend upon particular configurational issues, and especially the scale relationship between the "parent" form and the newly introduced material. This manifestation is important, I believe, in understanding why we are able to recognize as "faces" so wide a range of closed forms with an equally wide range of internal markings following only a very loose distribution specification.
Limits on Development. At the present time no figure in the program goes beyond three developments, and few go that far, for a number of reasons. In the first place, most of the (formal) behavior patterns in the program were initially intended to model a quite primitive level of cognitive performance, and for most of these a single development is actually adequate. Once a zigzag line has been generated, repetition, for example — as it is found in existing primitive models — seems limited to those shown in (fig 16). It has remained quite difficult to come up with new material general enough for the purposes of the program. It is the generality of the protocols which guarantees the generality of the whole, and new material is initiated by the introduction of new protocols. On the level of the procedures which carry out the action parts of the subsequently-developed productions, the approach has been to avoid accumulation of special routines to do special things. There is only one single procedure adapting the protocols of repetition and reversal to the generation of a range of zigzag-like forms, for example (fig 13). But there has been another, and equally significant, reason for the limitation upon permissible developments. It is the lack of
adequate, and adequately important, differentiations in the existing figures. For the primitive model represented by the earlier states of the program it was almost enough to have a set of abilities called up by the most perfunctory consideration of the current state of the drawing: the stress was on the definition of a suitable set of abilities (as represented by the right-hand parts of the productions), and as it turned out it was quite difficult to exercise those abilities without generating moderately interesting results. But for a more sophisticated model it is clearly not enough merely to extend that set of abilities, and the problem of determining why the program should do this rather than that becomes more pressing. The limitation here can be considered in two ways. One is that I had reached the point of exhausting temporarily my own insights into the image-building process. The other is that I had not made provision in the first versions of the program for being able to recognize the kind of differentiations I would want to deal with — since I could not know at the outset what they were going to be — and thus lacked a structure for developing new insights. This leads to a consideration of my next topic: how the program builds its own representation of what it has done up to any point in the making of the drawing.
figure 16.
2.4 INTERNAL REPRESENTATION
In the earlier stages of the development of the program, provision had been made for progressive access to the information stored in the data-structure, following the principle that it should not have to access more than it actually needed for the making of any particular decision. In practice, a great deal more was stored than was ever accessed. At the first level of detail the program made use of a quite coarse matrix representation, in each cell of which was stored an identifier for the figure which occupied it, and a number of codes which designated the various events which might have occurred in it: a line belonging to a closed form, a line belonging to an open form, a line junction, an unused space inside a closed figure, and so on. Obviously, it was not possible to record a great deal in this way, and data concerning the connectivity of the figure in particular required a second level of the structure. This was an unpleasantly elaborate linked-list structure of an orthodox kind. By definition, the kind of drawing AARON makes is not merely a growing, but a continuously-changing, structure. What was a point on a line becomes a node when another line intersects it, and this change has to be recorded by updating the existing structure, which must now ideally show the four paths connecting this node to four adjacent nodes. Both updating this structure and accessing the information contained within it proved to be quite tiresome, and the scheme was never general enough to admit of further development. As a result, it was used less and less, and decision-making has been based almost exclusively on the information contained in the matrix on the one hand, and in a third level of the structure, a simple property-list attaching to each figure, on the other. The most surprising thing about this simplistic and distinctly ad-hoc scheme is that it was actually quite adequate to the needs of the program. Explicit Data and Implicit Data.
Human beings presumably get first-order information about a picture by looking at the picture. I have always found it quite frustrating that the program could not do the same thing: not because it made any difference to the program, but because it made it difficult for me to think about the kind of issues I believed to be significant. Part of
the problem of using a linked-list structure to represent the connectivity of a figure, for example, derived from the fact that connectivity had to be explicitly recorded as it happened: it would have been much too difficult to traverse a structure of this kind post-hoc in order to discover facts about connectivity. If one could traverse the figure the way the eye does — loosely speaking! — it would not be necessary to give so much attention to recording explicitly all the data in the world without regard for whether it would ever be looked at again. In short, the primary decision to be made was whether to accept the absolute non-similarity of picture and representation as given and devise a more sophisticated list-structure, dropping the matrix representation altogether; or to drop the list-structure and develop the matrix representation to the point where it could be very easily traversed to generate information which was implicit within it. I opted for the latter. A description is included in Appendix 2, though at the time of writing (December '78) the implementation is not yet complete.
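The matrix-plus-property-list scheme, and the idea of deriving connectivity implicitly by traversal rather than recording it explicitly, might be sketched roughly as follows. This is only an illustration: all the names, event codes, and the neighbour-walking rule are my own assumptions, not AARON's actual identifiers or implementation.

```python
# A minimal sketch of the kind of structure described above: a coarse matrix
# of cells recording occupancy and events, a property list per figure, and a
# traversal which recovers connectivity implicit in the matrix.
# All names and codes here are illustrative assumptions, not AARON's.

CLOSED_LINE, OPEN_LINE, JUNCTION, ENCLOSED_SPACE = 1, 2, 4, 8  # event codes

class Drawing:
    def __init__(self, rows, cols):
        # each cell holds (figure_id, OR-ed event codes), or None if empty
        self.matrix = [[None] * cols for _ in range(rows)]
        self.properties = {}  # third level: a property list attaching to each figure

    def record(self, row, col, figure_id, event):
        _, events = self.matrix[row][col] or (figure_id, 0)
        self.matrix[row][col] = (figure_id, events | event)

    def set_property(self, figure_id, key, value):
        self.properties.setdefault(figure_id, {})[key] = value

    def connected_cells(self, start):
        """Walk occupied neighbours to discover connectivity post-hoc,
        instead of maintaining an explicit linked-list structure."""
        rows, cols = len(self.matrix), len(self.matrix[0])
        seen, stack = set(), [start]
        while stack:
            r, c = stack.pop()
            if (r, c) in seen or self.matrix[r][c] is None:
                continue
            seen.add((r, c))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    stack.append((nr, nc))
        return seen

d = Drawing(4, 4)
for cell in ((0, 0), (0, 1), (1, 1)):
    d.record(*cell, figure_id="fig1", event=CLOSED_LINE)
d.set_property("fig1", "closed", True)
d.connected_cells((0, 0))  # recovers the three occupied cells
```

The point of the traversal is exactly the one made in the text: nothing about connectivity is stored, yet the information can be generated on demand from what is implicit in the matrix.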
2.5 THE FUNCTION OF RANDOMNESS. This section does not deal with any single part of AARON: randomness is an active decision-making principle throughout the program, and I think it is important to say why that is the case. As a preface, it might be worth recording that beyond the limits of a mathematically sophisticated community most people evidently view randomness in a thoroughly absolutist fashion, and as the opposite to an equally absolute determinism. There is a firmly-held popular belief that a machine either does exactly what it has been programmed to do, or it acts "randomly". The fact that AARON produces non-random drawings, which its programmer has never seen, has given many people a good deal of trouble. What I mean by "randomness" is the impossibility of predicting the outcome of a choice on the basis of previously-made choices. It follows, of course, that "randomness", in this sense, can never be absolute: if the domain of choice is the set of positive integers, one must be able to predict that the outcome will be a positive integer, not a cow or a color. In AARON the domain of choice is always a great deal more constrained than that, however. The corollary to the notion of randomness as a decision-making principle is
the precise delineation of the choice space: in practice, the introduction into the program of a new decision characteristically involves the setting of rather wide limits, which are then gradually brought in until the range is quite small.
Randomness by Design and by Default. AI researchers in more demonstrably goal-oriented fields of intellectual activity must obviously spend much time and effort in trying to bring to the surface performance rules which the expert must surely have, since he performs so well. I am not in a position to know to what extent "Let's try x" would constitute a powerful rule in other activities: I am convinced that it is a very powerful rule indeed in art-making, and more generally in what we call creative behavior, provided that "x" is a member of a rigorously constrained set. A number of artists in this century — perhaps more in music than in the visual arts — have deliberately and consciously employed randomizing procedures: tossing coins, rolling dice, disposing the parts of a sculpture by throwing them on the floor, and so on. But this simply derives a strategy from a principle, and examples of both can be found at almost any point in history. It is almost a truism in the trade that great colorists use dirty brushes. Leonardo recommended that the difficulty of starting a new painting on a clean panel — every painter knows how hard that first mark is to make — could be overcome by throwing a dirty sponge at it (note 8). But one suspects that Leonardo got to be pretty good with the sponge! An artist like Rubens would himself only paint the heads and hands in his figure compositions, leaving the clothing to one assistant, the landscape to another, and so on. All the assistants were highly-qualified artists in their own right, however. The process was not unlike the workings of a modern film crew: the delegation of responsibility reduces the director's direct control, and randomizes the implementation of his intentions, while the expertise and commonly-held concerns of the crew provide the limits (note 9).

Randomizing in the Program: Rules and Meta-rules. For the human artist, then, randomizing is not unconstrained, and therefore cannot be characterized by the rule "If you don't know what to do, do anything". Rather, one suspects the existence of a meta-rule which says, "precisely define a space within which any choice will do exactly as well as any other choice". In AARON, the implementation of the low-order rule has the following form:

If (a and b and ...n)
Then p% of the time do (x); q% of the time do (y); r% of the time do (z);

which fills out the description of the format discussed in PLANNING. The same frequency-controlled format is used within the action part of a production in determining specifications:

make a closed loop:
  specification 1: number of sides
    50% of the time, 2 sides (simple loop)
    32% of the time, 3 sides
    . . .
  specification 2: proportion
    50% of the time, between 1:4 and 1:6
    12% of the time, between 3:4 and 7:8
    . . .
  specification 3: . . .

AARON has only the simplest form of these meta-rules, which are used to determine the bounds of the choice space:

  if (a) lowbound is La, highbound is Ha
  if (b) lowbound is Lb, highbound is Hb
  if (n) lowbound is Ln, highbound is Hn
  specification taken randomly between lowbound and highbound

where a, b, n are varying conditions in the state of the drawing. No consistent attempt has been made to develop more sophisticated meta-rules. In the final analysis, the existence of such rules implies a judgmental view of the task at hand, and they are consequently beyond the scope of a program like AARON, which is not a learning program and has no idea whether it is doing well or badly. The Value of Randomness. What does randomness do for the image-maker? Primarily, I believe its function is to produce proliferation of the decision space without requiring the artist to "invent" constantly. One result of that function is obviously the generation of a much greater number of discrete terminations than would otherwise be possible,
and consequently the sense that the rule-set is a great deal more complex than is actually the case. A second result is that the artist constantly faces unfamiliar situations rather than following the same path unendingly, and is obliged to pay more attention, to work harder to resolve unanticipated juxtapositions. It is a device for enforcing his own heightened participation in the generating process. This last might seem less important in AARON: the program's attention is absolute, after all. But for the viewer the fact that AARON exercises the function is quite important. There is one level of our transactions with images on which we respond with some astuteness to what is actually there. The fact that AARON literally makes decisions every few microseconds — not binary decisions only, but also concerning quantitative specifications — shows clearly in the continuously changing direction of the line, in every nuance of shape, and succeeds in convincing the viewer that there is, indeed, an intelligent process at work behind the making of the drawings.
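The frequency-controlled production format and the bound-setting meta-rule described above might be sketched as follows. The particular percentages and bounds are invented for illustration; only the format follows the text.

```python
import random

def frequency_choice(options):
    """Frequency-controlled choice: options is a list of (percentage, outcome)
    pairs whose percentages sum to 100, as in 'p% of the time do (x)'."""
    roll = random.uniform(0, 100)
    cumulative = 0
    for pct, outcome in options:
        cumulative += pct
        if roll <= cumulative:
            return outcome
    return options[-1][1]

def bounded_spec(condition, bounds):
    """Meta-rule: a condition in the state of the drawing selects lowbound and
    highbound; the specification is then taken randomly between them."""
    lowbound, highbound = bounds[condition]
    return random.uniform(lowbound, highbound)

# e.g. number-of-sides specification for a closed loop (invented percentages)
sides = frequency_choice([(50, 2), (32, 3), (18, 4)])

# e.g. a proportion specification with bounds chosen by the drawing's state
proportion = bounded_spec("crowded", {"crowded": (0.25, 0.33),
                                      "open": (0.75, 0.875)})
```

Note that nothing here is judgmental: any outcome within the delineated space does exactly as well as any other, which is the sense of "randomness" the text defines.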
****************
3. CONCLUSIONS
AARON produces drawings of an evocative kind. It does so without user intervention; without recourse to user-provided data; and without the repertoire of transformational manipulations normal to "computer graphics". It remains now, if not to propose a coherent theory of image-making, at least to pull together those fragments of explanation already given into something resembling a plausible account of why AARON works. This will be largely a matter of putting things in the right places. Art-making and Image-making. First: no adequate justification has yet been given for the many references to art and art-making, as opposed to images and image-making, beyond saying that the first are a special case of the second. What makes them special? Art is a bit like truth. Every culture has, and acts out, the conviction that truth and art exist; but no two cultures will necessarily agree about what they are. There is no doubt, for
example, that we use the word "art" to denote activities in other cultures quite unlike what our own artists do today, for the quite inadequate reason that those earlier acts have resulted in objects which we choose to regard as art objects. If it is surprisingly difficult to say what art is, it is not only because it is never the same for very long, but also because we evidently have no choice but to say what it is for us. All the same, no justification is possible for making reference to it without attempting to say — once again! — what it is, and doing so in terms general enough to cover the greatest number of examples. Also, those terms should do something to account for the extraordinary persistence of the idea of art, which transcends all of its many examples. Briefly, my view is that this persistence stems from a persistent and fundamental aspect of the mind itself. It would be stating the obvious here to propose that the mind may be regarded as a symbol processor of power and flexibility. I will propose, rather, to regard it as devoted primarily to establishing symbolic relationships: to attaching significance to events, and asserting that this stands for that. This is, surely, a large part of what we mean by understanding. As for art: in its specifically cultural aspects art externalizes specific assertions — the number three stands for the perfection of God, the racing car stands for the spirit of modern man, the swastika stands for the semi-mythical migrations of the Hopi people, or for a number of other things in a number of other cultures. But on a deeper level, art is an elaborate and sophisticated game played around the curious fact that within the mind things can stand for other things. It is almost always characterized by a deep preoccupation with the structures of standing-for-ness, and a fascination with the apparently endless diversity of which those structures are capable.
What we see in the museums results from a complex interweaving of the highly individuated and the highly enculturated, and in consequence any single manifestation is bound firmly to the culture within which it was generated: or it is rehabilitated to serve new ends in a new culture. But ultimately, art itself, as opposed to its manifestations, is universal because it is a celebration of the human mind itself.
The Embeddedness of Knowledge. Second: much of what has come out of the writing of AARON has to be regarded simply as extensions to the body of knowledge which the program was intended to externalize. Writing it was not merely a demonstrative undertaking, and it is far from clear what has been raised to the surface and what newly discovered. I have regarded the program as an investigative tool, though for present purposes the distinction is not important. It remains impossible to give an adequate account of this knowledge other than by reference to the program itself. There are several reasons for this. In the first place, this knowledge does not present itself initially as predominantly prescriptive. The first intuition of its existence comes in the form of an awareness that an issue — closure, repetition, spatial distribution — is significant: the program should be structured in terms of that issue, as well as in terms of all the other issues already contained. In this sense the left parts of the productions might eventually be taken together to represent the set of issues which AARON believes to be worth attending to in the making of an image. But this stage comes much later, and by this time an individual production functions as part of a fabric of issues, with so many threads tying it to so many knowledge sources, that a one-to-one account of how it achieves its effect is generally out of the question.
In fact, there is only a single example I can call to mind in which an effect can be ascribed with certainty to a single production: a particular class of junction in a meandering horizontal line will infallibly generate strong landscape reference, though only if the branching at the junction goes off on the lower side of the line (fig 17). This degree of specificity is certainly exceptional, but less powerful as an evocator rather than more so. In general, this particular class of junction — it is more easily characterized visually than verbally — tends strongly to denote spatial overlap: but the specific effect is evidently quite context-dependent, and dependent also upon the precise configuration of the junction itself.

"Personality" as a Function of Complexity. At the higher end of the scale of effects, the problem of saying what causes what becomes more difficult still. I have never been able to understand how there can be such general agreement about the "personality" which AARON's drawings project, or why that "personality" appears to be like my own in a number of respects. Personality has never been an issue on the conscious level of writing code, and I know of nothing in the program to account for it. To put the problem another way, I would not know how to go about changing the program to project a different "personality". I assume that the personality projected by an image is simply a part of a continuous spectrum of projection, not distinguishable in type from any other part. But I am forced now to the conclusion that these more elusive elements of evocation — personality is only one of them, presumably — are generated out of the complexity of the program as a whole, and not from the action of program parts; that given an adequate level of complexity any program will develop a "personality". This "personality" may be more or less clear in individual cases, and may perhaps depend upon how many people have worked on the program — AARON is almost exclusively my own work — but it will in any case be a function of the program, and outside the willful control of the programmer. If this is the case it seems extremely unlikely that any complete causal account of the workings of a program would ever be possible.

The Continuousness of Image-making and Image-reading. Third: I want to return to the question which lies at the root of this work. What constitutes a minimum condition under which a set of marks will function as an image?
The reader will have noted that much of what has been written here appears to bear as much upon the business of image-reading as it does upon image-making. There is no contradiction: the central issue being addressed is the image-mediated transaction itself, and image-making in particular has no meaningful, or examinable, existence outside of that
transaction. Knowledge about image-making is knowledge about image-reading: both rest upon the same cognitive processes. Thus the skilled artist does not need to enquire what the viewer sees in his work: the satisfaction of his own requirements guarantees it a reading in the world, and the explicit individual readings which it will have are irrelevant to him. The trainee artist, the student, on the other hand, frequently responds to his teacher's reading of his work by objecting, "You're not supposed to see it that way", evidently unaware that the reading does not yield to conscious control. Lack of skill in image-making more often than not involves a failure to discern the difference between what is in the image-maker's mind and what he has actually put on the canvas. It is equally true, I believe, that image-reading has no meaningful existence outside the transactional context: not because the whole event is always present — it almost never is — but because every act of image-reading is initiated by the unspoken assertion "What I see is the result of a willful human act". That is a part of what we mean by the word "image". However much we may amuse ourselves seeing dinosaurs in clouds or dragons in the fireplace, we have no difficulty in differentiating between marks and shapes made by man, and marks and shapes made by nature, and we do not hesitate to assign meaning in the one case where we deny it in the other: unless we belong to a culture with a more animistic attitude to nature than this one has. In short, I believe that the first requirement of the condition in the question is the undenied assumption of human will (note 10). The rest of the condition is given by the display of behavior which draws attention to a particular group of cognitive elements. In other words, evidence of cognitive process may be substituted for the results of an act of cognition.
An actual desire to communicate — which may include the simple desire to record the appearance of the world — is not a necessary condition. AARON's strength lies in the fact that it is designed to operate within, and feed into, the transactional context, not to reproduce the aesthetic qualities of existing art objects. It takes full advantage of the viewers' predispositions and does nothing to disabuse them: indeed, it might fairly be judged that some parts of the program — the simulation of freehand dynamics, for example — are aimed primarily at sustaining an illusion (note 11).
But the illusion can only be sustained fully by satisfying the conditions given above, and once that is accomplished the transactions which its drawings generate are real, not illusory. Like its human counterpart, AARON succeeds in delineating a meaning-space for the viewer, and as in any normal transaction not totally prescribed by prior cultural agreements, the viewer provides plausible meanings. Standing-for-ness. Fourthly: there is a multitude of ways in which something can stand for something else, and in adopting the general term "standing-for-ness" I intended for the moment to avoid the excess meanings which cling to words like "symbol", "referent", "metaphor", "sign", and so on: words which abound in art theory and art history. An image, I have said, is something which stands for something else, and of course it is quite plain that I have been discussing only a very small subset of such things. What are the defining characteristics of this subset?
Before attempting to answer that question, it should be noted that, while AARON's performance is based upon vision-specific cognitive modes (note 12), there are two closely related questions which cannot be asked about AARON at all. Images of the World and its Objects. The first of these has to do with the fact that in the real world people make images of things. How do people decide what marks to make in relation to those things?
It is difficult to avoid the conclusion that image-making as a whole is vision-based, even though it bears directly on the issue of appearances only occasionally. It is my belief that even when an image is not purposively referential — as is the case with AARON — or when the artist seeks to refer to some element of experience which has no visual counterpart, it is his ability to echo the structure of visual experience which gives the image its plausibility (note 13). The Persistence of Motifs. The second question has to do with the fact that actual image elements, motifs, have been used over and over again throughout human history, appearing in totally disconnected cultural settings, and bearing quite different
Figure 18.
meanings as they do so. What is it that makes the zigzag, the cross, the swastika, squares, triangles, spirals, mandalas, parallel lines, combs (fig 18) ubiquitous, so desirable as imagistic raw material?
My own answer to this question is that the cognitive modes and their dependent behavioral protocols are absolutely ubiquitous, and that the recurring appearance of these motifs is hardly even surprising (note 14). In fact, we have only to start cataloguing the motifs to realize that most of them are simply formed through the combination of simple procedures. The swastika, for example, is both cross and zigzag, just as the mandala is cross and closed form, and the so-called diamond-backed rattlesnake motif of the Californian Indians is a symmetrically repeated zigzag (fig 19).

figure 19.

Taken together, these two questions point to the dualistic nature of image-making. If, as I believe to be the case, it can be shown that the representation of the world and its objects by means of images follows the same cognition-bound procedures as the simpler images I have been discussing, then it will be clear that the form of an image is a function both of what is presented to the eye and of the possession of appropriate modes. Representation.
I said at the outset that my conclusions would bear upon the nature of visual representation, as distinguished from what the AI/Cognitive Science community means by the word "representation". It is still the case that my specific concerns are with what people do when they make marks on flat surfaces to represent what they see, or think they see, in the world. All the same, some speculation is justified about possible correspondences between the two uses of the word. It is important, for example, to note that the lines which the artist draws to represent the outline of an object do not actually correspond to its edges, in the sense that an edge-finding algorithm will replace an abrupt tonal discontinuity with a line. In fact, the edges of an object in the real world are almost never delineated by an unbroken string of abrupt tonal discontinuities. If the artist is unperturbed by the disappearance of the edge, it is likely to be because he isn't using that edge, rather than because he has some efficient algorithm for filling in the gaps. Similarly, most of the objects in the world are occluded by other objects, yet it would not normally occur to the artist that the shape of a face is the part left visible by an occluding hand (fig 20).
figure 20.
The face evidently exists for him as a cognitive unit, and will be recorded by means of whatever strategies are appropriate and available for the representation (note 15). It is as true to your meaning of "representation" as to mine, not only that it rests upon the possession of appropriate and available strategies, but also that new strategies may be developed to fit particular concerns. Both are bound by entity-specific considerations, however: considerations, that is to say, which are independent of the particular event or object being represented and take their form from the underlying structures of the entity — the artist's cognitive modes on the one hand and the structural integrity of a computer program on the other. What is a Representation "Like"? It could not be seriously maintained that a computer program is "like" a human being in a general sense, and it should not be necessary to point out that a representation in my meaning of the word is not "like" the thing represented, other than in precisely defined senses of likeness. That may not be quite obvious, however, when we consider the idea that a portrait is "like" the sitter. Even though we may be careful enough to say that the portrait LOOKS like the sitter, or that a musical passage SOUNDS like the rustling of leaves, we tend to stop short of that level of detail at which it becomes clear that the appearance of a painted portrait and the appearance of a person actually have very little in common. A representation may be about appearance, but we never confuse the representation with the reality, no matter how "lifelike" it is. In fact, we might rather believe that all representations of a given class are more like each other than any of them is like the thing represented. Life follows its laws, representations follow theirs. What is an Image?
The purpose of an act of representation is to draw attention to some particular aspect of the represented object, to differentiate that aspect from its context, not to reconstitute the object itself. To that degree we might regard a visual representation as constituting a partial theory of that object and its existence, just as we might regard a computer program as constituting a theory of the process it models. But neither the artist nor the program designer has any choice but to proceed in terms of the modes which are available or
which they are capable of developing. In the case of the visual representation, the making of an image, I have tried to demonstrate the cognitive bases of those modes, and also, through my own program AARON, to demonstrate their raw power in the image-mediated transaction. That, finally, defines my use of the word "image". An image is a reference to some aspect of the world which contains within its own structure and in terms of its own structure a reference to the act of cognition which generated it. It must say, not that the world is like this, but that it was recognized to have been like this by the image-maker, who leaves behind this record: not of the world, but of the act.
APPENDIX I
THE TURTLE SYSTEM.
When the real turtle is not running, the program simulates its path, and calculates where it would have been in an error-free world after completing each command. In this case it substitutes a chord for the arc which the real turtle would have traced out. (The straight line segments which may just be visible in the illustrations here are due to the fact that they were photographed off the Tektronix 4014 display, not from an actual turtle drawing.) The Navigation System. The navigation system is correct to about .2 inches: that is an absolute limit determined by the sonar operating frequency — about 40KHz — and does not change with the size of the drawing. Even with so coarse a resolution the feedback operation is efficient enough for the turtle to do everything on the floor that the program can do on the screen; indeed, if the turtle is picked up while it is drawing and put down in the wrong place it is able to find its way back to the right place, facing the correct direction. The Dynamics of Freehand Drawing.
There are several complexities in this part of the program which are worth mentioning. One of them is that the program has to be able to accomplish dramatic shifts in scale in the drawing, to make small things which look like small examples of big things: smoothly-curved closed forms should not turn into polygons as they get smaller. This is required both to handle shifts in information density and also to maintain implied semantic relationships between forms. A second complexity is that the movement of the line should convincingly reflect the dynamics of a freehand drawn line, and this should mean, roughly, that the "speed" of a line should be inversely related to the rate of change of curvature: the pen should be able to move further on a single command if its path is not curving too radically. (The converse of this is that the amount of information needed to specify an arbitrary line should be a function of its rate of change of direction, with the straight line, specified by its two end points, as the limiting case.)
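The inverse relation between line "speed" and rate of change of curvature might be modelled crudely as follows. The functional form and the constant k are my own assumptions; the text specifies only the inverse relationship itself.

```python
# A crude sketch of the freehand-dynamics rule described above: the distance
# the pen may cover on a single command falls as the rate of change of
# curvature rises. The hyperbolic form and the constant k are assumptions.

def step_length(curvature_change, max_step=2.0, k=5.0):
    """Longer steps on gently-curving paths, shorter steps where curvature
    changes fast; a straight line (zero change) is the limiting case,
    permitting the full step."""
    return max_step / (1.0 + k * abs(curvature_change))

step_length(0.0)   # straight line: full step length
step_length(0.5)   # rapidly changing curvature: much shorter step
```

The same relation gives the converse property noted in the text: the information needed to specify a line grows with its rate of change of direction.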
Movement Scaling.
Third, the pen should proceed more "carefully" when it is close to some final, critical position than when it has relatively far to go and plenty of time left to correct for carelessness. This, too, implies a scaling of movement in relation to the state of the local task. Finally, there is the practical problem that for any given number of cycles of a stepping pattern, the actual distance traversed by the pen will vary with the ratio of the turtle's two wheel speeds. Unfortunately, this relationship is not linear, and neither does it provide a useful simulation of freehand dynamics. Briefly, the line-generating procedure concludes that, given the present position and direction of travel of the pen in relation to the current signpost and to the final destination, it will be appropriate to drive the two wheels at stepping rates r1 and r2, taking n steps on the faster of the two. In doing so it takes account of all of the above considerations. The ratio determined for the two speeds is a function of two variables: the angle A between the current direction and the direction to the current signpost, and a scaling factor given by the remaining distance Dd to the final destination as a proportion of the original distance Do (fig 11). This speed ratio then becomes one of the two variables in a function which yields the number of steps to be taken — the distance to be traveled — by the fast wheel: the other variable being the relative size of the block of space allocated to the current figure.
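A sketch of the speed-ratio computation might look like this. The text specifies only the variables involved (the angle A to the signpost, and the remaining distance Dd as a proportion of the original distance Do); the particular functional forms below are invented for illustration.

```python
import math

# Illustrative sketch of the movement-scaling idea described above: sharp
# turns and an early position in the journey slow one wheel strongly; near
# the destination the correction becomes gentler. Functional forms assumed.

def wheel_speeds(angle_a, dist_remaining, dist_original, fastest_rate=15):
    """Return stepping rates (r1, r2) for the two wheels, drawn from a small
    set of rates (fifteen in the turtle described in the text)."""
    scale = dist_remaining / dist_original          # 1.0 at the start, -> 0.0 at the end
    turn = max(-1.0, min(1.0, angle_a / math.pi)) * scale
    slow = round(fastest_rate * (1.0 - abs(turn)))  # inner wheel slows with the turn
    return (fastest_rate, slow) if angle_a >= 0 else (slow, fastest_rate)

wheel_speeds(0.0, 5.0, 10.0)           # straight ahead: both wheels at full rate
wheel_speeds(math.pi / 2, 10.0, 10.0)  # sharp turn early on: the rates differ strongly
```

As in the program described, no precision is claimed here: the feedback loop, not the function, is what prevents error accumulation.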
These functions have to be tuned with some care to be sure that each variable is correctly weighted, and to compensate for the turn-distance ratio of the turtle geometry itself. But none of this — or any other part of the program — involves any significant mathematical precision. There are only fifteen stepping rates available, symmetrically disposed between fast forward and fast reverse. The whole program, including extensive trigonometric operations, uses integer arithmetic — this for historical reasons as well as limitations of available hardware —
and the geometry of the current turtle determines that it can only change direction in increments of about one sixth of a degree. (The turtle was not until recently interrupt-driven, and for design reasons this incremental direction-change factor was one degree in the earlier version.) Everything relies upon the feedback mode of operation to provide correction and to prevent error accumulation. The point is that a good car driver can drive a car with sloppy steering as well as a car with tight steering up to the point where feedback correction cannot be applied fast enough.
APPENDIX II — MATRIX REPRESENTATION.
This description is given here primarily because it offers some insight into the kinds of considerations which the program believes to be important, and the way in which these considerations are accessed: not because there is anything particularly original from a data-structure point of view. Much of the detail of the implementation is demanded by the word-length of the machine, and would go away in a larger machine. The intent is to make all the information relating to a particular part of the drawing effectively reside in a particular cell. The program uses the single words representing matrix cells in different ways according to what is happening in the cells:-
A "simple" event means, essentially, that all the data will be contained within this one word, although it will be seen that its simplicity relates to its use in a more meaningful sense:-
Before beginning work on the drawing, the program "roughens" the surface: that is, it declares some parts to be unusable for the allocation of space to a new figure, although a developing figure may go into this "rough" space. This is done in order to maximize the rate of change of density across the image:-
"use" may involve either a line or some special spatial designations:-
in either case, the cell will now have a figure identifier associated with it. The new version of the program uses fewer figures than the earlier one, and develops them further: a maximum of 32 figures is permitted:-
If the cell contains a line, then it can be dealt with as a simple event provided that it is not a line junction of a special kind. In this case the entry designates a line function type:-
Cell Linking.
The forward and backward links are a very important device here. Lines are mapped onto the matrix as they are drawn, using an adapted form of Bresenham's algorithm to ensure that strings of cells never include corner-to-corner contiguity. This also means that for any given cell, the line it contains must have entered it from, and will subsequently leave it into, only one of four cells: thus the four-bit linking permits a complete traversal of any series of line segments not involving a complex event.
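The adaptation described — a Bresenham variant whose successive cells always share an edge, never just a corner — can be sketched as follows. This is a reconstruction of the general 4-connected technique, not AARON's actual code, which is not given in the text: each iteration steps one cell in x or in y, never both.

```c
#include <stdlib.h>

/* 4-connected line rasterization: a variant of Bresenham's algorithm
   which moves one cell in x OR y per iteration, never diagonally, so
   that consecutive cells always share an edge (no corner-to-corner
   contiguity).  Writes (x,y) pairs into out; returns the cell count,
   which is always abs(x1-x0) + abs(y1-y0) + 1. */
int line4(int x0, int y0, int x1, int y1, int out[][2])
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx - dy, n = 0;

    out[n][0] = x0; out[n][1] = y0; n++;
    while (x0 != x1 || y0 != y1) {
        if (2 * err > -dy) { err -= dy; x0 += sx; }  /* horizontal step */
        else               { err += dx; y0 += sy; }  /* vertical step   */
        out[n][0] = x0; out[n][1] = y0; n++;
    }
    return n;
}
```

Because every cell is entered and left through one of its four edge-neighbors, a pair of 2-bit (four-way) links per cell suffices to chain the string together — which is what makes the four-bit linking scheme of the matrix possible.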
At this point, the single word is inadequate, and it is used as a pointer, words now being allocated in pairs from a freelist. Here again, one level down from the matrix, the words will be variously decoded. In particular, in the event that the cell is occupied by two figures, the two words are each used as pointers to new pairs of words, one for each figure:-
A cell at this level may contain complex events from one or both of two classes: connective and configurational. Configurational events frequently involve order-2 nodes — nodes, that is, which fall on a continuous line — and include sharp angles, strong curvature, and so on. In practice, the program forces complex events so that they always occur within an 8-cell displacement in x and y from another cell, and the location of the next event can then be recorded rather cheaply:-
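The cheap recording of that displacement might look like the sketch below. The actual encoding appears only in a figure not reproduced in this text, so the field layout here is hypothetical: since both offsets are forced to lie within 8 cells, each fits a 4-bit two's-complement field (-8..+7), and the pair packs into a single byte.

```c
/* Hypothetical encoding of the displacement to the next complex event.
   The program forces such events to fall within an 8-cell displacement
   in x and y, so both offsets fit in one byte as a pair of 4-bit
   two's-complement fields (range -8..+7 each).  The real layout is
   given only in a figure not reproduced here. */
unsigned char pack_disp(int dx, int dy)
{
    return (unsigned char)(((dx & 0xF) << 4) | (dy & 0xF));
}

void unpack_disp(unsigned char b, int *dx, int *dy)
{
    *dx = b >> 4;   if (*dx > 7) *dx -= 16;   /* sign-extend 4 bits */
    *dy = b & 0xF;  if (*dy > 7) *dy -= 16;
}
```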
"Sense", here, means convex or concave if the line is the boundary of a closed figure, and up/right or down/left if it is not. If this is a figure node of any order other than two, one entry will be needed for each adjacent node:-
in addition to the displacements which chain this node to each of its connected nodes. This means that the traversal of the figure as it is represented by the matrix can continue from this point until the next node is reached. Thus, the entire structure is contained essentially within the matrix, and the short lists which may be tacked onto any single cell serve merely to extend the effective capacity of that cell. Ideally, this matrix should be as fine as possible: since the resolution of high-grade video is only 1024x1024, a matrix of this size would obviously constitute an extremely good representation. However, there are two considerations which make so fine a grain unnecessary. The first is that the program keeps a full list of all the actual coordinate pairs for each figure as it is drawing it, and can access it should some very precise intersection be required. The second is that the program is designed to simulate freehand drawing, not to do mechanical drawings, and once a figure is completed some approximation to it for purposes of avoidance or even intersection is unobjectionable. The maximum error induced by assuming a point to be at the center of a cell in a matrix of 90x160 will be about 7/8th of an inch in a sixteen-foot drawing: only three times the thickness of the line.
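The quoted error bound can be checked with a little arithmetic. Assuming square cells (so that the 90x160 matrix covers a drawing roughly nine feet by sixteen — the text gives only the width), treating every point as lying at its cell's center gives a worst-case error of half the cell diagonal:

```c
#include <math.h>

/* Worst-case position error (inches) when any point is assumed to lie
   at the center of its matrix cell: half the cell diagonal.  Assumes
   square cells; 160 cells span the sixteen-foot (192-inch) width, so
   each cell is 1.2 inches on a side. */
double max_center_error(double drawing_width_in, int cells_across)
{
    double cell = drawing_width_in / cells_across;
    return cell * sqrt(2.0) / 2.0;
}
```

For the figures in the text, max_center_error(192.0, 160) comes to about 0.85 inches, agreeing with the quoted "about 7/8th of an inch".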
NOTES ON THE TEXT
note 1. The word "representation" is used here in a more general sense than it now carries within the A.I. community: the problem of formulating an internal (machine) representation of some set of knowledge differs from the more general problem primarily in its technological aspects.

note 2. "The Art of Artificial Intelligence: Themes and Case Studies of Knowledge Engineering," Edward Feigenbaum, Proceedings of IJCAI-5, 1977; pp. 1014-1029.

note 3. In the decade before I became involved in my present concerns my work was exhibited at all of the most serious international shows, and I represented my country at many of them, including the Venice Biennale; as well as in some fifty one-man shows in London, New York and other major cities.

note 4. Different from each other, loosely speaking, in the way one might expect a human artist's drawings to differ one from another over a short period of time.

note 5. Written in "C", under the UNIX operating system.

note 6. I am referring here to differentiations performed in relation to the image, not in relation to the real world, with which the program has had no visual contact.

note 7. The program does not attach semantic descriptors to the things it draws: the terms "penumbra", "boulder" and so on are my own descriptions, and are used here for the sake of simplicity.

note 8. Significantly, from the point of view of my argument here, the dirty marks were intended to "suggest" the elements of a composition.

note 9. The one unconstrained randomizing agent in this scenario, the final cutting of the film by the producer rather than the director, has also demonstrated itself to be devastatingly non-creative.

note 10. "Undenied" is stressed here because there exists an odd case in which the will of the artist is to produce objects which demand the contemplation of their own qualities for their own sake — what they are rather than what they stand for — and which thus seek to deny the viewer his normal assumptions. To the degree that this aim can actually be achieved the resulting object could not properly be called an image, and I doubt whether aesthetic contemplation could properly be called reading. Thus much of XXth Century abstract art falls outside this discussion.

note 11. It is worth noting, though, that AARON did mechanical straight-line shading for about two years — it ran faster that way — and in that time only two people ever remarked on the inconsistency.

note 12. I will leave aside the interesting question of whether there are not more general underlying structures which are common to all physical experience. It is presumably no accident that terms like "repetition", "closure", and others I have used in relation to visual cognition are freely used in relation to music, for example.

note 13. The control of the rate of change of information density across the surface of the image, to which I referred earlier, is the most powerful example I know in this regard. The eye is capable of handling units as small as a speck of dust and as large as the sky, but the processes which drive the eye seem always to adjust some threshold to yield a preferred distribution spanning only a few octaves.

note 14. In fact, the more theatrical explanations which range from world-wide migrations to the influence of extraterrestrial voyagers are not even necessary.

note 15. He is unlikely to treat the boundary between face and hand as part of the face, but as part of the hand, and may very well indicate the full boundary of the face as if he could actually see it.
HOW TO MAKE A DRAWING.
Harold Cohen
Science Colloquium, National Bureau of Standards, Washington DC, December 17, 1982.
Let's begin with a story. Once upon a time there was an entity named Aaron. With Christmas upon us, that seems an appropriate way to begin my story, but this story does not end with the hero marrying the princess and living happily ever after. Most of the story concerns Aaron's education, which began at Stanford University in 1973. Not very promising as the plot for a good story, you might think: but it is not simply an excuse for assailing you with arguments about the merits of a liberal arts education over a scientific one, or vice-versa. Aaron's education was actually quite unusual. There were no courses in US history, no calculus, no languages: in fact, there were no courses at all, and Aaron was awarded no degree. We might best summarize this unorthodox education by saying that it was aimed exclusively — literally exclusively — at teaching the student how to make drawings. Aaron was not in any sense an orthodox student, though. Gifted in some ways with quite remarkable abilities, in others Aaron was devoid of the most basic skills, so that even the most elementary considerations — how the human hand moves a drawing implement around on a sheet of paper, for example — had to be taught from scratch. That part of the teaching, at least, went fast. Right from the beginning, Aaron exhibited an excellent memory, to the point of being able to retain long lists of instructions, which were always followed to the letter, provided that they were given in a clear and unambiguous way — even the kind of instructions that would require continuous decision-making based on a careful assessment of the changing state of the drawing as it proceeded. And Aaron would make all the necessary decisions without requiring further detailed instructions. Judged against those hordes of students who can't follow instructions as to where to
write their names on a sheet of paper, this student would have to be regarded as unusually intelligent: if, that is to say, following complex instructions may be regarded as evidence of intelligence. Yet, what seemed to be lacking, what we might normally consider to be a necessary complement to the most minimal intelligence, was the pre-existence of even a primitive set of cognitive skills, the sort of skills which develop very early in children, and are almost certainly built into their hardware, so to speak: the ability to distinguish between figure and ground, for example, or to distinguish between closed forms and open forms. These skills were not built into Aaron's hardware, and they had to be acquired, in much the same way that children acquire the rules of arithmetic or grammar. They were acquired quite quickly. Looking back over Aaron's output of drawings in the first couple of years, though, one has the impression that they were produced largely in order to demonstrate the student's newly-acquired possession of these skills: a bit like the way young children show off a newly-acquired ability to count. And that analogy may come very close to the truth. Now, any serious educational procedure ought to teach the teacher as much as it teaches the student, and in this case the teacher was learning a good deal. For one thing, he became aware that much of what the viewer of a drawing needs from it is not "what the artist had in mind," but simply evidence of another human being's intentional activity. People use art for their own purposes, to carry their own meanings, not for the artist's purposes and meanings, concerning which they probably know very little. It is the evidence of intention in the work that lends authority to the viewer's private meanings, by allowing them to be assigned to the artist, whether that evidence is actual or illusory.
And, the teacher realized, Aaron's almost exclusive emphasis on a few low-level cognitive skills was generating something very like evidence of intention, if he were to judge by the responses of Aaron's public. From very early on the drawings were treated as "imagistic:" that is, as standing for things in the world. Yet the teacher was quite certain, when viewers of his student's drawings found reference to animals and landscapes, that Aaron had had no intentions about representing such things. Aaron remained bound to the act of drawing, and had less knowledge about the appearance of animals and landscapes than a two-year old child might have. He became aware also, not only that Aaron generated much richer, more diversified output than he had himself envisaged when he was instructing the student, but also that there were aspects of the drawings which didn't seem to arise from the instructions at all. Many of those who had known the teacher's work a decade earlier thought they recognized his hand in the student's work, but he himself remained unconvinced, seeing in the work a certain innocence he did not associate with his own output.
He firmly rejected the notion that Aaron was beginning to "take off," bringing a unique and original voice to the business of image-making: for the reason that he knew all of Aaron's shortcomings, and was aware that, in spite of Aaron's undeniable abilities, the student was totally incapable of learning from experience, from the act of drawing itself. As good as Aaron's memory was of the drawing in process, that drawing vanished into oblivion the moment it was completed, leaving no trace of its existence behind, no new body of knowledge upon which its maker might subsequently draw, and each new drawing was made as if it were the first ever to be done. Aaron was learning only in the sense of being able to handle increasingly complex instructions. It seemed unlikely that an intelligence of so limited a kind might develop a personal "voice." All the same, the teacher found the student's work engaging, to the point where he began to see his own role as something between teacher and collaborator. Knowing perfectly well that Aaron didn't have the first idea about color, yet feeling that the drawings cried out for color, he took to coloring them himself. He felt no discomfort about signing them with his own name — without his efforts and his instructions, after all, Aaron would never have existed in the domain of art — and when presented with several mural commissions he had no hesitation in using Aaron's drawings rather than his own. He had no others of his own, because a couple of years after the student's education began he had given up drawing himself: given up moving the pen around with his own hand, that is to say. Aaron drew so much better than he did. Aaron peaked out, at around the age of six, about three years ago, at a time when the work — or, more precisely, Aaron itself — was getting to be in some demand. Perhaps that demand was part of the reason: it is certainly the case that the teacher was spending much of his energy on mural commissions and exhibitions.
But the truth is that the teacher was losing interest in this student, developing serious doubts about whether a student with Aaron's limitations would ever be able to go beyond current achievements. It must surely have been the case, the teacher thought, that Aaron's limitations, like its achievements, resulted from the educational process for which he had been responsible. If he had a chance to begin over, how differently would he proceed, knowing what he knew now? Would it be possible to produce a less limited entity than the first Aaron had proved to be? In particular, he wondered, what would he need to do to guarantee that a new student would behave more creatively — though he was not entirely sure what the word meant — than Aaron had done? And Aaron was simply put aside, while the teacher began to ponder in detail the problem he had set himself. You may find it odd that an entity with so distinguished a record, achieved at so tender an age, should have been put aside
so heartlessly. In order to understand that, to see that the teacher was not simply a heartless villain who had used up the innocent for his own purposes — such is the meat of fairy stories, after all — we have to remind ourselves that Aaron was a computer program, not a person. *********************************
Perhaps it did not need my Christmas story to emphasize the confusion which arises from anthropomorphizing the intelligent products of the new electronic technologies. It is obvious, isn't it, that there are massive differences between computer programs and people? Even the least intelligent human being learns something from experience, while Aaron learned nothing: which is not to say that intelligent programs are innately incapable of learning, simply that Aaron was, and managed to perform its tasks nevertheless. Even the clumsiest human being develops physical skills, simply through the continuing use of his or her own body and the use of various tools. Aaron had no physical existence, never felt the pressure of pen against paper, and hardly knew one drawing device from another: electronic display, plotter, mechanical turtle — they were all functionally interchangeable, and played no part in the convincing emulation Aaron gave of human freehand drawing. This rested upon a careful consideration — its programmer's, not its own — of the dynamics of the human hand, driven, in feedback mode, by the human cognitive system. As to this cognitive system, which seems to spring directly from the nervous system in human beings: Aaron never had any such hardware, and its software emulation, the ability to distinguish between figure and ground, for example, or to distinguish between insideness and outsideness, had to be formulated for it into precisely-stated behavioral rules. Yet even that isn't quite right: what we should stress, before we begin once again to build an image of a person-like entity being GIVEN a range of abilities, is that Aaron was not GIVEN all these rules and instructions. Aaron WAS the rules and instructions. Adding new rules to the program was not changing what Aaron HAD, it was changing what Aaron WAS, its very structure. There are conceptual difficulties in this distinction, as I have come to recognize.
I have been asked many times, in several languages, and in tones ranging from wonder to outrage, as I have stood in various museums, watching Aaron produce a series of original drawings, none of which I had ever seen before, "Who is making the drawings? Who is responsible? Is the program an artist? What part of all this is art?" The differences between programs and people might be obvious, but they seem difficult to keep in focus. Most people evidently believe that machines must either be programmed to do the PARTICULAR thing they happen to be doing — by this view, Aaron must have been somehow "fed" the drawing it was making — or they
must be behaving randomly. But Aaron always appeared to act rather purposefully, and over and over again I have watched people's faces register the confusion which accompanies a successful assault upon deeply-held beliefs, as it came home to them that this entity was following neither of the only two paradigms they had to hold on to. "I see," some people would say, "the program is really just a tool!". Well, it is and it isn't. What they meant by a tool was something with a handle at one end and a use at the other: a hammer, a scythe. But suppose one had a hammer that was capable of going around a building site, searching out and thumping any nail that protruded more than a thirty-second of an inch above the surface? Would we still call that a tool? If one were to write a computer program which allows a composer to sit down at a keyboard and compose music in an essentially orthodox fashion, albeit with an infinitely extensible orchestra, one might reasonably think of THAT as a tool in an orthodox sense, because making a BIG difference is not the same as making a FUNDAMENTAL difference. But what of a program that knows the rules of composition, and generates, without input from a keyboard, an endless string of original musical compositions? Would that be an orthodox tool? Aaron was clearly not a tool in an orthodox sense. It was closer to being a sort of assistant, if the need for a human analogue persists, but not an assistant which could learn what I wanted done by looking at what I did myself, the way any of Rubens' assistants could see perfectly well for themselves what a Rubens painting was supposed to look like. This was not an assistant which could perform any better for having done a thousand drawings, not an assistant which could bring anything approximating to a human cognitive system to bear on the production of drawings intended for human use. A computer program is not a human being.
But it IS the case, presumably, that any entity capable of adapting its performance to circumstances which were unpredictable when its performance began exhibits intelligence: whether that entity is human or not. We are living on the crest of a cultural shock-wave of unprecedented proportions, which thrusts a new kind of entity into our world: something less than human, perhaps, but potentially capable of many of the higher intellectual functions — it is too early still to guess HOW many — we have supposed to be uniquely human. We are in the process of coming to terms with the fact that "intelligence" no longer means, uniquely, "human intelligence." Not all computer programs are intelligent, needless to say. Programs are written by people, and if they are written as tools of an essentially orthodox kind, they surely won't be intelligent. For those which are, the question to be asked is not, what ARE they? — Was Aaron an artist? for example — but,
what will they DO? The word "artist" implies human-ness, for obvious reasons. We might as usefully argue about whether Aaron was an artist on the evidence that it didn't wear jeans, didn't drink beer, and didn't want to be famous, as to argue from the fact that it didn't possess a human nervous system and knew nothing about the culture it served. What we do need to know, rather, is the part to be played by Aaron-like programs and successor programs which will be to Aaron what chess is to tic-tac-toe, in the cultural enterprise of art-making. And that isn't the kind of question to which one can venture an answer with any great confidence today: much less so if it is extended to intelligent programs as a whole. It is certainly the case that some problems in computing have proved to be appallingly intractable: the understanding of natural speech in an unlimited domain of discourse, for example. On the other hand, the limitations I have described in Aaron are not inherent in intelligent programs as such. They merely result from the attitudes and interests I brought to bear on the writing of the program: it could as easily have developed differently, as Aaron's successor has. And Aaron was not abandoned because of its limitations with respect to what it was designed to do, but because it lacked the flexibility to allow it to be adapted to new purposes; that's normal for programs developed in an ad-hoc manner, as Aaron was. By the time I had patched Aaron up with string and masking tape for five years, by the time I had completely rewritten it three times, it was obvious that, on the one hand, a program would need to be able to exercise more originality than Aaron had to satisfy me in the future, and that, on the other hand, Aaron's current structure would prevent it ever achieving any such thing. **************************************
Stated baldly, though not at all clearly, the new Aaron — I will call it Aaron2 for the sake of clarity — was intended to be creative. Once I had actually managed to state that intention to myself, it took some time to accommodate to the fact that, while I thought I knew creativity when I saw it, I evidently had no very useful notion of what the word "creative" could possibly mean. I was sure that some individuals use their intellectual resources with abnormal efficiency: I was equally convinced that such aberrations ARE a question of the USE of resources, and not one of abnormal resources. And when I came to scan the age-old literature on the subject of creativity, it seemed to me that I was not alone in lacking any functional model, any idea of what happens internally when someone has an original idea. Since I never really believed that the Muse of the ancient Greeks sits on the artist's shoulder and whispers in his or her ear, I had no reason to suppose that there would be a Muse of the Electronic Age waiting to sit on the shoulder of a suitably endowed, or
suitably seductive, program. My program would not be written as an incantation for the seduction of electronic muses, then, obviously: but what would it be? What can it mean to talk about a program being "creative?" How would one know that it had been? Let me take a few minutes to make a number of general observations, by way of explaining what I thought about all this, and why eventually the new program was designed the way it was. In the first place, nothing I have said about the appearance in our world of non-human intelligence was meant to deny that, for most matters involving the exercise of the higher intellectual functions, human intelligence is the only prototype we have. It might not always be that way, but for anyone designing intelligent programs today, I do not see how the modeling of the human intelligence CAN be avoided, or, indeed, WHY it should be. This must be the case particularly for a program whose output is intended to correspond, on an intimate level of detail, to something as intimately human as a human freehand drawing. I believe one captures the essence of the human performance by modeling the performance itself, and never by attempting to duplicate the appearance of the OUTCOME of the performance. Thus I seemed to be on a head-on collision course with the need to say, in functional terms, what constitutes creativity, and there seemed to be no way around it. (I should make clear, by the way, that this view is not intended to refer to the implementation levels of programs built around devices which are fundamentally unlike what the human being uses. The video camera being used in computer vision systems, for example, has very little in common with the human visual system, and, to the degree that much of what goes on in vision programs has to do with inferring the state of the external world from the incoming data, there would seem to be no compelling reason to use human visual data processing as a model.)
Secondly, apropos of drawing: like its predecessor, Aaron2 would be making drawings, but not the same KIND of drawings. I need to say something about the differences, and about drawing in general: any classification is to some degree arbitrary, and I should make clear what my own is. The most inclusive way of regarding a drawing, probably, is as a set of ordered marks, or perhaps we should say INTENTIONALLY ordered marks, since there are all sorts of ordered marks in the world we don't regard as drawings: for example, the tracks of cars in the snow, the veins in a leaf, the cracks in a mud flat... or, for that matter, a musical score or a printed page of text. The question of intentionality is of paramount importance, notwithstanding the fact that intention has to be inferred from forms rather than perceived directly, as forms are perceived. So, for example, we might readily agree that an alphabetic character
drawn by a typographer in the process of designing a type face is a drawing, while denying that the same character appearing in a printed text is a drawing. The crucial factor is that the designer's character stands for what later appears on the printed page, while the character on the printed page doesn't stand for anything. It IS what it is. This implies that a drawing is a drawing, not merely because it stands for something other than itself, but because we find in it evidence that the reference to that other something results from an intentional act. Which is not to say that all drawing is representational, in the sense that it makes reference to the outside world in terms of the world's appearances. I suspect that very little of it has been: in fact, it may be that in the whole of man's history, only Western European art from the Renaissance on has ever busied itself with appearances to the degree that it has. It IS a question of degree, of course. A drawing is a set of assertions about the nature of the world, and the form in which those assertions are made derives from the operation of the visual cognitive apparatus, whether or not the marks are intended to refer to appearances. As an example: all human beings at all times have represented the solid objects of the world, on flat surfaces, as closed forms. But at the same time, closed forms, and the distinction between closed forms and open forms, have functioned as fundamental raw material from which all images are built. It would seem, then, that the making of drawings would be inextricably linked to the possession of a cognitive apparatus, and of cognitive skills. And for a human being it certainly is. But I have been careful to say that a drawing contains the IMPLICATION of intention, as I have also said that the viewer actually assigns his or her own intentions to the artist rather than the other way about.
For a program, what is required is enough knowledge about the way images are made and used to be able to generate the IMPLICATION of intention: which is what Aaron did. Aaron did not make representations, in the sense of dealing with appearances. It made images, evocative drawings: which is to say, drawings which facilitated the assignment of the viewer's intentions and meanings. Its successor, however, was designed to make representations. Now, in asserting that the structure of representations takes its character from the nature of the visual cognitive system, I do not intend to imply that a representation is, in any useful sense, a transformation of the external world onto a sheet of paper. I am quite sure that it is not. What I said was that a representation is a set of assertions about the external world, made in terms of the world's apprehend ability. That does not imply the existence of any one-to-one mapping of the world onto the representation, such as one finds in a photograph, and, its ubiquity notwithstanding, photography is quite uncharacteristic of representation-building in general. HOW TO MAKE A DRAWING
There is nothing particularly original in this nontransformational view of representation-building: every sophisticated artist knows perfectly well that a drawing is an invention, built from whatever raw material he or she can muster, and aimed at plausibility rather than truth. In fact, the idea of truthfulness, realism, is itself just such an invention, one which simply uses the appearance of the world as a hook upon which to hang its claims to plausibility. But if we take this view at face value, and disentangle it from the photographic, transformational bias of our time, some interesting questions emerge. In some superficial sense a representation represents the external world, but then it isn't clear HOW it represents that world, or what aspect of the world is being represented. In another sense a representation represents one's internal world — that is to say, one's beliefs about what the external world is like — and it is produced, externalized, in order to check the plausibility of one's beliefs against the primary data collected by one's cognitive apparatus. Obviously, this view of representations as externalizations of an internal world is not limited to drawings; it extends to any form by means of which the individual is able to examine his or her own internal state. And at that point I thought I had my first real hold on the question of creativity, which I was determined to characterize in terms of normal functions, and without falling back upon some superman theory. If this checking process in the normal mind is put to the service of confirmation, of reassuring the individual that the world is the way he or she believes it to be, we might suppose that its function in the creative mind is to DISconfirm, to test the individual's internal model to the limit, and to force the generation of new models. 
In other words, the essence of creativity would lie in self-modification, and its measure would be the degree to which the individual is capable of continuously reformulating his or her internal world models: not randomly, obviously, but in perceptive response to the testing of current models. Thirdly: to talk of one's internal model of the world is to talk of a representation, clearly. But it is not a fixed, coherent representation, the way a representation on a sheet of paper may be thought of as fixed and coherent. It takes very little introspection to discover that the pictures we conjure up in our heads are anything but complete. Try conjuring a picture of your favorite person's face, and then ask yourself a question about it — what is the distance between the eyes, for example — to see how volatile the mental image is, and how little information is carried in it. Ask a question about something quite different, and a quite different mental image may spontaneously emerge to replace the image of the face. Evidently, there is some store of material below the level of these mental images, and we should probably regard these images as a sort of semi-externalized representation of the material at the lower levels. What we mean when we speak of one's internal model of the world is not a
mental image, or a succession of mental images, but rather the entire internal procedure by means of which those images are continuously produced. Representations represent lower-order representations, and exist as a series of momentary cross-sections in a continuous unfolding, a continuous reconstruction of the world from the debris of experience. We ought to be able to characterize creativity in terms of this normal representation-building: that is to say, we should expect to find creativeness exercised, not as another kind of function entirely, but in highly particularized modes for the reconstruction of mental models from low level experiential material. It is not surprising, then, to find Albert Einstein, one of the few to have written about the nature of creativeness from within and in a convincing way, speaking of the part played by this lower-order material in thinking: "It is by no means necessary that a concept must be connected with a sensorily cognizable and reproducible sign (word: in our context, mental image)... All our thinking is of this nature of a free play with concepts... For me it is not dubious that our thinking goes on for the most part without use of signs, and beyond that to a considerable degree unconsciously." We might conclude that in Einstein's case, creativity involved an extension of the domain of "thinkability," manipulability, to a level on which most of us find mental material to be unmanipulable. Fourth: a very large part of what the individual has in his or her head is knowledge about how to do things. And people don't behave creatively unless they know how to do a great many things, just as they don't behave creatively unless they are capable of abstraction. There is nothing creative about ignorance. 
How, then, could one expect a program to exhibit creativeness, self-modification, unless it, too, first knew how to do a rather large number of things, whether it had acquired that knowledge experientially, or had it provided, hand-crafted, by the programmer? The ability to acquire experience would need to be built into the program at the outset, but the self-modification which might proceed from that experience would probably come at a late stage in the program's development. That implies, of course, that the program would need to be able to store, in some appropriate form, everything it had ever done. Which leads to the fifth observation, and to what is perhaps the most teasing of all problems relating to the mind. The mind evidently stores all its knowledge, all the experience of its owner's life, in some amazingly compact fashion. What happens to your knowledge about how to cross the road when you are not crossing the road? When you sit down to play a game of chess, do you find all you know about the game stored in one big lump, like
a book on a shelf, somewhere in your memory? Can you access it all at once, form a single mental image of it? Presumably not. When you need to find an appropriate rule for crossing the road, do you need to review and examine all the rules you have for playing chess, and for eating spaghetti, and for tying your shoelaces, on the way, in order to determine whether any of them are appropriate to the current situation? Presumably not. What we mean by a rule is not an imperative — WATCH OUT, EAT YOUR FOOD — it is a conditional — if you can't beat 'em, join 'em: if the cap fits, wear it: if they can't get bread, let them eat cake — and the condition which triggers the required action seems to lead us directly to what the action is. Rules for the tying of shoelaces appear to live with the shoelaces, and rules for eating spaghetti live with the spaghetti. Or, to put the matter another way, rules for the use of things are simply part of our conceptual grasp, our internal representations, OF those things. Of course, most rules in real life are a good deal more complex than these examples, if only for the reason that things in the world interact with each other. Rules link events: if 'a' is the case, and either 'b' is or 'c' is provided that 'd' isn't... and so on. Also, many rules belong to classes of things, classes of behavior, rather than to individual things and individual behaviors. The rule which says "If you are eating spaghetti AND wearing a new jacket, proceed with caution" is a rule belonging to a whole class of messy foods which stain clothes, and is invoked by the appearance on the table of a dish of spaghetti, by a process we might call inheritance, by virtue of the fact that membership of the class "messy foods" is part of what we understand by spaghetti. *********************************
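The idea that rules live with the things they govern, and that class-level rules reach individual things through membership, can be sketched in a few lines of Python. This is an illustrative reconstruction, not Cohen's code; the `Concept` class, the `messy-foods` set and the rule text are invented for the example:

```python
# A concept carries its own rules; rules of the classes it belongs to
# are inherited automatically, so a dish of spaghetti "finds" the
# caution rule by virtue of being a messy food.

class Concept:
    def __init__(self, name, member_of=()):
        self.name = name
        self.member_of = list(member_of)   # classes this concept belongs to
        self.rules = []                    # (condition, action) pairs local to it

    def add_rule(self, condition, action):
        self.rules.append((condition, action))

    def applicable_rules(self, situation):
        """Collect local rules plus rules inherited from every class."""
        found = [a for c, a in self.rules if c(situation)]
        for parent in self.member_of:
            found.extend(parent.applicable_rules(situation))
        return found

messy_foods = Concept("messy-foods")
messy_foods.add_rule(
    condition=lambda s: s.get("wearing-new-jacket"),
    action="proceed with caution",
)

spaghetti = Concept("spaghetti", member_of=[messy_foods])

# The rule is invoked by the appearance of spaghetti, not looked up
# in a global rule book:
print(spaghetti.applicable_rules({"wearing-new-jacket": True}))
# -> ['proceed with caution']
```

The point of the sketch is that no global search over shoelace and chess rules ever happens: only the rules reachable from the concept at hand are examined.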
You will recognize that these remarks are directed at WHAT the mind does, and make no assumptions about HOW it performs its feats of information processing. On that question I know nothing, nor do I believe it is central. My aim was to identify, in a few essential characteristics of human intellectual activity, the informing principles of a program, not to replicate the processes through which the mind runs its own programs. Let me summarize those principles. Firstly: Aaron2, unlike Aaron1, should have a permanent memory. In this memory should be stored, in extremely compacted form, every drawing the program makes, together with everything that the program knows about drawing, whether that knowledge is programmed by hand or acquired through experience of drawing. But, compacted though it should be, that stored material should be structured so as to inform its own regeneration into more complete specifications for the making of a new drawing. I mean this as an analog for the building of representations in the mind
from lower-order representations, up to and including the generation of external representations. However, this process in the program should be flexible enough to reflect the associative quality of the process in the mind. (I have neglected to mention association up to this point, largely through lack of time: nevertheless, my suspicion is that creativeness is not a function of "correctness" in representation-building so much as it is a function of the slightly messy, apparently somewhat random, action of association.) Secondly: the knowledge the program should have, its domain of expertise, should concern, predominantly, the making of "visual" representations: that is, it should know enough about the nature of the visual field, and about the way people derive knowledge of the three-dimensional world from it, that it would be able to generate a convincing sense of depth, regardless of the lack of any data concerning the objects in the visible world. This principle was actually quite arbitrary with respect to the program's planned structure, though it made sense to pick a domain in which I felt I had a good deal of expert knowledge readily available, and it was certainly justified as an excellent example of the final stage of the externalizing process. But you will have recognized that almost none of my remarks have been directed specifically to drawing, and I tend to think the program could as easily deal with other material. Thirdly: the rules which determine how its knowledge of drawing is to be applied in the making of particular drawings should be accessed by the program as it accesses the knowledge itself. Perhaps I should have explained that Aaron1 was what we call a production system: simply a long list of rules — if some condition holds true, do this, otherwise if something else is the case, do that, otherwise ... — in which the program simply cycles through the list until it finds, and activates, an appropriate rule. 
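The production-system architecture described here, an ordered rule list cycled through until a rule fires, can be caricatured in a few lines. This is a generic sketch of the mechanism, not Aaron1's actual code, and the drawing "rules" are invented placeholders:

```python
# A production system: an ordered list of (condition, action) rules.
# The program cycles through the list, fires the first rule whose
# condition holds, and repeats until no rule applies.

def run_production_system(rules, state, max_cycles=100):
    for _ in range(max_cycles):
        for condition, action in rules:
            if condition(state):
                action(state)
                break            # go back to the top of the list
        else:
            return state         # no rule fired: quiescence
    return state

# Hypothetical drawing rules, purely illustrative:
rules = [
    (lambda s: not s["has-closed-form"],
     lambda s: s.update({"has-closed-form": True})),
    (lambda s: s["has-closed-form"] and s["bases"] < 2,
     lambda s: s.update({"bases": s["bases"] + 1})),
]

state = {"has-closed-form": False, "bases": 0}
run_production_system(rules, state)
print(state)   # -> {'has-closed-form': True, 'bases': 2}
```

Note how the "how to do it" knowledge is split between the rule list and the actions it invokes, which is exactly the conceptual problem raised in the next paragraph.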
One of the conceptual problems of this kind of program is that the knowledge of how to do things is split up, between the rules on the one hand and the subroutines invoked BY the rules — the "do this, do that" part — on the other. Thus, Aaron2 should provide a more coherent representation of "how to do it" knowledge than its predecessor. Fourthly: the program's knowledge of drawing should include conceptual knowledge, at least to the degree that it should be able to particularize from general rules. I mean, for example, that it should not only know that there is a general class of things called closed forms, but should know about all the members of the class and be able to decide that one was more appropriate in a particular situation than another. Conversely, it should also be able to remember that it had used a closed form for some reason without necessarily having to remember which closed form it was.
Fifthly: Aaron2 should be treated as a potentially "late developer." I mean that it should be anticipated that the programmer would need to put in rather a lot of material by hand before the program would be ready to take off. But if the program is ever to take off, it should then be able to make use of the same mode of entering material as the programmer had, or at least be able to generate material in the same form. *********************************
And so things are working out. Aaron2 is still in its infancy and a very long way from becoming self-modifying. In order to support the long range need for building up the program's store of knowledge, early work on the program involved the writing of an editor, by means of which the programmer is able to build items of knowledge by hand. These items are, indeed, extremely compact: in memory they consist simply of sets of tokens, unique names. Once an item is accessed by the program, however, it is regenerated into a generalized tree structure, and the individual tokens are enacted. Perhaps this is a little abstract: what it means in practice is that the programmer, having written a set of subroutines that describe how a particular kind of closed form may be generated — let's call it a "shape," for example — uses the editor to implant in the program's memory the fact that it now knows how to generate these "shapes." At this point the memory item will consist of the single token "shape," together with a marker which identifies the token as the smallest unit of "how to do it" knowledge, which we will call a "system". Any time this item is accessed, the marker will cause the program to activate the generative subroutines to which the token refers, and a "shape" will be produced. Suppose now that the programmer writes another set of subroutines for adding a kind of appendage to a closed form — we'll call it a "base" this time — and uses the editor in the same way to implant another item of memory. Now, because of the way they are generated, "bases" can only be appended to closed forms, and it follows that in due course the programmer will want to add a rule to this memory item which will prevent it from being activated for any other purposes. 
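The mechanism just described, a compact token in memory whose marker causes generative subroutines to be activated, might look like this in outline. The Python below is a speculative sketch: Aaron2's real subroutines produced drawing gestures, whereas these stand-ins merely return strings:

```python
# Memory stores only tokens; a registry maps each "system" token to
# the subroutines that know how to enact it.

generators = {}     # token -> generative subroutine

def system(token):
    """Register a generative subroutine under a token name."""
    def register(fn):
        generators[token] = fn
        return fn
    return register

@system("shape")
def make_shape():
    return "closed-form:shape"

@system("base")
def make_base():
    return "appendage:base"

def enact(item):
    """Regenerate a stored item: a bare token is looked up and run;
    a (token, children) pair is enacted recursively."""
    if isinstance(item, str):
        return [generators[item]()]
    _, children = item
    out = []
    for child in children:
        out.extend(enact(child))
    return out

# A "figure" item whose children are the systems "shape" and "base":
figure = ("figure-item", ["shape", "base"])
print(enact(figure))   # -> ['closed-form:shape', 'appendage:base']
```

The compact stored form is just the token pair; the tree is regenerated, and the subroutines run, only when the item is accessed.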
For the moment, however, the programmer uses the editor to build another memory item, this one carrying a marker identifying it as a figure — not simply a system — which has, as they say in computer-talk, two children, each of which is a system. The first system is the token "shape," while its sibling is the token "base," and in implanting this more complex item in memory, the editor will create a token by which the item will henceforth be known: it is civilized enough to make it pronounceable, if not sensible. This single token is now all that is required to generate a figure consisting of a shape and a base: or a shape with a whole series of bases, actually, since
repeat factors may be assigned to any token in a memory item. It is not difficult to see how the editor may be used to create groups of figures, each of which will have systems as children, and pictures, which will have groups as their children, each of which will have figures as its children, each of which will have systems as its children, each of which may have other systems as its children, and so on. Thus, by the time the programmer has been working for a short time, the program will have in its memory, not merely a number of items, but items of different levels of complexity. If we look at the items in detail, moreover, it will be seen that they do not simply exist in isolation. Each item may have within it what we will call an ISA list, which will define the sets of which this item is a member, a HASA list, which defines the item's members, and an "ASSOCiation" list, in addition to its RULE list. If the programmer, in creating the system "shape," had declared that a "shape" ISA closed-form, then the editor would automatically have created a new "closed-form" item with a "concept" marker — assuming that one hadn't existed already — and would have entered "shape" in its HASA list. Similarly, the programmer may have created a concept item by hand. In either case the assertion of an ISA association will cause the automatic generation of a HASA association in the appropriate item. This facility is completely general, so that eventually the program may know that one system is an example of a curvilinear closed-form while another is an example of a rectilinear closed-form, both of these sets being members of the superset "closed-forms," while this, in its turn, may be a member of the set "forms-useable-for-the-depiction-of-solid-objects." This is what will allow the program both to generalize and to particularize, and to substitute one member of a set for another. 
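The reciprocal bookkeeping described above, where asserting an ISA link automatically creates the concept item if necessary and records the matching HASA entry, is easy to sketch. The item layout below is an illustrative guess, not Aaron2's actual representation:

```python
# Each memory item carries ISA, HASA and ASSOC lists. Asserting
# "shape ISA closed-form" creates the concept item if needed and
# records the reciprocal HASA entry automatically.

memory = {}

def get_item(name, marker="system"):
    if name not in memory:
        memory[name] = {"marker": marker, "isa": [], "hasa": [], "assoc": []}
    return memory[name]

def assert_isa(member, concept):
    """member ISA concept, with the reciprocal HASA link maintained."""
    m = get_item(member)
    c = get_item(concept, marker="concept")   # auto-create the concept item
    if concept not in m["isa"]:
        m["isa"].append(concept)
    if member not in c["hasa"]:
        c["hasa"].append(member)

assert_isa("shape", "curvilinear-closed-form")
assert_isa("curvilinear-closed-form", "closed-forms")

def all_classes(name):
    """Walk ISA links upward: every set this item is ultimately a member of."""
    out = []
    for parent in get_item(name)["isa"]:
        out.append(parent)
        out.extend(all_classes(parent))
    return out

print(all_classes("shape"))
# -> ['curvilinear-closed-form', 'closed-forms']
```

Walking the ISA chain upward is what makes class-level rules reachable from any member, i.e. the inheritance referred to in the next sentence.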
It is also this mechanism which will permit what I referred to earlier as inheritance: the application of a rule belonging to a class to any member of that class. The ASSOCiation list functions as a linking mechanism of a much more general kind, and is intended to allow the modeling of just what the name implies: those connections of items in human memory which may be extremely strong, though without necessarily having any very obvious reasons for existing. As I have said, Aaron2 is now in its infancy. It has in its memory no more than about twenty items, three or four of which represent complete pictures: or, more precisely, classes of pictures, since the same item could be enacted a thousand times without ever producing the same drawing twice. Most of the things it knows how to make are readily discernible in its drawings, and once you know what you are looking for it is obvious how few things it knows how to do: far too few to move to the next major step.
That step will involve providing Aaron2 with a number of criteria, which it will be able to apply to its own performance. Suffice it, for the moment, to say that these criteria will reflect what I think of as cognitive constants, and that the program will judge the enactment of any item of memory by how closely it has matched one or another of these constants: or, to put it more simply, how "like" the visual field the current drawing is. Having generated a closed form, for example, it may judge that its outline is quite short in relation to its area, implying that the form is not yet complex enough to "match" the structure of the visual field. In that case it will be able to make use of any of the links it has to traverse memory in search of something it knows how to do which will add to the complexity of the figure and better satisfy this particular criterion of complexity. If it succeeds in doing so, it will have learned how to do something it hadn't known how to do previously, and, using the same editor that built its memory in earlier days, it will commit to memory this new piece of knowledge. You will see why I insisted that a program like this would need to know a great deal before it is ready to be let loose. Once it is let loose, my guess is that it will develop quite rapidly, and I am prepared to believe that in a short time its drawings will be unpredictable, not in the simple sense that Aaron1's drawings were unpredictable, but in the more profound sense that they were produced by a program which had changed since it was written. What will its drawings be like? Obviously, I can't know in detail, though I think I would be quite surprised if Aaron2 generated a Leonardo. Will they be wonderful? Will they become so unlike the externalizations of the human mind that they cease to function as those cultural artifacts we call works of art? Who can tell. But I am preparing now to devote some years to finding out.
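The one criterion given as an example, judging whether a closed form's outline is long enough relative to its area, corresponds to a standard compactness measure, perimeter squared over area. The sketch below is illustrative; the threshold value is arbitrary and not something Cohen reports:

```python
import math

# Compactness of a closed polygon: perimeter**2 / area.
# A circle minimizes this at 4*pi (about 12.57); the more convoluted
# the outline, the larger the value. A form whose score is too low
# would be judged "not yet complex enough to match the visual field".

def perimeter(points):
    return sum(math.dist(points[i], points[(i + 1) % len(points)])
               for i in range(len(points)))

def area(points):
    # Shoelace formula for the area of a simple polygon
    s = sum(points[i][0] * points[(i + 1) % len(points)][1]
            - points[(i + 1) % len(points)][0] * points[i][1]
            for i in range(len(points)))
    return abs(s) / 2

def complex_enough(points, threshold=20.0):
    return perimeter(points) ** 2 / area(points) >= threshold

square = [(0, 0), (1, 0), (1, 1), (0, 1)]          # score 16: too simple
jagged = [(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]  # indented outline

print(complex_enough(square), complex_enough(jagged))   # -> False True
```

A form that fails the test would trigger the memory traversal described above, in search of an operation that adds to the outline.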
Off the shelf Harold Cohen Visual Arts Department, University of California, San Diego, La Jolla, CA 92093, USA
From time to time someone will ask me what I call what I do. Do I call it computer art? No, I say, I do not call it computer art. I don't want anything to do with computer art. Why is that? I am asked. Because computer art seems very reactionary to me, very old-fashioned. Because it hasn't changed in any fundamental way in twenty years and that seems to me a sure sign of infertility. Because I've rarely seen any that I didn't find simple-minded and boring. Because so much of it is done by the technology and not by the artist. Because... (when I get started on this particular topic I've been known to go on for hours.) But if what you do isn't computer art, they say, what do you call it?
You have to call it something. No, I do not have to call it something, I say; I have to do it. Well, they say, that's ok for you, but what do we call it? Why ask me? I say; call it what you like. Well, then, they say, we'll call it computer art. I rather suspected you would, I say. Occasionally someone asks me what it is that I do: then I do my best to tell them what I do. The trivial reason for naming something is to let you know where to put it. The serious reason for naming something is to tell you where you can't put it. Serious naming implies a prior act of discrimination. You begin by noticing that A is unlike B, not in the sense of marginal differentiation but in a way that demands
Fig. 1. Free-standing screen wall. Buhl Science Center, Pittsburgh, Pennsylvania, 1984
The Visual Computer (1986) 2: 191-194 © Springer-Verlag 1986
attention because the existence of A challenges how you think about all the B's. (Marginal difference is the difference between two boxes of Uniquely Wonderful Cereal on the Wonderful Cereal shelf in the supermarket.) If you accept the challenge to your beliefs which this act of discrimination implies, you may be in a position to summarize it in a single word. Or you may need to write a book. In either case the activity is called criticism. That computer art has lacked criticism almost completely is perhaps the most important reason why I don't want anything to do with it. Computer art exhibitions are like mail-order catalogs: everything marvelous, everything up-to-the-minute or just dressed-up, and nothing ever presented or discussed, under any circumstances, in terms of its significance. W does polynomial equations and X does rotating polygons and Y does abstract expressionist paintings with an electronic paintbox and Z's got hold of a tame Solids Modeling programmer and is doing Solids Modeling Art. So? So nothing.
Fig. 2. Mural. Ontario Science Center, Toronto, Ontario, 1984
Fig. 3. Computer generated black and white drawing, 1985
Fig. 4. Hand colored computer generated drawing, 1984
Fig. 5. Hand colored computer generated drawing, 1985
Fig. 6. Computer generated drawing. Laser print with colored pencil, 1986
Fig. 7. Computer generated drawing. Laser print with colored pencil, 1986
Fig. 8. Computer generated Figure study. Laser print, 1986
Everything exactly equivalent to everything else and all neatly stacked on the Computer Art shelf in the supermarket.
Shall we blame it on the critics? Not so fast! Critics may be as lazy as the rest of us about coming up to speed on things they don't know about but they make their livings writing about things they find exciting and the things they perceive the rest of the community to find exciting. Computer art hasn't merely failed to stir the imagination of serious critics, it has failed to stir the imagination of any part of the serious art community. Come to think of it, I've never met a computer artist who didn't think that most of computer art has been extremely dreary. In fact what I've never been able to understand is why computer artists want anything to do with each other. They must all be lonely. Better a spot on a shelf identified with all the other innocuous breakfast cereals than no place - no name! - at all. Personally, I would prefer to be placeless and nameless.
For several years now voices on the telephone have been explaining to me that I owe it to the computer art community to exhibit with them: after all, I am told, we all believe in the same things, don't we? No, I try to explain, we do not believe in the same things, we are not on the same side, I do not owe it to anyone to go sit on their shelf. One of the things concerning which our beliefs differ profoundly is the desirability of sitting on shelves with big labels to tell people what kind of cereal you are. But I do use computers, don't I? I do. I also pick my nose when no-one is looking, try with only moderate success to keep my weight down and tend to snore if I sleep on my back. I do not assert fellowship with other nose-pickers, weight-watchers or snorers. There are important similarities and there are trivial similarities. My work has no fellowship with polynomial equations, rotating polygons, abstract expressionism - with or without a computer - or solids modeling. Leave me alone. I don't owe you anything. Let me tell you what I believe. Let me tell you what I do. Let me tell you, in the first place, that those two questions are extremely closely linked for me: not simply by virtue of the fact that I do the things I do because I believe the things I believe, but because the doing of those things involves the examination and the modification of my beliefs. Art is one of the ways the individual has of bringing belief under scrutiny and under the authority of his/her intelligence. Belief which is not brought under the authority of the individual's intelligence is dogma, prejudice. I have always felt that if you want to test the validity of a belief, you bash it up against a wall as hard as you can to see if it breaks. I write programs which are intended to throw some light upon what people do, in a cognitive sense, when they make images: not upon what their images look like. Art is a series of acts, not a series of objects. From which it follows that nobody ever made original art - with or without a computer - by mimicking the appearances of existing original art.
My programs function as models of the things people do - the things I believe they do, that is to say - when they make images. The way the programs perform tells me something about the plausibility of the things I believe. I came to computing in 1968, after twenty years as a painter. In career terms I was doing just fine as a painter, and in those same career terms a painter had to be crazy to get involved in computing in 1968. I will not pretend that I had any very clear idea why I chose to embark upon this particular craziness, or what I expected of it. I have never believed that the artist has any a priori obligation to be on the culture's technological cutting edge and I am not particularly interested in machines. What I did know was that painting was no longer providing me with a hard enough wall against which to bash my beliefs. I suppose I sensed - not more than that, but that was enough - that programming would provide a harder one: and it has. I do not believe there is any other worthwhile thing I get from the computer that I really need. In the early days my work was limited to the modeling of a small subset of cognitive "primitives:" closure, repetition, figure-ground and a few things of that sort. Much of what gets written about my program, AARON, still discusses it in those terms, as if nothing has happened in the past decade. I see AARON as a single program, not as a series of different programs: but it is a program that has gone through several stages of something corresponding to human cognitive development, so that it currently has a relationship to the AARON of
1972 analogous to that of an adult to a small child. (Where AARON could make a drawing in about two minutes on a PDP-11/45 five or six years ago, today it takes all of twenty minutes on a MicroVAX II with 5 Megabytes of memory and 200 Megabytes of disk.) The cognitive "primitives" of the earlier work still stand as the basis of AARON's representational modes, just as the human cognitive apparatus provides the basis for the ways that people make representations. However, I was never able to identify more than a very small number of these "primitives," and it started to dawn upon me around 1983 that there was another determinant to the nature of cognition that I had not considered. The human cognitive apparatus develops in the context of a real world: so that in some sense cognition is the way it is because the world is the way it is. The result of considering this proposition is that, where the earlier AARON had been limited to knowledge of image-making strategies, the new AARON is more explicitly concerned with knowledge of the external world and the function of that knowledge in image-making. And it has a modest body of knowledge of its own about the outside world, as its drawings of 1985 and '86 demonstrate. AARON is an autonomous intelligent entity: not very autonomous, or very intelligent, or very knowledgeable, but very different, fundamentally different, from programs designed to be "just" tools. Electronic paint boxes, for example. And its use is equally different from the way computer artists use electronic paint boxes:
I don't work with the program, that is to say, I work on the program. The goal is the program's autonomy, not the making of a better - orthodox - tool. (I've been insisting publicly on the need to program for so many years now that it's time to insist upon something else: if all you want is a "better" version of an orthodox tool that exists already, don't bother.) I've been aware all along that my own work has barely scratched the surface of an array of potentially interesting and valuable ideas. Yet I am convinced that all of those potentials will necessarily face the same question that I have faced, because it is really the only question which differentiates the computer fundamentally from other tools. It is not, what can you do with a program, but, what can the program do? For the artist, the essence of the computer is its autonomy.
AARON is autonomous to the degree that it can generate original drawings in large numbers without my assistance or interference. It can call upon its knowledge of image-making - and, more recently, its knowledge of the world - to provide the basis for what it does. And it is smart enough to wiggle out of difficulties in much the way an intelligent human being would. It is not yet capable of self-modification: it is not able, that is to say, to modify its given knowledge on the results of its own experience. I look forward to the day when AARON will surprise me with its drawings: not in the simple sense that it does something I had not anticipated when I wrote the program, but in the more profound sense that it does something which could not have been done by the program as I wrote it. It will take some time to do that, but it has to come.
No, I do not have a name for what I do and I don't feel any the worse for it. I am reasonably sure that if I had one to offer I would see it subverted into a supermarket label before I could turn around. I've noticed that the computer art telephone callers are starting to profess a deep involvement with Artificial Intelligence: I surely cannot deny my fellowship with that, can I? Oh, but I can! I know where I stand with respect to Artificial Intelligence. I also know the difference between a name that differentiates and a label that prevents differentiation. And I know a supermarket shelf when I see one. If you think you can do anything worthwhile sitting on a supermarket shelf whatever the label says - go for it. I'll just keep doing what I do.
HOW TO DRAW THREE PEOPLE IN A BOTANICAL GARDEN.
Harold Cohen The University of California at San Diego Department of Visual Arts, La Jolla, Ca 92093
Abstract AARON is a program designed to investigate the cognitive principles underlying visual representation. Under continuous development for fifteen years, it is now able autonomously to make "freehand" drawings of people in garden-like settings. This has required a complex interplay between two bodies of knowledge: object-specific knowledge of how people are constructed and how they move, together with morphological knowledge of plant growth: and procedural knowledge of representational strategy. AARON's development through the events leading up to this recently-implemented knowledge-based form is discussed as an example of an "expert's system" as opposed to an "expert system." AARON demonstrates that, given appropriate interaction between domain knowledge and knowledge of representational strategy, relatively rich representations may result from sparse information.
Figure 1: AARON drawing, 1987
1 Preamble

Brother Giorgio is a 12th Century scholar-monk whose task it is to record what is known of the world's geography, and he is currently making a map of Australia, an island just off the coast of India. Since an essential part of map-making involves representing the animals of the country, he is making a drawing of a kangaroo. Now Brother Giorgio has never seen a kangaroo. But he understands from what he has been told that the kangaroo is a large rat-like creature with a pouch, and with an exceptionally thick tail. And he draws it accordingly (figure 2a).
Figure 2a

Figure 2c
While he is so engaged, a traveler visits the monastery, and he tells Giorgio that his drawing is wrong. For one thing - and Giorgio finds this quite implausible - the kangaroo doesn't carry a pouch - its pouch is part of its belly! And, says the traveler, it doesn't go on all fours: it stands upright, on rear legs much bigger and thicker than the front legs (figure 2b).
Figure 2b
And the tail doesn't stick straight out, it rests on the ground. Giorgio completes all the necessary changes, and the traveler assures him that though he hasn't got it quite right, it's close (figure 2c). AARON, late in the 20th Century, is a knowledge-based program that is capable of the autonomous generation of original "freehand" drawings, like the one in Figure 1. Like Brother Giorgio, AARON has never seen the things it draws.
It, too, is obliged to rely upon what it is told. Unlike Giorgio, however, it cannot make use of associative knowledge. There would be no point in telling it that a kangaroo looks a bit like a rat, for example, not only because it doesn't know about rats, but because it has never looked at anything. What both Giorgio and AARON make clear is that the plausibility of a representation does not rest upon the correctness of the knowledge it embodies. Indeed, for anyone lacking knowledge of marsupials, the "correct" knowledge of the kangaroo's pouch is at least as implausible as Giorgio's initial understanding. Nor does plausibility rest upon the completeness of that knowledge, since a representation only ever represents what is being represented with respect to an arbitrarily small set of properties. Given one important proviso - that the representation-builder has general knowledge about how to make representations - there would appear to be no lower limit of knowledge below which the making of a representation is impossible. And that proviso points to the main thrust of this paper: it will show AARON's visual representations to involve a spectrum of representational procedures, and a spectrum of different kinds of world knowledge. It will also show the degree to which the particular quality of those representations depends upon the intimate meshing of the program's world knowledge with its knowledge of representing. AARON has been under continuous development for nearly fifteen years now and it has gone through many generations. At fifteen it may well be the oldest continuously-operational expert system in existence, and perhaps the only computer program to have had something akin to a life-story.
But perhaps AARON would be better described as an expert's system than as an expert system: not simply because I have served as both knowledge engineer and as resident expert, but because the program serves as a research tool for the expansion of my own expert knowledge rather than to encapsulate that knowledge for the use of others. The goal of this research is to understand the nature of visual representation. The term should not be understood to imply the various mechanical methods - perspective, photography, ray-tracing - by which two-dimensional transforms of a three-dimensional world may be generated. All of these are knowledge-free, in the sense that the photographic process neither knows, nor needs to know, what is in front of the lens. AARON helps to address questions that are both more fundamental and more general. What do computer programs - and, paradigmatically, human beings - need to know about the external world in order to build plausible visual representations of it? What kind of cognitive activity is involved in the making and reading of those representations? The making of representational objects - the drawings, paintings, diagrams, sketches in which representations are embodied - constitutes the only directly-examinable evidence we have of "visual imagining": those internal cognitive processes that underpin and inform the making of representational objects, and which we all enjoy to some extent, whether or not we make representational objects. I assume that the reading of representations involves essentially similar processes. But making requires more than reading does. It requires a special body of knowledge - knowledge of representation itself - that is part of the expertise of the artist, just as the representation of a body of knowledge within an expert system requires an analogous expertise of the knowledge engineer. Understanding the nature of visual representation requires asking what artists need to know in order to make representational objects: what they need to know, not only about the world, but also about the nature and the strategies of representation itself. AARON's role in this investigation, then, has been to serve as a functional model for a developing theory of visual representation. The stress is on the word "functional," for the most convincing test of a theory of representation is the model's ability to make representational objects, just as the plausibility of a theory of art resides in art-making.
AARON was last reported in detail in 1979, in the proceedings of IJCAI-6, at which time it was making drawings like that in figure 3. The differences in its output have been matched, of course, by large changes in the program itself. But these have been developmental rather than radical changes, following a pattern analogous to that of human cognitive development, and AARON has retained its identity and its name throughout.
Figure 3: AARON drawing, 1979
Part of my purpose here is to describe the current state of the program. The other part is to account for its development. That means, necessarily, to describe the domain of interaction between program and programmer, to delineate the purpose that the one serves for the other. AARON has been a research tool for me, but also something very like an artist's assistant, capable always of enacting, without human aid or interference, the understanding of art embodied in its structure. And my relationship to the program has become increasingly symbiotic. Without AARON's sophisticated enactment of my own understanding, that understanding would not have developed as it did. Without that developing understanding AARON could never have become the sophisticated adjunct artist that it is. My long-held conviction that AARON could only have been written by a single individual has been based on rather vague suspicions of cultural incompatibilities existing between the disciplines of knowledge engineering and art. Now I believe, rather more precisely, that the problem - and, indeed, a fundamental limitation of expert systems - lies in the artificial separation of two bodies of knowledge, that of the domain expert on the one hand and that of the knowledge-system expert on the other.
2 AARON: Early Versions

In all its versions prior to 1980, AARON dealt with exclusively internal aspects of human cognition. It was intended to identify the functional primitives and differentiations used in the building of mental images and, consequently, in the making of drawings and paintings. The program was able to differentiate, for example, between figure and ground, and insideness and outsideness, and to function in terms of similarity, division and repetition. Without any object-specific knowledge of the external world, AARON constituted a severely limited model of human cognition, yet the few primitives it embodied proved to be remarkably powerful in generating highly evocative images: images, that is, that suggested, without describing, an external world [Cohen, 1979]. This result implied that experiential knowledge, inevitably less than constant across a culture and far less so between cultures, is less a determinant to the communicability of visual representations than is the fact that we all share a single cognitive architecture. From the program's inception around 1973, I had been convinced that AARON would need to be built upon a convincing simulation of freehand drawing, and gave much attention to modeling the feedback-dependent nature of human drawing behavior. As a consequence of this stress the program was formulated, initially, largely in terms of line generation. Closed forms, those universal signifiers for solid objects, also were generated from rules directing the development of lines: rather like the way one might drive a closed path in a parking lot by imagining a series of intermediate destinations, veering towards each in turn and finally returning to one's starting point [Cohen, Cohen, Nii, 1984].
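The parking-lot analogy can be made concrete in a short sketch. The following is an illustrative reconstruction in Python, not AARON's code (which was written in C); the waypoints and step size are invented for the example:

```python
import math

def closed_form(waypoints, step=0.2):
    """Trace a closed path by veering toward a series of intermediate
    destinations in turn, finally returning to the starting point.
    The path is built incrementally, line-fashion, so the enclosed
    shape is a result of the process rather than a prior plan."""
    path = [waypoints[0]]
    # Visit each destination in turn, then head back to the start.
    for target in list(waypoints[1:]) + [waypoints[0]]:
        x, y = path[-1]
        while math.hypot(target[0] - x, target[1] - y) > step:
            dx, dy = target[0] - x, target[1] - y
            d = math.hypot(dx, dy)
            x, y = x + step * dx / d, y + step * dy / d
            path.append((x, y))
        path.append(target)
    return path
```

The closed form exists only as the residue of the line's journey, which is the point of the analogy: spatial presence as the result, not the cause, of a linear operation.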
Following a paradigm we see exemplified in rock drawings and paintings all over the world, AARON observed a general injunction against allowing closed forms to overlap each other, and would be obliged to modify its closure plans frequently in order to prevent overlap. This resulted in a richer, less predictable, set of forms than the unmodified closure rule would have permitted. But underlying this richness was the fact that AARON had no prior spatial conception of the closed forms it drew: their spatial presence, their identity, was the result, not the cause, of an entirely linear operation. Throughout this phase of AARON's development, a constant sub-goal was to increase the range and diversity of its output. And in 1980 this desire led to the development of a new generating strategy for closed forms. It had its basis in an attempt to simulate the drawing behavior of young children, specifically at that immediately post-scribbling stage at which a round-and-round scribble migrates out from the scribble-mass to become an enclosing form (figure 4). It was while this work was in progress that a colleague expressed an interest in having AARON make "realistic," as opposed to evocative, drawings. Could it, for example, make a drawing of an animal?
Figure 4

Figure 5a: AARON, animal drawing

Figure 5b: AARON, animal drawing
I must avoid here what would be a lengthy digression on the nature of realism. Let it suffice to say that I took my colleague's words to imply a visual representation of an animal, as opposed, say, to a diagram. Since I had never drawn animals and had little idea about their appearances I thought it unlikely that I could oblige. What little knowledge I could place at AARON's disposal was barely sufficient to construct a diagrammatic stick figure: a representation, certainly, but not a visual representation. Now it happens that the "enclosing" stage of children's drawing is also the stage at which they begin to assign representational significance to their drawings. If this was more than a coincidence, I speculated, perhaps it would be possible to generate an adequate representation by enclosing a stick figure the way a child encloses a scribble. It proved to be a good guess. On the first attempt the program's drawings showed a startling resemblance to the rock drawings of the African Bushmen (figure 5a). Encouraged by the result I amplified the program's knowledge to take some account of the bulk of the animal's body, and the drawings shifted their stylistic affiliations to the caves of Northern Europe (figure 5b) [Cohen, 1981].
In retrospect it seems obvious that the closed forms of these drawings would have produced a richer evocation of "real" animals than a diagrammatic stick-figure could. The clear differentiation of style that resulted exclusively from the change in the enclosed, subsequently invisible, diagram is more problematic, however. Perhaps "style" in art is less a question of autography than of what the artist believes to be significant. AARON was now potentially able to generate a large variety of geometrically complex closed forms without requiring geometrically complex descriptions. The gain was obvious enough to ensure this new strategy a permanent place in AARON's repertoire, even without the goal of visual representation. From that point forward, all closed forms involved the initial construction of a "conceptual core," corresponding to the child's scribble, around which the form was "embodied" (figure 6). One important result of the new strategy was to shift the stress in AARON's drawing mode away from its initial linearity, yet the greater gain had less to do with the growth of AARON's formal skills than with its "cognitive" development. For the first time AARON now had some concept of what it was to draw before it began to draw it.
Which did not mean that AARON proceeded to draw real-world objects; on the contrary, the representation of real-world objects seemed as unnecessary to my research goals as it was inconsistent with my own history as an artist. The animals disappeared from AARON's repertoire and no further attempt was made at that time to apply the new strategy to the representation of real-world objects. Yet even in the absence of real-world knowledge, the new cognitive mode endowed AARON's images with an increasingly "thing-like" presence that seemed to demand an explicitly visual space in which to exist. Thus, for example, where the earlier versions of the program had avoided overlapping figures, occlusion now became a fundamental principle of pictorial organization. By 1984 the earlier "rock-art" pictorial paradigm had given way entirely. The pressure to provide real-world knowledge of what AARON's new visual space contained became inescapable and the first of several knowledge-based versions of the program was constructed (figure 7).
I do not intend by this account to imply some metaphysical force guiding AARON's development and my own hand. Nor is it necessary to do so. Every system has its own logic, and the need to follow the dictates of that logic, to discover where it will lead, may easily transcend the private inclinations of the investigator.
3 AARON: Recent and Current Versions

I said earlier that the goal of this research is to discover what the artist needs to know about the world in order to make plausible representations of it: not correct representations, or complete representations, or realistic representations - none of these notions holds up very well under examination - but plausible representations. If I had asked how much the artist needs to know, the answer would have been that the question is hardly relevant: we make representations with whatever we know. Given adequate knowledge of the representational procedures themselves, there is virtually no lower limit of world knowledge below which representation is impossible. The goal, rather, is to discover how representational structures represent what they represent: how we use what we know to build those structures. What does AARON represent, and how - by means of what structures - is it represented?
Figure 6: AARON drawing, 1983
Figure 7: AARON drawing, 1985
As the title of this paper suggests, AARON represents a small part of the flora and the fauna of the world, with a little geology thrown in: a tiny part of the whole of nature. Because plausibility does not rest upon how much the image-maker knows about the world, AARON has never been provided with a large object-specific knowledge base - large, that is, in the sense of referring to many different objects. And because object-specific knowledge is also purpose-specific, no attempt has been made to give it knowledge that might be considered essential for representations of other kinds than its own and within other disciplines. Most particularly, its object-specific knowledge contains very little about appearances, and the program's overall strategy rests upon being able to accumulate several different kinds of non-visual knowledge into visually representable forms. This is not a neatly sequential process. As I will show, different knowledge is called into play at different stages of accumulation; the program's representational knowledge is not simply invoked as a final step. In the category of object-specific knowledge the program has five levels, each with its own representational formalism. At the first level is AARON's declarative knowledge. For example: a human figure has a head, a torso, two arms and two legs. A plant has a trunk, branches and leaf clusters. This declarative knowledge is represented outside the program itself in frame-like forms that are read in as they are needed. So, also, is knowledge of several pictorial "classes." A class is characterized simply by what elements may be used in a given drawing and - since AARON does not use an eraser - the front-to-back order in which they may be used.
Thus AARON begins a new drawing by selecting a pictorial class, and proceeds by expanding each entry in the class hierarchically into an internal tree-structure, at the lowest levels of which are the management procedures responsible for the generation of individual elements of the drawing. There is, for example, a
"hand" manager whose sole task is to produce examples of hands on demand, to satisfy the specifications that are developed for it. The expansion of externally-held declarative knowledge into internal tree structure is done on a depth-first basis, and AARON does not know in advance what the current class will require at a later stage; and it may, in fact, over-ride the demands made by the class in favor of constraints that have developed within the drawing. A class is only minimally prescriptive; it will call for "some" trees or people, rather than two trees or three people, where "some" may be specified, for example, as any number between zero and four. Consequently the expansion is not deterministic. Decision-making is relatively unconstrained at the start of the drawing and, though it becomes increasingly constrained as the drawing proceeds, AARON randomizes unless it has some clear reason for preferring one thing or one action over another, as people do. All higher-level decisions arc made in terms of the state of the drawing, so that the use and availability of space in particular are highly sensitive to the history of the program's decisions. AARON's first and ongoing task, then, has to do with the disposition of its few objects in a plausible visual space.
3.1 The Nature of Appearances

When I first provided AARON with the knowledge it would need to make blatantly representational drawings, I reasoned that, since anything one sees through a window is as real as anything else, pictorial composition was hardly relevant to the issue of plausibility. I assumed, therefore, that I could safely fall back upon the simplest, and perhaps the most universal, of compositional paradigms: put it where you can find space for it. And this paradigm, extensively used in AARON's two-dimensional days, remained valid in its new world to the extent that three people in open view make neither a better nor a worse composition than five people hiding in the foliage. A fundamental problem emerged, however, centered on the ambiguity of the word "where." Until recently AARON has never had a fully 3-dimensional knowledge-base of the things it draws: foreshortening of arms or the slope of a foot in the representation were inferred from AARON's knowledge of the principles of appearance, not by constructing the figure in 3-space and generating a perspective projection. And it happened too frequently in the program's first efforts at representation that people in the picture would stand on each other's feet (figure 8). I've been using the term "plausible representations" to mean representations that are plausible with respect to appearance, and I must now consider what appearance means and what it implies. Appearance implies what the world looks like. It implies the existence of a viewer, and a viewpoint that controls the disposition of objects within the viewer's visual field. Since much of what the viewer sees is illuminated surfaces, it implies also some condition of lighting that controls visibility in some particular way. And since lighting is arbitrary with respect to the object itself it follows that the appearance of objects - as opposed, for example, to their structure, their mass or their dimensions - is a transitory characteristic.
Figure 8: AARON drawing, 1986

In order for appearance to imply specific knowledge of how particular objects look under particular and transitory lighting conditions, we would have to be able to store and retrieve, not merely "visual fragments," but complete "mental photographs." And that is surely not the general case. On the other hand, we can regard the way solid objects occlude each other, the way objects take less space in the visual field as they get further away, the way light falls on surfaces and so on, as a set of principles. In theory we should be able to infer a particular appearance by applying the principles of appearance to a particular surface description; that is exactly what the various strategies of "solids modeling" do. But the human mind is rather poor at inferring appearance, partly because it rarely has adequate surface descriptions available to it - we use appearance to provide those descriptions, not the other way around - and partly because the human cognitive system makes use of a gamut of "cognitive perspectives" quite unlike the self-consistent geometries upon which solids modeling relies. One result is that in the one period of history when art has concerned itself explicitly with appearance - the western world since the Italian Renaissance - it has inferred the appearance of simple surface configurations, but has relied heavily upon direct observation for the depiction of complex surfaces. For example, the artists of the Renaissance used perspective in depicting objects with simple surfaces - buildings, tessellated floors - but almost never attempted to use perspective in depicting the human figure (figure 9). And, of course, solids modeling has balked at the surface complexity of the human figure for the same reason: the difficulty of providing adequate surface descriptions.
Figure 9: Fra Angelico, Annunciation, 1437

3.2 Pictorial Organization versus Dimensional Plausibility

Fortunately, the cognitive system provides a convenient shorthand for the representation of surfaces. Since the eye functions as a contrast amplifier we are able to detect the bounding edges of surfaces very efficiently, and we make heavy use of the behavior of those edges to provide information about the surfaces inside them. In using edges as the basis for a representational mode, then, much of the problem of surface illumination is bypassed. Plausibility rests upon the behavior of the edge, and upon issues that can be addressed in terms of edges: notably occlusion and spatial distribution. Actually, very little is required, in terms of occlusion and perspective, in drawing a single figure or a single plant. However, the need to represent objects plausibly with respect to other objects requires a significant level of control over 2-space placement and the relative sizes of objects within the representation, and requires more extended control of occlusion. This is more complicated than it may seem. As I remarked, visual representation in general rests upon a complex of cognitive "perspectives," not upon the automatic and knowledge-free 2-space mapping of the visual field provided by photography, or its computer-based equivalent, solids modeling. So great is the difference between the cognitive view and the automated view that experienced studio photographers habitually use Polaroid instant film while setting up a shot in order to find out what the world they can see clearly in three dimensions will look like in two. Evidently 2-space organization cannot be adequately predicted or controlled exclusively through control of the 3-space arrangement of objects. Nor, conversely, is it possible to guarantee plausibility with respect to physical dimensionality by concentrating exclusively on pictorial organization. In constructing visual representations the human artist appears to work under two simultaneously-active and mutually-exclusive constraint-sets. The "imaginational planning" that marks this mode is best evidenced by the artist's sketch, in which 2-D space is allocated less to the objects to be represented than to the space they occupy, and in which space is increasingly committed as those objects "congeal" simultaneously within both their actual 2-D space and their referenced, implied 3-D world (figure 10).

Figure 10: Eugene Delacroix, "Resurrection", 1862
At present, AARON uses only a crude, static model of this essentially dynamic process. While it organizes primarily in 2-space terms, it also maintains a floor plan of the "real" world it is depicting. Space for an object is allocated initially on the plane of the representation. It is then projected back into the "real" world, where it is adjusted to ensure valid 3-space placement, and then it is projected forward again into the representation. In doing this, perspective is used only to the degree of determining where the bases of objects - the feet of figures - will fall, and how high the objects will be, in the representation. It thus ensures that real-world objects are placed plausibly with respect to their ground-plane while doing very little about planning in 3-space terms. People no longer stand on each other's feet, but a genuinely dynamic model of this imaginational planning remains a goal for the program's future development.
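The two-way projection can be caricatured in a few lines. Everything here - the names horizon, eye_height, min_gap, and the one-dimensional "nudge further back" - is invented for illustration; the perspective model is, as the text says, deliberately minimal:

```python
def place(candidate_2d, occupied, horizon=100.0, eye_height=60.0, min_gap=5.0):
    """Sketch of the two-way projection: a tentative 2-D spot is
    pushed back onto a ground plan, nudged until it is clear of
    previously placed objects, then projected forward again. Depth
    fixes only where the feet fall and the apparent height scale."""
    x, y = candidate_2d
    depth = y / horizon                  # crude back-projection to the floor plan
    while any(abs(depth - d) < min_gap / horizon and abs(x - ox) < min_gap
              for ox, d in occupied):
        depth += min_gap / horizon       # slide the object further back
    base_y = depth * horizon             # forward projection: where the feet fall
    scale = eye_height * (1.0 - depth)   # nearer objects are drawn larger
    return (x, base_y), max(scale, 0.0), depth
```

The feet-no-longer-overlap property comes entirely from the floor-plan adjustment; nothing else about the 3-space arrangement is planned.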
3.3 Levels of Knowledge

During the expansion process, a second level of knowledge - exemplary knowledge - is invoked to provide fuller specification for the management procedures. The determination that this particular figure will have a large head and long arms, for example, involves applying the descriptors "large" and "long" plausibly to a set of prototypical dimensions held in table form within the program. The further determination that this figure will hold a particular posture, requiring its right arm to be extended horizontally and its right hand to be pointing, will require three further levels of amplification before an adequate specification can be generated. First: the figure is articulated, and AARON has to know where the articulations are (structural knowledge). Second: it must know what the legal range of movement is at each articulation (functional knowledge). Third, and most important, since a coherently-articulated figure is more than a random collection of legal movements, there has to be knowledge of how a figure behaves: how it keeps its balance and how it gestures.
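The first two of these levels are naturally tabular, and might be sketched as follows. The joint names, the ranges, and the check itself are illustrative placeholders, not AARON's actual tables:

```python
# Structural knowledge: where the articulations are, and what they
# connect (a hypothetical subset of a figure).
JOINTS = {"shoulder": ("torso", "upper_arm"),
          "elbow": ("upper_arm", "forearm"),
          "hip": ("torso", "thigh")}

# Functional knowledge: the legal range of movement, in degrees,
# at each articulation (invented numbers).
LEGAL_RANGE = {"shoulder": (-45, 180), "elbow": (0, 150), "hip": (-30, 120)}

def legal_posture(posture):
    """Check a requested posture against functional knowledge. The
    third level - behavioral knowledge of balance and gesture - would
    further prune which legal postures are actually plausible."""
    return all(LEGAL_RANGE[joint][0] <= angle <= LEGAL_RANGE[joint][1]
               for joint, angle in posture.items())
```

The point of the example is the layering: a posture that passes the functional check may still be behaviorally incoherent, which is why the third level matters most.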
3.4 From Stick Figure to Solid Figure
AARON's knowledge of plants follows similar patterns of distribution. AARON understands plant morphology in terms of branching, limb thickness with respect to length and branching level, the clustering patterns in leaf formations, the size of the plant, and so on. It has no stored descriptions of particular plants, and its entire plausible flora is generated by the same small set of management procedures, through the manipulation of these morphological variables (figure 11).
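A whole flora can indeed be generated from one recursive procedure over a handful of morphological variables. This sketch invents its parameter names and values, but follows the same logic: no stored plant descriptions, only variables:

```python
import random

def grow(length, level, rng, max_level=3, branch_factor=(2, 4),
         shrink=0.6, thickness_ratio=0.12):
    """Generate one plausible plant as a nested limb structure from a
    few morphological variables: branching count, limb thickness with
    respect to length and level, and terminal leaf clustering."""
    limb = {"length": length,
            "thickness": length * thickness_ratio / (level + 1),
            "branches": []}
    if level < max_level:
        for _ in range(rng.randint(*branch_factor)):
            limb["branches"].append(grow(length * shrink, level + 1, rng))
    else:
        limb["leaf_cluster"] = True     # terminal limbs carry foliage
    return limb

plant = grow(10.0, 0, random.Random(1))
```

Varying shrink, branch_factor, and the leaf-clustering rule produces recognizably different species, which is the sense in which the flora is generated rather than stored.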
Note, however, that these specifications contain no reference to appearances, and that they suffice only to inform the production of plausible stick-figure representations. Where the expression of object-specific knowledge spans a range of forms from conceptual to dimensional, the expression of visually representational knowledge requires visual, two-dimensional terms, and the stick-figure is, in fact, the first visual manifestation of AARON's amplified knowledge. All postural issues are determined in relation to it alone.
Figure 11: AARON drawing, 1987
The organization of AARON's object-specific knowledge is thus a five-tiered structure in which each successive level is accessed at the appropriate time for what it can add to the whole. Broadly speaking these levels span a spectrum of knowledge types from wholly declarative and external to wholly procedural and internal, and the program proceeds from the general, where it manipulates conceptual tokens like "hand," "pointing," "large," "some," to the specific, detailed and plausible instantiation of these tokens.
In the second stage, the stick-figure is used as an armature upon which to build a more extensive framework. This is the stage at which the exemplary knowledge of the thickness of the parts is invoked. The lines of this framework do not represent the external surfaces of the figure. They are loosely associated with musculature and skeletal features - for example, the single line representing the hip-to-hip axis in the stick figure is expanded into a diagrammatic pelvis - but their primary function is to guarantee sufficient bulk to the figure in
whatever posture, and from whatever position, it is viewed. With the completion of this stage AARON has provided itself with the conceptual core of its representation, similar functionally to the young child's scribble. And it is around this conceptual core, in the third and final stage, that the figure is embodied (Figure 12 shows an incomplete core taken from a current, fully 3-D version of the program).
evolved in the self-overlapping folding of outlines that convey so much about the appearance of complex threedimensional forms. Secondly, AARON knows what it is drawing, and it associates some particular degree of carefulness with the delineation of any particular element. This knowledge is expressed in the use of an additional feedback parameter: the distance from the core at which the path will be developed. Thus, for example, it will draw a thigh rather loosely - that is, at some distance from the conceptual core and with a relatively low sampling rate - while it will draw a hand close to the core and with a high sampling rate. Both of these are controlled by the placement and the frequency of the intermediate destinations around the marked-cell mass. AARON further adjusts its own sampling rate and correction with respect to the size of the element it is drawing relative to the size of the entire image.
Figure 12: partial core figure
Embodying involves generating a path around each of the parts of the conceptual core. These are taken, as the elements of the drawing are, in closest-first order. Part of the internal representation of the drawing that AARON maintains for itself nsists of a matrix of cells onto which are mapped the lines and the enclosed spaces of the drawing. Thus the conceptual core is now recorded as a mass of marked cells, to develop a path around which AARON uses what is, in essence, a simple maze-running procedure. However, its implementation rests heavily upon the fact that AARON draws, as the human artist does, in feedback mode. No line is ever fully planned in advance: it is generated through the process of matching its current state to a desired end state. As with any feedbackcontrolled system, AARON's performance is characterized by its sampling rate and by how radically it corrects. This part of the program most intimately determines AARON's "hand", and it has not changed greatly since the program's earliest versions. Unlike the earlier versions, however, the strategy for "imagining" the intermediate destinations around its path depends upon two things. Firstly, upon its ability to recognize and to deal with a number of special-case configurations in the core figure (figure 13a. b). These - and most particularity a configuration indicating a sharp concavity - are intimately
Figure 13a: strategy for concave configuration
Figure 13b: Edouard Manet, "Study for a Woman at her Toilet", 1862.
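The feedback loop described above - intermediate destinations around a marked-cell mass, a sampling rate, a degree of correction, and a standoff distance from the core - can be sketched as follows. This is an illustrative reconstruction in Python, not AARON's actual C implementation; the function name, the geometry, and all parameter values are invented for the sketch.

```python
import math

def trace_around(core_cells, waypoints, start,
                 step=1.0, sample_every=2, standoff=2.0, correction=0.5):
    """Follow intermediate destinations around a mass of marked cells.
    The line is never planned in advance: only at every `sample_every`-th
    step is the heading compared with the bearing to the current
    destination and partially corrected (by fraction `correction`), and
    each new point is pushed out to the `standoff` distance from the
    nearest marked cell."""
    x, y = start
    heading = 0.0
    path = [(x, y)]
    for wx, wy in waypoints:
        for i in range(500):                        # safety bound per leg
            if math.hypot(wx - x, wy - y) <= step:  # destination reached
                break
            if i % sample_every == 0:               # the sampling rate
                desired = math.atan2(wy - y, wx - x)
                turn = math.atan2(math.sin(desired - heading),
                                  math.cos(desired - heading))
                heading += correction * turn        # partial correction
            x += step * math.cos(heading)
            y += step * math.sin(heading)
            # feedback from the core: hold the line off the marked cells
            cx, cy = min(core_cells,
                         key=lambda c: math.hypot(c[0] - x, c[1] - y))
            d = math.hypot(x - cx, y - cy)
            if 0 < d < standoff:
                x = cx + (x - cx) * standoff / d
                y = cy + (y - cy) * standoff / d
            path.append((x, y))
    return path
```

A low sampling rate and large standoff give the loose line of a thigh; sampling every step at a small standoff gives the careful line of a hand.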
4 Conclusion

In practice AARON makes drawings of whatever it knows about without requiring any further instructions for the making of a particular drawing — and, indeed, without possessing any mechanism through which it could take instructions. To the degree that it does nothing much more than enact what it knows, AARON provides an intuitively satisfying model of "visual imagining," in that it permits the expansion of relatively sparse real-world, object-specific knowledge into a convincing representation of a visual experience. I have described AARON's knowledge as falling into two broad categories: what it knows about a small range of world objects and what it knows about building visual representations. And I have proposed that these two categories must be intimately inter-related in any satisfactory model of human knowledge-based performance. The conclusion is an obvious one: we can only represent what is representable in terms of available representational strategies. I have no doubt, for example, that the program's development has been profoundly determined by the fact that it has been written in 'C' rather than in LISP. AARON's representational strategy, deriving as it does from the young child's relatively undifferentiated perceptions of the world, is well adapted to the representation of blob-like forms, or forms with a strong axis - heads and limbs, for example. Yet AARON is unable to deal with cube-like objects, the perception of which rests upon high-contrast edges in the center of a form as well as at its extremities. AARON will need new representational strategies, not merely more object-specific knowledge, before it can present a new view of the world, just as the young child is obliged to develop new strategies before it is able to put the sides on its representations of houses (figure 14).
Figure 14: children's drawings

Finally: I have claimed for AARON only that it makes plausible representations, and have left aside the consensus judgement that its drawings represent a high level of artistic accomplishment. Why have I had nothing to say about "aesthetic" principles like harmony and balance? The short answer is that AARON is entirely unaware of the existence of those principles, and that since its drawings are aesthetically satisfactory, we must surely question the relevance of those principles to artistic production. This is not to say that AARON does not embody principles of its own, but that whether these are aesthetic principles is largely a matter of definition. I have to assume that the simple "find-enough-space" rule to which I referred earlier contributes to the aesthetic appeal of the outcome, but it is quite different in kind from the aesthetic rules commonly believed to guide the production of works of art.

The fuller answer is that I regard "style" - surely the most difficult word in the entire vocabulary of art - as the signature of a complex system. I regard the aesthetics of AARON's performance as an emergent property arising from the interaction of so many interdependent processes, the result of so many decisions in the design of the program, that it becomes meaningless to ask how much any one of them is responsible for the outcome. If AARON has maintained a consistent aesthetic, a consistent identity, from its earliest endeavors, I have to assume it to reflect consistent patterns of my own in determining its development. If someone else wrote a similar program I would expect it to exhibit a different identity and a different aesthetic.*

*That answer would be begging the question, if the point of the question was to consider how an orthodox expert system might be built to generate objects of high artistic value. That isn't the point: given the orthodox separation of domain knowledge from representation knowledge, I do not believe it will be possible in the foreseeable future. This is one place where it seems not to be true that two heads are better than one.

References

[Cohen, 1979] Harold Cohen. What is an image? Proceedings of IJCAI-6, Tokyo, 1979.
[Cohen, 1981] Harold Cohen. On the Modeling of Creative Behavior. Internal paper for the Rand Corporation, 1981.
[Cohen, Cohen, and Nii, 1984] Harold Cohen, Becky Cohen, and Penny Nii. The First Artificial Intelligence Coloring Book. William Kaufmann, 1984.

Technical Note
While the earliest versions of AARON were built as production systems, all the more recent versions have been strongly object-oriented, as the above discussion might indicate. The program has about 14,000 lines of 'C' code and occupies almost a half-megabyte of compiled code, exclusive of external declarative knowledge structures and the internal representations of the developing drawing it makes for its own use. The most recent version was written under UNIX on a MicroVAX II, on which machine a single drawing takes about an hour of CPU-time, and has been ported to several other UNIX machines. AARON has been developed largely on machines given by the Digital Equipment Corporation. Recent work was funded in part by a grant from the Kurzweil Foundation. Paul R. Cohen provided valuable help and advice on the writing of papers.
*The conclusion of this talk as given differed from the version printed in the proceedings. What follows is the conclusion as delivered.
I have characterized AARON as an expert's system as opposed to an expert system. In fact, it satisfies all the formal requirements of a successful expert system also. Productivity has been enhanced beyond any possible human capability; at a single exhibition at the Tate Gallery in London the program made, and I sold, a thousand original drawings. And to the degree that a thousand people were able to acquire original works of art for twenty-five dollars, it might even be said to satisfy the economic component required of expert systems, though in this case the wealth was distributed rather than accumulated. But it is surely obvious that increased productivity is not the point. Ten drawings serve as well as a thousand, provided that those ten drawings are wonderful, and that their making has served to enhance the understanding of their maker, to push back the boundaries of the individual's conceptual world and those of his audience. The difference between an expert's system and an expert system is that the one enhances the creativity of the expert, the other enhances the productivity of the nonexpert. Without that enhanced creativity, "more of the same" is a dismal and dangerous call-to-arms. It generates the illusion of increased choice while restricting choice. "The customer can have any color he wants, so long as it's black," said Henry Ford, and I think of the benefits of increased productivity every morning and evening sitting on the freeway on my way between home and work. Let me conclude by pushing this line of reasoning one stage further. I do not doubt the material and economic benefits that will accrue from what we do, or - to a lesser degree - the social benefits that will follow
from them. Much of what has been said at this conference has been directed to the goal of increasing the power of the computer and increasing those benefits. But unless I am much mistaken, the anticipated changes in the power of the machine are trivial alongside the changes that will take place within the human animal as a direct result of the increasing power of the machine. I believe that the computer and what we are doing with it constitutes an agent for evolutionary change, and that we are, indeed, now at the beginning of a significant evolutionary process. Of course, that implies a tremendous responsibility resting upon us, as architects, albeit ignorant and unwitting architects, of humanity's future. And from that perspective I could not help but feel a deep disappointment that in an otherwise splendid talk, full of wisdom and obviously deriving from deeply humanistic preoccupations, Raj Reddy failed to list among his goals for the future one single area of those human endeavors by which human cultures have always been judged. Surely we are all aware that more people know the name of Dante than have ever heard of Fibonacci; that Bach has given more joy to more people than Isaac Newton did; and that Cezanne and Monet will be remembered long after Brunel's bridges have collapsed and Riemann has been forgotten. I certainly do not believe that we will meet our responsibilities to the future by writing expert systems for artists, even if we knew how to do it. I do believe that figuring out how we are to meet them, figuring out how AI is to encompass more of human life and human needs than can be measured in economic terms, constitutes the greatest challenge of all to the field.
Brother Giorgio's Kangaroo
Harold Cohen

Harold Cohen is a professor in the Visual Arts Department at the University of California at San Diego. In recent years he and AARON have been shown at the San Francisco Museum of Modern Art, the Stedelijk Museum in Amsterdam, the Tate Gallery in London, the Brooklyn Museum, the Ontario Science Center, and the Boston Science Museum. He has lectured widely on the subject of AARON, and AARON has reciprocated by providing Cohen with several thousand drawings, including designs for a series of mural projects. Cohen describes their relationship as symbiotic. Cohen is regarded as a pioneer in the application of AI to the visual arts.

This is the year 1300. Brother Giorgio, scholar-monk, has the task of making a map of Australia, a big island just south of India. Maps must record what is known about the places they represent, and Giorgio has been told about a strange Australian animal, ratlike, but much bigger, with a long thick tail and a pouch. He draws it, and it comes out like this:
A year later a world traveler is visiting Giorgio's monastery, and he tells our cartographer that he has the animal wrong. For one thing, it isn't carrying a pouch; the pouch is actually part of its belly. (Mercy! says Giorgio.) For another, it doesn't walk on all fours like a rat but on its hind legs, which are much bigger than its front legs. Giorgio redraws his picture:
But the tail rests on the ground. Giorgio tries once more. The traveler screws up his face in concentration, his eyes closed. I don't think that's quite right, he finally says, but I guess it's close enough.
The year is 1987. AARON, a computer program, has the task of drawing some people in a botanical garden—not just making a copy of an existing drawing, you understand, but generating as many unique drawings on this theme as may be required of it. What does it have to know in order to accomplish such a task? How could AARON, the program, get written at all?

The problem will seem a lot less mystifying, though not necessarily less difficult, if we think of these two stories as having a lot in common. AARON has never seen a person or walked through a botanical garden. Giorgio has never seen a kangaroo. Since most of us today get most of our knowledge of the world indirectly and heavily wrapped in the understanding of other people from grade school teachers to television anchor persons, it should come as no surprise that a computer program doesn't have to experience the world itself in order to know about it.

How did Giorgio know about kangaroos before the visitor started to refine his knowledge? He had been told that the animal was ratlike, but how much good would that have done him if he had never seen a rat? For people, the acquisition of knowledge is cumulative, as it clearly has to be. Nothing is ever understood from scratch. Even the new-born babe has a good deal of knowledge "hard-wired" before it starts. And when we tell each other about the world, it isn't practical or even possible to give a full description of something without referring to something else. That's as true for computer programs as it is for people. There is an important difference, though. For people, knowledge must eventually refer back to experience, and people experience the world with their bodies, their brains, their reproductive systems, which computers don't have.
Harold Cohen, computer artist. (Photo by Lou Jones)
Athletes, a hand-colored, computer-generated drawing by Harold Cohen. (Photo by George Johnson)
From the Bathers series of hand-colored, computer-generated drawings by Harold Cohen. (Photo by George Johnson)
With this in mind, we might guess that AARON's knowledge of the world and the way AARON uses its knowledge are not likely to be exactly the same as the way we use what we have. Like ours, its knowledge has been acquired cumulatively. Once it understands the concept of a leaf cluster, for example, it can make use of that knowledge whenever it needs it. But we can see what plants look like, and AARON can't. We don't need to understand the principles that govern plant growth in order to recognize and record the difference between a cactus and a willow tree in a drawing. AARON can only proceed by way of principles that we don't necessarily have.

Plants exist for AARON in terms of their size, the thickness of limbs with respect to height, the rate at which limbs get thinner with respect to spreading, the degree of branching, the angular spread where branching occurs, and so on. Similar principles hold for the formation of leaves and leaf clusters. By manipulating these factors, AARON is able to generate a wide range of plant types and will never draw quite the same plant twice, even when it draws a number of plants recognizably of the same type.

Interestingly enough, the way AARON accesses its knowledge of plant structure is itself quite treelike. It begins the generation of each new example with a general model and then branches from it. "Tree" is expanded into "big-tree/small-tree/shrub/grass/flower," "big tree" is expanded into "oak/willow/avocado/wideleaf" (the names are not intended literally), and so on, until each unique representation might be thought of as a single "leaf," the termination of a single path on a hugely proliferating "tree" of possibilities.

Obviously, AARON has to have similar structural knowledge about the human figure, only more of it. In part, this extra knowledge is demanded by AARON's audience, which knows about bodies from the inside and is more fussy about representations of the body than it is about representations of trees.
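The plant-generating factors named above can be sketched in code. This is a hedged illustration, not AARON's machinery: the parameter names, the two "quasi" plant types, and all the numbers are invented; only the idea - a small set of growth principles, plus randomness, yielding endlessly varied plants of a recognizable type - follows the text.

```python
import random

# Hypothetical parameter sets, loosely following the factors in the text:
# overall size, limb thickness relative to height, the rate at which limbs
# thin out, the degree of branching, and the angular spread at each branch.
PLANT_TYPES = {
    "quasi-oak":   dict(height=10.0, thickness=0.12, thinning=0.65,
                        branching=3, spread=60.0),
    "quasi-daisy": dict(height=2.0,  thickness=0.04, thinning=0.5,
                        branching=5, spread=140.0),
}

def grow(params, rng, depth=0, angle=90.0, length=None):
    """Return a list of limb segments (depth, angle, length, thickness).
    Each call draws one limb, then recurses into its branches; the random
    choices make every plant of a given type unique."""
    if length is None:
        length = params["height"]
    if depth > 3 or length < 0.1:
        return []
    segments = [(depth, round(angle, 1), round(length, 2),
                 round(params["thickness"] * length, 3))]
    n = rng.randint(1, params["branching"])   # degree of branching
    for i in range(n):
        # distribute child limbs across the angular spread, with jitter
        jitter = rng.uniform(-0.5, 0.5) * params["spread"] / n
        child_angle = (angle - params["spread"] / 2
                       + params["spread"] * (i + 0.5) / n + jitter)
        segments += grow(params, rng, depth + 1, child_angle,
                         length * params["thinning"])   # limbs get thinner
    return segments

rng = random.Random(0)
oak = grow(PLANT_TYPES["quasi-oak"], rng)   # a unique quasi-oak
```

Two calls with different random states give two different but recognizably same-type plants, which is the behavior the text describes.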
In part, more knowledge is required to cope with the fact that bodies move around. But it isn't only a question of needing more knowledge; there are three different kinds of knowledge required—different, that is, in needing to be represented in the program in different ways. First, AARON must obviously know what the body consists of, what the different parts are, and how big they are in relation to each other. Then it has to know how the parts of the body are articulated: what the type and range of movement is at each joint. Finally, because a coherently moving body is not merely a collection of independently moving parts, AARON has to know something about how body movements are coordinated: what the body has to do to keep its balance, for example. Conceptually, this isn't as difficult as it may seem, at least for standing positions with one or both feet on the ground. It's just a matter of keeping the center of gravity over the base and, where necessary, using the arms for fine tuning.
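The balance rule just described - keep the center of gravity over the base, using the arms for fine tuning - is simple enough to state in a few lines. The part masses and coordinates below are invented for illustration; this is a sketch of the stated principle, not of AARON's actual coordination knowledge.

```python
# Hypothetical masses for the body's parts, as fractions of body weight.
PART_MASS = {"head": 0.07, "torso": 0.50, "arm_l": 0.05, "arm_r": 0.05,
             "leg_l": 0.165, "leg_r": 0.165}

def centre_of_gravity_x(positions):
    """positions maps each part name to the x-coordinate of that part's
    own centre of mass; the body's centre is the mass-weighted sum."""
    return sum(PART_MASS[p] * x for p, x in positions.items())

def balanced(positions, base):
    """A standing figure balances when the centre of gravity falls
    within the x-extent of the feet on the ground."""
    lo, hi = base
    return lo <= centre_of_gravity_x(positions) <= hi

# A figure leaning to the right tips past its base; swinging the left
# arm out to the other side - the "fine tuning" of the text - restores it.
lean = {"head": 0.3, "torso": 0.3, "arm_l": 0.3, "arm_r": 0.4,
        "leg_l": 0.0, "leg_r": 0.1}
ok_before = balanced(lean, (-0.2, 0.2))      # False: leaning too far
lean["arm_l"] = -0.5                         # arm swung out for balance
ok_after = balanced(lean, (-0.2, 0.2))       # True: centred again
```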
We started by asking what AARON would need to know to carry out its task. What I've outlined here constitutes an important part of that necessary knowledge, but not the whole of it. What else is necessary?

Let's go back to Giorgio. Has it struck you that whatever Giorgio eventually knew about the relative sizes of the kangaroo's parts and its posture, he had been told nothing at all about its appearance? Yet his drawings somehow contrived to look sort of like the animal he thought he was representing, just as AARON's trees and people contrive to look like real trees and real people. That may not seem very puzzling with respect to Giorgio. In fact, it may seem so unpuzzling that you wonder why I raise the issue. Obviously, Giorgio simply knew how to draw.

I suspect that most people who don't draw think of drawing as a simple process of copying what's in front of them. Actually it's a much more complicated process of regenerating what we know about what's in front of us or even about what is not in front of us: Giorgio's kangaroo, for example. There's nothing simple about that regeneration process, though the fact that we can do it without having to think much about it may make it seem so. It is only in trying to teach a computer program the same skills that we begin to see how enormously complex a process is involved.

A hand-colored, computer-generated drawing of figures and trees with rocks in the foreground, by Harold Cohen. (Photo by Linda Winters)
Black and White Drawing, a computer-generated drawing of figures and trees, by Harold Cohen. (Photo by Becky Cohen)
How do humans learn to draw? To some degree, obviously, we learn about drawing by looking at other people's drawings. That's why we are able to identify styles in art, and why most of the drawings coming out of Giorgio's monastery would have had a great deal in common and be distinguishably different from, say, the drawings made in a Zen Buddhist temple in Japan. At the same time, all children make very much the same drawings at any one stage of cognitive development without learning from each other or from adults. They don't need to be told to use closed forms in their drawings to stand for solid objects, for example. That equivalence is universal; all cultures have used closed forms to stand for solid objects. In short, knowledge of drawing has two components. Giorgio learned about style, about what was culturally acceptable and what was not, from his peers. But before cultural considerations ever arise, drawing is closely coupled to seeing—so closely coupled that we might guess all major visual modes of representation in human history to have sprung directly from the nature of the cognitive system. So Giorgio never had to be told how to draw or how to read drawings. He could see. He had to be told about kangaroos, not about how to draw kangaroos. Knowledge of drawing isn't object specific; if Giorgio
could draw a kangaroo, he could also draw an elephant or a castle or an angel of the Annunciation. If one can draw, then anything that can be described in structural terms can be represented in visual terms. That generality suggests that rather than thinking of knowledge of drawing as just one more chunk of knowledge, we should think of it as a sort of filter through which object-specific knowledge passes on its way from the mind to the drawing.

Like Giorgio, AARON had to be told about things of the world. Unlike Giorgio, however, AARON has no hard-wired cognitive system to provide a built-in knowledge of drawing, so it had to be taught how to draw as well, given enough of a cognitive structure (the filter just referred to) to guarantee the required generality. If provided with object-specific knowledge, AARON should be able to make drawings of those objects without being given any additional knowledge of drawing.

AARON's cognitive filter has three stages, of which the first two correspond roughly to the kinds of knowledge described above in relation to the human figure: knowledge of parts, articulation, and coordination. The third stage generates the appearance of the thing being drawn. Neither of the first two stages results in anything being drawn for the viewer, though they are drawn in AARON's imagination, so to speak, for its own use. First AARON constructs an articulated stick figure, the simplest representation that can embody what it knows about posture and movement. Then around the lines of this stick figure it builds a minimal framework of lines embodying in greater detail what it knows about the dimensions of the different parts. This framework doesn't represent the surface of the object. In the case of a figure, the lines actually correspond quite closely to musculature, although that is not their essential function. They are there to function as a sort of core around which the final stage will generate the visible results.
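The three-stage filter can be sketched as a pipeline. Everything here is an invented illustration - the stage functions, the part widths, the toy posture - and the final stage is reduced to closing a polygon, where AARON's real tracing is far more elaborate; only the structure of the pipeline follows the text.

```python
# Stage 1 and 2 are drawn only in the program's "imagination";
# just the last stage produces anything the viewer sees.
PART_WIDTH = {"torso": 0.30, "upper_arm": 0.10, "forearm": 0.08}  # invented

def stick_figure(posture):
    """Stage 1: articulated stick figure - one line segment per part."""
    return [(part, p0, p1) for part, p0, p1 in posture]

def core_framework(sticks):
    """Stage 2: widen each stick into a minimal framework of core points,
    offset to either side by the part's known dimensions."""
    core = []
    for part, (x0, y0), (x1, y1) in sticks:
        w = PART_WIDTH[part] / 2
        dx, dy = x1 - x0, y1 - y0
        n = max((dx * dx + dy * dy) ** 0.5, 1e-9)
        ox, oy = -dy / n * w, dx / n * w          # perpendicular offset
        core.append((part, [(x0 + ox, y0 + oy), (x1 + ox, y1 + oy),
                            (x1 - ox, y1 - oy), (x0 - ox, y0 - oy)]))
    return core

def embody(core):
    """Stage 3: the only visible stage - here simply closing each part's
    core points into an outline polygon."""
    return [(part, pts + pts[:1]) for part, pts in core]

posture = [("torso", (0.0, 0.0), (0.0, 1.0)),
           ("upper_arm", (0.0, 0.9), (0.4, 0.6))]
outlines = embody(core_framework(stick_figure(posture)))
```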
Quite simply, AARON draws around the core figure it has "imagined." Well, no, not quite so simply. If you look at one of its drawings, it should be clear that the final embodying stage must be more complicated than I have said if only because AARON apparently draws hands and leaves with much greater attention than it affords to thighs and tree trunks. AARON's embodying procedures are not like the preliminary edge-finding routines of computer vision, which respond to changes in light intensity without regard to what caused them. AARON is concerned with what it is drawing and continuously modifies the performance of this final stage with respect to how much knowledge has already been represented in the core figure. The greater the level of detail already present, the more AARON relies upon it and the closer to the core the embodying line is drawn. Also, greater detail implies more rapidly changing line directions in the
core, and AARON ensures a sufficiently responsive embodying line by sampling its relation to the core more frequently.

Nothing has been said here about how AARON's knowledge of the world is stored internally, about how its knowledge of drawing is actually implemented, or about its knowledge of composition, occlusion, and perspective. AARON's success as a program stands or falls on the quality of the art it makes, yet nothing much has been said about art and nothing at all about the acculturated knowledge of style, for which its programmer, like Giorgio's monastic peers, must admit or claim responsibility.

All the same, there are interesting conclusions to be drawn from this abbreviated account. It should be evident, for example, that the knowledge that goes into the making of a visual representation, even a simple one, is quite diverse. I doubt that one could build a program capable of manipulating that knowledge and exhibiting the generality and flexibility of the human cognitive system other than by fashioning the program as an equivalent, artificial cognitive system. If nothing much has been said about art, it is because remarkably little of the program has anything to do with art: it constitutes a cognitive model of a reasonably general kind, and I even suspect that it could be adapted to other modes without too much distortion. But the lack of art specificity isn't as puzzling as it may seem at first glance. The principal difference between artists and nonartists is not a cognitive difference. It is simply that artists make art and nonartists don't.
THE FURTHER EXPLOITS OF AARON, PAINTER (Text for Stanford Humanities Review)
Harold Cohen
Center for Research in Computing and the Arts
UC San Diego, La Jolla, CA 92093
October 1994
Snow had a point. While it has not been the case that my own work, straddling the apparently disjoint fields of art and artificial intelligence, has been ignored, I have found few art writers willing to exhibit their ignorance about technology, and as few in the science community able or willing to handle the art part — why we, on our side of the void, do the things we do. Into this rarefied space boldly strode Pamela McCorduck; her book, AARON's CODE, published in 1990, remains the most serious attempt to show that my work on the AARON program has been a single endeavor, not a case of cultural schizophrenia. I will always be truly grateful for that understanding.

Books have a way of suggesting that the curtain came down on the last page, however, and AARON is still very much on-stage four years later. Several authors have referred to more recent work, but — I am a touch chagrined to note — my own last published report was in a paper, "How to Draw Three People in a Botanical Garden," which I wrote in 1985. Clearly, I can always find things I would rather do than write — some of the things I ought then to have written about, for example — and I am grateful in a perverse kind of way when circumstances corner me. In this case I had said I did not think there would be any point in contributing an essay unless it could be adequately illustrated, and when the editors generously offered to print several pages of reproductions in color, I was stuck.

I must take advantage of this opportunity, then, to provide an overview of the past four years, during which time much has happened. It will be light on technical detail, necessarily; readers who would like to know more about the internals of the program might begin by referring to the several earlier papers which I am currently making available on the Internet by anonymous FTP from wendy.ucsd.edu.
My intention is to make more technical reports similarly available, but I can see how this intention might not be taken too seriously. By this time AARON may be the oldest continuously developed program in computing history, and it has become virtually impossible to deal with its entire history in a single paper of reasonable length. For readers not aware of its earlier stages, however, a brief pre-1985 recapitulation is necessary.
AARON began its existence some time in the mid-seventies, in my attempt to answer what seemed then to be — but turned out not to be — a simple question: "What is the minimum condition under which a set of marks functions as an image?" On the simplest level it was not hard to propose a plausible answer: it required the spectator's belief that the marks had resulted from a purposeful human, or human-like, act. What I intended by "human-like" was that a program would need to exhibit cognitive capabilities quite like the ones we use ourselves to make and to understand images.

The earliest versions of AARON could do very little more than to distinguish between figure and ground, closed forms and open forms, and to perform various simple manipulations on those structures. That may not have been enough, however, had AARON not performed, as humans do, in feedback mode. All its decisions about how to proceed with a drawing, from the lowest level of constructing a single line to higher-level issues of composition, were made by considering what it wanted to do in relation to what it had done already. Both of these elements were necessary: while there was an obvious superficial payoff from this feedback-oriented drawing strategy — AARON's drawings had a distinctly freehand, ad-hoc look quite at odds with popular assumptions at that time about machines — I doubt whether freehand scribbling would have persuaded anyone that they were looking at anything more than scribbling. And to judge from the responses of art-museum audiences, AARON's marks quite evidently functioned as images (figure 1).

Simple though the program was, one thing was thus established in its infancy: AARON would always need to know what it was doing, and the key to what it would be able to do would always be constrained by the ways it would represent, internally, what it had already done.
I had assumed, during this initial period, that I would continue to add cognitive primitives to AARON's repertoire, and that the program would continue to develop indefinitely. But by 1980 I was beginning to suspect that the human mind is remarkable more for the way it is able to orchestrate its use of a rather small family of primitives than for the size of the family. At all events, if more were there to be discovered I was not discovering them. And, reluctantly, I was beginning to face the fact that the human cognitive system develops in the real world, not in the vacuum where AARON lived.

1980 marked a turning point. It was triggered by my examination of the scribbling behavior of young children, where I hoped to find some clue to further development. For two reasons I concentrated on the moment at which a scribble migrates outwards and becomes an enclosing form for the rest of the scribble: first, because this appears to be the moment at which the child becomes aware that the marks it makes "stand for" something in its world; second, because the geometry of enclosure — the physical relationship of the enclosing form to what is being enclosed — I found quite baffling.
Figure 1. San Francisco Museum of Modern Art, 1979. Mural in background, turtle making drawing in the foreground. Photo: Becky Cohen.
Figure 2. Detail, One of the Young Ladies Grew Up and Moved to Washington. Mural for the Capital Children's Museum, 1980. The title was a reference to the Demoiselles d'Avignon, with which the artist thought he saw some affinities. Photo: Becky Cohen.
None of my attempts to simulate this early human drawing behavior met with any measure of success, yet I became convinced that the range of forms AARON could generate would be greatly enhanced if its steering strategy could be made to find its way around a pre-existing "core figure": the equivalent of the way the child's initial scribble evidently determines, though not fully determines, the path it traces to enclose it. This conviction proved in due course to be justified, to a much greater degree than I could possibly have predicted. The construction of simple core-figures, plus a simple strategy for tracing a path around them, yielded forms of a complexity I could not have generated, if indeed I could have generated them at all, without substantially greater cost. A veritable free lunch! This two-step strategy became AARON's standard mode for generating closed forms. What I did not anticipate at all was the jump in the "thing-like-ness" of those forms, and with it the increasing illusion that AARON was drawing "from visual experience" of the outside world (figure 2).

Thus began a gradual slide downhill — or climb uphill, depending upon how you view it — into overt representation. By 1985 AARON had a set of trivial rules for the behavior of the outside world, and things moved rather quickly; later that year I succeeded in describing one particular figure — the Statue of Liberty — in enough detail to permit AARON to provide the final image for an exhibition on the history of images of the Statue (figure 9). Happily the program needed no knowledge of legs and feet for this one, but it had that knowledge by the following year, together with enough knowledge of human posture for it to generate a series of "Athlete" drawings.
Then came the provision of a physical ambience for AARON's figures; a description of plant growth general enough to permit the generation of anything from a quasi-daisy to a quasi-oak-tree, and, in summary of what had been done up to that point, the paper referred to earlier, "How to Draw Three People in a Botanical Garden." That phase provided for a very large number of drawings, and for a series of paintings, the execution of which took me through to 1989 (figure 3). AARON's two-part representational strategy remained unchanged during this fouryear period. It viewed the human figure as a complex of connected parts, and its postural rules referred to the way these parts articulated. Each part — arm, head, leg — was represented internally as an array of points with its origin at its articulation to the next part: hand to forearm at the wrist, forearm to upper arm at the elbow, and so on. The complete body could be accumulated in any of a rather limited range of poses from the appropriately transformed arrays of points. The core-figure for each part was generated by connecting these points, and AARON would then proceed to trace an enclosing form around each part. The single exception to this procedure was its handling of the one facial feature it represented; the nose was seen simply as a set of marks drawn within the
Page 3
◄Figure 3. Meeting On Gauguin’s Beach, 1988. Oil on canvas, 90x68 inches. Collection: Gordon and Gwen Bell. Photo: Becky Cohen. ▼ Figure 4. Anatomical study, 1988. Photo: Becky Cohen.
bounding outline of the head, and it was used as a device to establish the head's orientation. I think one would need to look rather closely at AARON's drawings of this period to spot the fact that its internal representation of the human figure involved only a symbolic three-dimensionality. It knew about perspective, to the degree that it could place figures into a spatial setting with things overlapping each other as they should. But the figures themselves were like cutouts placed into this space, representations generated in two-dimensional terms directly. Viewed as an expert system, AARON's domain of expertise was drawing, not human anatomy; it did not need to construct a seated figure in 3-space for the thighs to appear foreshortened. AARON already knew that the thighs of seated figures should be drawn that way. I did not view this two-and-a-half dimensionality as a limitation, simply as a mode. Living as we do in a culture obsessed by appearances, saturated with photographic imagery for a hundred years, it comes as a shock to realize that 95% — I'm guessing — of all the images ever made follow other paradigms, exhibit virtually no interest in the reflection of light off the surfaces of objects, and direct the viewer's attention to abstractions, not to the appearance of physical objects. I have been much more aligned, as an artist, with this more general view of representation, and increasingly repelled by the fundamentally Eurocentric view — my own history! — that underpins computer graphics, the current version of Renaissance perspective rendering. AARON was not, and is not, a "solids modeling" program. Nevertheless, I was experiencing some dissatisfaction with AARON's two-and-a-half dimensional world, though for reasons that might seem a little perverse. I had always colored some number of AARON's drawings; as the illustrations indicate, many of them had been turned into paintings and some half-dozen into murals.
By 1986 it was beginning to seem inappropriate that a program smart enough to generate an endless stream of original drawings was incapable of doing its own coloring. I had no idea how to go about correcting that deficiency, but I was beginning to develop some intuitions about what would have to be done to clear the decks. For example, I thought it would be unwise to take on the problem of color in the context of the elaborately detailed drawings of that time. Surely I would stand a better chance if I had a simpler configuration of forms, something no more complicated than a torso-length portrait. At the same time it seemed to me that while AARON's drawing served well enough in the context of its then-current work, it would be too blunt, the line of the enclosing form inadequately articulated, to stand alone if enlarged to the torso-length portrait I imagined the program doing. I thought the drawings would need the kind of visual complexity that results from the casual overlapping of one thing by another, the unexpected profile that emerges as a limb is rotated or a head lifted. AARON's two-and-a-half dimensional knowledge would not support that kind of complexity; in fact, the program had no way of placing an arm across the body, or breaking the outline of the head with an overlapping hand. As for the lack of articulation of the line itself, that was not a problem of AARON's method for constructing the enclosing form around the core-figure, but rather of the scantness of information contained within the core-figure itself. So, while I was completing the series of paintings based on the 1985-6 version of the program, I was also developing a new version involving a fully three-dimensional knowledge base. In effect, AARON became two interacting programs: one applying a greatly enhanced rule-set governing posture to what was initially a small number of 3-space control points, generating "real-world" figures in a "real-world" environment; the other generating a two-dimensional representation of that "imagined" three-dimensional world. This development marked the beginning of a far more realistic — whatever that means — phase of AARON's output, and I need to stress that it did not constitute a retreat into the Eurocentric world of surfaces. Neither at that time nor at any time since has AARON had any knowledge pertaining to the surfaces of things. Its data represented points within, not on, the figure: articulation points, muscular attachments, and, more generally, points designed to present an appropriate profile, in whatever posture and from whatever "viewpoint", to the outline-generating function (figure 4). These points, many more of them than had been required by the earlier versions, were derived from large medical illustrations of the skeleton, each point being measured, and recorded as an xyz-triple, relative to the origin of the body-part involved.
Not surprisingly, it took more precise knowledge about the articulation of the body to take advantage of the potentially greater flexibility this new data offered, but not as much as I had anticipated. In gross terms posture is controlled by a single variable: the physical relationship of the center of gravity of the torso — the heaviest part — to the placing of the feet; placing the center of gravity in advance of the feet results in a sense of movement, for example. A program does not just "have" knowledge, however; the knowledge has to be represented within the program in computationally appropriate structures. I have always tried to have AARON specify its plans in terms of the highest level of abstraction possible, leaving it to the lower levels of the program to generate instances of those plans. In effect, I wanted AARON only to have to specify "seated" at the topmost level in order to control the placing of the center of gravity with respect to the backside rather than to the feet. That meant designing a structure containing all the variables that controlled posture, supplying ranges of possible values for those variables, and specifying how to relate the selection of appropriate values to the abstraction.
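The kind of structure described above — a top-level abstraction selecting ranges of values for lower-level posture variables — might be sketched as follows. All of the names, variables and numbers here are my own illustration of the idea, not AARON's actual code.

```python
# Illustrative sketch: a posture abstraction ("seated", "standing", ...)
# supplies ranges for posture variables; the lower level picks concrete
# values, so each instance of the plan comes out a little different.
import random

POSTURE_SCHEMAS = {
    # ranges are (low, high) in hypothetical units
    "standing": {"cog_x_offset": (-0.05, 0.05),  # center of gravity near the feet
                 "knee_bend":    (0.0, 0.1)},
    "seated":   {"cog_x_offset": (-0.3, -0.2),   # center of gravity over the backside
                 "knee_bend":    (0.8, 1.0)},
    "walking":  {"cog_x_offset": (0.1, 0.25),    # in advance of the feet: movement
                 "knee_bend":    (0.2, 0.5)},
}

def instantiate_posture(abstraction, rng=random.Random(0)):
    """Pick a concrete value for every posture variable from the ranges
    the abstraction allows, leaving the variation to the lower level."""
    schema = POSTURE_SCHEMAS[abstraction]
    return {var: rng.uniform(lo, hi) for var, (lo, hi) in schema.items()}

pose = instantiate_posture("seated")
assert -0.3 <= pose["cog_x_offset"] <= -0.2
```

The point of the design, as the text describes it, is that the top level only ever says "seated"; everything below that is generated.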
Now AARON could, in principle, pose a figure in such a way that, from any "point of view," any part could be partially obscured by another part. In practice, it wasn't that easy. I should clarify the notion of "point of view." As I have indicated, AARON's figures are posed, constructed and placed in a virtual three-dimensional world. AARON places itself, its "eye," into this three-dimensional world also; in fact, its "eye" becomes the origin for a system essentially like the eye/picture-plane/object arrangement of Renaissance perspective. The xyz-triples representing the points of the figure are projected onto the picture-plane, and from that point on, AARON draws its enclosing forms from front to back and without erasing anything already drawn; the closest parts are drawn first, the furthest last. In other words, the data representing the figure are entirely three-dimensional, the construction of the representation of the figure is entirely two-dimensional; a duality that corresponds exactly to what happens when a human artist makes representations of the outside world. So what was the problem? If AARON had been a surface-oriented program there would not have been one. There are off-the-shelf algorithms for hidden plane removal. But, as I have explained, AARON's primary representation of the "real" body exists as a cloud of points in space. Projected onto the picture-plane, these points had to be joined up, according to view-dependent rules, to make the core-figure for each part. Those parts, two forearms for example, might then overlap each other; but aside from the trivial case where both end-points of one part are known to be further away, in 3-space, than both end-points of the other part, there is no way of determining from the placement of the points on the picture plane which part occludes the other.
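The eye/picture-plane arrangement and the front-to-back drawing order described above can be sketched in a few lines. This is a minimal reconstruction of the standard perspective setup the text invokes, not AARON's code; the function names are mine.

```python
# Perspective projection: eye at the origin looking down +z, picture plane
# at distance d. Parts are then sorted nearest-first, so that closer parts
# are drawn first and can occlude what lies behind them.

def project(point, d=1.0):
    """Project an (x, y, z) triple onto the picture plane."""
    x, y, z = point
    return (d * x / z, d * y / z)

def drawing_order(parts):
    """parts: {name: [(x, y, z), ...]}. Nearest parts come first."""
    return sorted(parts, key=lambda name: min(p[2] for p in parts[name]))

parts = {"hand": [(0.1, 0.0, 2.0)], "torso": [(0.0, 0.0, 4.0)]}
assert drawing_order(parts) == ["hand", "torso"]
assert project((1.0, 2.0, 2.0)) == (0.5, 1.0)
```

Note that sorting whole parts by depth only settles the easy cases; as the text goes on to explain, overlapping parts generally cannot be ordered from the projected points alone.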
Not all the parts of the body have end-points in this simple sense, and in any case it would be the not-yet-drawn enclosing form, not the axis, that would be doing the occluding. In the absence of any straightforward mathematical solution to this problem, AARON was supplied with an extensive set of inference rules by means of which it could determine which part was in front of which on the basis of its knowledge of the figure. The problem was simplified by the realization that the hands move around much more than any other part of the body, and that a great deal could therefore be determined by examining the articulation of the arm. For example: if the left wrist is closer (in 3-space) than the left elbow, and the left elbow is closer (in 3-space) than the left shoulder, and the left wrist is to the right (in 2-space) of the left shoulder, and the left wrist is not higher (in 2-space) than the right shoulder, then the left arm will obviously obscure the torso, and will have to be drawn before the torso is drawn.
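That rule transcribes directly into a predicate. Points here are hypothetical records carrying a 2-space position on the picture plane plus 3-space depth; this is my reading of the quoted rule, not AARON's source.

```python
# The quoted inference rule as a predicate. Each point is a dict with
# "x", "y" (2-space, y increasing upward) and "z" (3-space depth from the eye).

def left_arm_obscures_torso(l_wrist, l_elbow, l_shoulder, r_shoulder):
    closer = lambda a, b: a["z"] < b["z"]           # nearer the eye in 3-space
    return (closer(l_wrist, l_elbow)
            and closer(l_elbow, l_shoulder)
            and l_wrist["x"] > l_shoulder["x"]       # to the right in 2-space
            and l_wrist["y"] <= r_shoulder["y"])     # not higher in 2-space

pt = lambda x, y, z: {"x": x, "y": y, "z": z}
assert left_arm_obscures_torso(pt(5, 3, 1.0), pt(3, 4, 1.5),
                               pt(2, 6, 2.0), pt(8, 6, 2.0))
```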
This is, of course, a deliberately over-simplified and under-specified example which assumes that the figure is more or less facing the viewer. Other rules would apply if the figure was facing right or left. Also, the "left arm" is actually composed of three articulated parts — upper arm, forearm and hand — for which the order of drawing is similarly dependent upon the relative positions of the articulation points. Things become additionally complicated when, for example, the right hand overlaps the left upper-arm. Occlusion — the obscuring of elements in the visual field, the T-junctions that form when the edge of one object breaks off at the edge of another object — provides one of the strongest clues in depth perception, and it is unlikely that any coherent system of representation could exist that did not pay attention to it. AARON's method for implementing occlusion — as opposed to knowing simply what occludes what — has not changed fundamentally since its earliest days. Part of AARON's internal representation of its "imagined" 3-space world is a two-dimensional matrix of a size equal to the resolution of the workstation screen. As the lines comprising a core-figure are drawn, the cells through which the lines pass are marked. The fundamental algorithm for enclosure operates in relation to these marked cells, and the enclosing line it generates is similarly mapped onto the matrix. Once this now-continuous boundary of marked cells is complete, each cell inside it that has not already been claimed by a previously drawn form is marked to identify it as belonging to the current form. Only those areas of the matrix that remain unmarked are available for drawing through and filling. Since at this stage AARON was drawing only, it was enough to use small integers as cell-markers serving to differentiate between one part and another, and between filled parts and ground.
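The cell-matrix bookkeeping described above can be illustrated with a toy version: mark the cells of a closed boundary, then claim every unclaimed cell inside it for the current form. This is an illustrative reconstruction under my own assumptions (a flood fill from the matrix border identifies the ground; whatever is neither ground nor boundary is interior), not AARON's actual algorithm.

```python
# Toy cell-matrix claiming: grid cells hold 0 (unclaimed) or a small-integer
# marker identifying the form that owns them.
from collections import deque

def claim_form(grid, outline_cells, marker):
    h, w = len(grid), len(grid[0])
    for r, c in outline_cells:            # mark the (closed) boundary
        grid[r][c] = marker
    ground, q = set(), deque([(0, 0)])    # flood the ground from a border cell
    while q:
        r, c = q.popleft()
        if 0 <= r < h and 0 <= c < w and (r, c) not in ground and grid[r][c] == 0:
            ground.add((r, c))
            q.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    for r in range(h):                    # everything else unclaimed is interior
        for c in range(w):
            if grid[r][c] == 0 and (r, c) not in ground:
                grid[r][c] = marker
    return grid

g = [[0] * 5 for _ in range(5)]
square = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
claim_form(g, square, 7)
assert g[2][2] == 7 and g[0][0] == 0   # interior claimed, ground left open
```

Cells already carrying another form's marker would simply fail the `== 0` test and stay claimed, matching the "not already been claimed by a previously drawn form" rule.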
As we will see, coloring introduces new requirements, in relation to which the simple differentiation of parts by number is quite inadequate. From the first use of three-dimensional knowledge in 1988 until some time in 1990, the size and complexity of the knowledge-base increased relatively little. AARON was now doing the simpler drawings I had wanted, and I thought they exhibited the characteristics I had hoped for. I turned my attention to the building of a prototype painting machine, a small robot arm carried on a large flatbed xy-plotter. It was in operation by the end of 1990. But building a machine did not solve the still-intractable problems of color, and AARON never got beyond making a few black-and-white brush drawings with its new toy. It is always easy to think of one more thing that needs to be done before facing up to a problem one does not know how to solve, and there were surely enough things to do in relation to drawing to keep me busy. In this case I was beginning to feel that AARON's usual two-step procedure for generating closed forms was in need of attention, in order
to bring it more into line with our own perception of things. Since it viewed each of the body's parts as articulated but separate, it drew each of them with its own outline. Our own perception is less simple. We view the shoulder as an area of transition between upper arm and torso, for example, rather than as a boundary between the two. We know that faces have noses, but we do not think of the nose as being enclosed within its own outline. During 1990 I had finessed that problem by allowing the unbroken enclosing forms to serve at the junctions of parts as if they delineated clothing, while I was obliged to leave out facial features entirely. That was obviously a cheap fix; the problem was solved in 1991 through a modification to the underlying strategy. AARON continued internally to represent body parts as closed forms, but in drawing them it would leave out appropriate parts. The modification, simple in principle, proved to be extensive in its implementation, for the reason that the part to be left out would change as the angle of view changed. Unless the deleted part of an outline could be guaranteed not to become the bounding outline for the entire figure it would be all too easy to find an incomprehensible gap in the drawing; a hole where the shoulder should be, a head with a void instead of a nose. Since the information that generated the outlines was contained in the core figures, and the core figures were responsive to angle of view, guarding against missing shoulders and non-existent noses involved generating a slightly different core-figure for each part, for each segment of possible angles of view; a great deal of work and a large potential increase in code. I could not face the tedium involved in doing this for every possible angle of view, and the task never got beyond a fairly narrow range of more-or-less frontal positions (figure 5).
I continued through '91 and into '92 to work on other aspects of the program in what I thought of as jockeying for the right position from which to move, finally, into color. I was beginning to sense that we use colors in different ways for different purposes, and while I had succeeded in providing a nice, simple format, I was no longer sure that I wanted to deal with color in this exclusively representational context. With this in mind I spent some weeks developing a functional description of decoration, general enough to allow a relatively small body of code to generate a rather wide range of decorative motifs. This decoration, applied to the wall behind the figure — sometimes within a frame, sometimes covering the entire available space — re-introduced a level of complexity in the series of paintings that followed (figure 6). By this time there was very little, beyond my own inability to see how to proceed, to prevent me from coming face to face with the problem that had eluded me for three years. Circumstances came once more to my rescue. My dear friend Jerome Rothenberg was about to celebrate his sixtieth birthday. Another friend, Pierre Joris, decided to publish
▲ Figure 5. Two Friends with Potted Plant, 1991. Oil on canvas, 60x84 inches. Photo: Becky Cohen. ◄ Figure 6. Aaron, with Decorative Panel, 1992. Oil on canvas, 72x54 inches. Photo: Becky Cohen.
a small volume to commemorate the occasion, and I was told to produce the cover. Having been so recently involved in decoration, I thought I could provide an attractive decorative cover without difficulty. Then I started to wonder whether AARON could not be persuaded to produce a recognizable likeness of the poet. As a result, for almost two months I found myself developing AARON's knowledge of the structure of heads and faces. The number of data points involved grew from a few dozen to several hundred, though this increase in numbers alone gives little idea of the structural development of the program. The points were organized into parts — upper mouth, lower mouth, beard, forehead, eyelid, lower eye and so on — in such a way that the individual parts could be scaled and moved, three-dimensionally, at will. I did not much enjoy the fine-tuning by hand that was required to produce the likeness; it was a bit like playing with a police identikit, except that in this case the manipulated data was three-dimensional; once I had a likeness, the likeness would hold as the head turned and the facial expression changed. Along the way, however, AARON was generating make-believe people, many of them looking distinctly like people I knew; that I found interesting (figure 8). By the time the exercise was done, AARON's much-extended data base represented a prototype figure only, and the program had enough knowledge to generate from it a varied population of highly individualized physical and facial types, with a range of haircuts to match. It was now the second half of 1992, and I began in earnest to develop a functional description of color and rules for its deployment. ********************
Let me try to clarify what my intentions were with respect to color, and to explain why I had so much difficulty in finding a starting place. In the first place, I do not much like the quality of electronic imagery, and I have never had a taste for the ephemerality of images produced in the medium. I like art to stay around long enough to unfold. Thus, while I am by no means a technology freak — my friends are much amused by the fact that I do not use money-dispensing machines and never learned to program my VCR — I could see no alternative but to build a machine that would allow the production of large, colored images in the real world. And while I have built a number of drawing machines and had already made a prototype painting machine, I had no idea how long it would take to build the fully-functional version; or, for that matter, whether I could build it at all with whatever resources I could muster. In making the pragmatic decision to proceed with the problem on a computer workstation I knew I was facing a number of obstacles related to the control of the device itself. That is too long a story to tell here, and it is much less straightforward than one might imagine. More importantly, I was aware that there is a fundamental, perhaps irreconcilable, difference between the three-color additive mixing that these devices use and the subtractive mixing of the fifteen or twenty dyes I have long used myself and proposed to have AARON use. I still do not know for sure how coherently knowledge gained in the one domain will map onto the other. Secondly, I needed to provide knowledge about color control, and I was struck by the fact that most of what we think of as color theory covers a range of topics — color measurement, color perception, ways of describing color space — but little of it constitutes a theory of color use. What there is — Albers, for example — is bound so tightly to a single image that it does not extrapolate. An apple green used to color a square band in an Albers painting will not generate the same response if it is used to color a face. Or, to extend that example: imagine how differently the color would be read on a face in a German Expressionist painting and on a face in an Italian Renaissance painting. I was not looking for anything off the shelf, but I was perplexed by the fact that there did not seem to be anything of the sort ON the shelf. We do not even have an adequate vocabulary: names for individual colors, a couple of descriptors for comparing one color with another — one is lighter than the other, we say, this one is more vivid than that one — but no way of discussing in detail anything as complex as a color scheme. Isn't that surprising, considering how much we appear to value our experience of color, and how good some artists are at manipulating it? I concluded that color is one of those things that we do not exactly "think" about; I mean that we have ways of manipulating it in the head, but the manipulation does not follow the more regular traffic of externalization into verbal constructs.

◄ Figure 7. Standing Figure with Decorated Background, 1993. Oil on canvas, 78x54 inches. Photo: Becky Cohen. ◄ Figure 8. Theo, 1992. Oil on canvas, 34x24 inches. Photo: Becky Cohen.
I am comfortable with the conclusion that not everything that goes on in the head is thinking, but how does one write a computer program to manipulate material one cannot even describe in English? Well, of course, there are some things one can describe. The key to progress came, finally, with the realization that I knew what some of them were. For many years I have been insisting to my painting students that brightness is a far more important component of color than hue is; that it is more important to control how light or dark things are in a painting than where they fall on the spectrum. That is not as counter-intuitive as it might sound, given that the eye functions primarily as a brightness detector. Why it took me so long to bring into focus something I have known and used for half of my life I cannot say. Once it was in focus, I was able to proceed quite rapidly. I still did not know enough to provide a complex rule-set for color control, but it turned out that I did not have to. Before we pay much attention to how an artist performs we look to see what he, she or it believes to be worthy of performing, what issues are thought to be worth consideration. Evidently it was enough for AARON's almost exclusive preoccupation with brightness to give its coloring a kind of simple authority, and by the end of 1992 I could see AARON functioning, on the screen, as a modestly able colorist.
The rules became more complex, obviously, and AARON more able and varied in its performance, as I continued to work on the program. I devised a notion of color chords — ways of choosing colors in various spatial relationships from the entire color space — and AARON was able to construct these color chords, rather than control the brightness of more-or-less randomly selected hues, as a way of controlling the overall color structure of the image. Even so, the importance of brightness remained central; the structure of a chord, as I had designed it, demanded that all its components, however selected, retain some required level of vivacity as they become lighter or darker. By the summer I found I was able to use AARON's coloring to determine my own coloring decisions in making paintings, by hand, from its images. Rather than have my assistant enlarge a slide made from a laser-print, she worked now directly from a slide shot off the workstation screen. It was like working from a color sketch, something I had never done before. Of course, there is a big difference between a 35mm slide and a seven-foot painting, and while in some cases I departed little from AARON's original (figure 7), in others the color changed a good deal, which is not to say that it necessarily got better, as the painting proceeded. In any case the old head-scratching system of color choice was over; just ask AARON. So far, so good. By this time my painting machine was in construction, and I had to confront what proved, unexpectedly, to be the most difficult single problem I have faced in twenty years of programming. As I have described, AARON's images are represented by a two-dimensional matrix of cells at the resolution of the workstation display — currently, 1280 by 1024. Each part is mapped onto this matrix, but, of course, rarely ends up there as a single bounded shape. That is of no consequence when the coloring is done, raster fashion — row by row, top to bottom — on the screen.
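The brightness-first chord idea described a few sentences back might be sketched as follows: hues may be chosen freely, but every member of a chord is forced to one prescribed brightness, with a floor on saturation standing in for "vivacity." The chord structure and all the numbers are my illustration of the principle, not AARON's rules.

```python
# Hypothetical sketch: a "chord" of colors sharing one brightness level,
# each hue kept above a minimum saturation so it stays vivid.
import colorsys

def chord(hues, brightness, min_saturation=0.5):
    """Return RGB colors sharing one brightness (HSV value) component."""
    return [colorsys.hsv_to_rgb(h, max(min_saturation, 0.8), brightness)
            for h in hues]

colors = chord([0.0, 0.33, 0.66], brightness=0.7)
# every chord member comes back with the same brightness
assert all(abs(colorsys.rgb_to_hsv(*c)[2] - 0.7) < 1e-9 for c in colors)
```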
Each cell is scanned, its identifier points to the part it belongs to, the data attaching to that part says what color is to be generated, and the cell is "painted" by the correct combination of the three gun values. But I did not go to the trouble of building a painting machine merely to reproduce this boringly mechanical way of filling in shapes. The machine is designed to use brushes, and on the assumption that it would be able to use those brushes in a more "natural" way. That means that AARON has to be able to isolate and deal separately with an arbitrary number of patches, all of which belong to the same part and consequently require the same color. It also means that AARON has to be able to cope with the filling of arbitrarily complex shapes once it has located them, under the overall constraint that it should attempt to keep a wet edge moving forward; as far as possible, it should not leave an edge to dry in one part of the shape while it is off working in another part. Let me try to give some sense of the nature of the problem. Look at the nearest potted plant; make a drawing in which you outline every patch you see, whether its boundary
is determined by the edge of the leaf to which it belongs, that of other occluding leaves, or — most likely — both; and whether it is a part of a leaf or part of the background. Now assign a number to each patch, being sure that the same number is assigned to every patch belonging to any single leaf. Having got thus far, try to categorize the many strange shapes you will have drawn: entirely convex, partially concave, long and thin, short and fat, lobed, torus... Finally, try to devise a strategy for filling each of these many shapes that will function for every category of shape. It has to be a strategy that does not draw attention to the shape as an isolated event, but allows it to be seen as part of the object to which it belongs. I cannot describe my solution to these problems without going into more technical detail than is appropriate here. I tried, and abandoned, a number of methods which looked good as far as they went but then failed to make it to the end. The whole thing took the greater part of a year, not the two or three weeks I had thought it would need, but now I am able to watch, on the screen, while AARON generates what appears to be a good simulation of what will happen in the real world (figure 10). Interestingly, the final, successful solution rides upon extensions to methods that have been fundamental to AARON since its inception. A great deal remains to be done before AARON opens at the Computer Museum in Boston next April. But, as I write, the painting machine is in the final stages of "training": learning where everything is, picking up and putting down brushes, dispensing and mixing dyes, parsing the files which AARON will send it. Another couple of weeks — he says, optimistically — and this article might have included the first published illustration of the first fully machine-generated and machine-executed work of art in human history. Stay tuned.
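The patch-isolation step in the potted-plant exercise above — finding every disconnected patch of cells that carries the same part number — corresponds to standard connected-component labeling. The sketch below is my illustration of that one step, not the author's filling strategy, which the text deliberately leaves undescribed.

```python
# Illustrative connected-component labeling: cells carry a part identifier,
# but one part may land on the picture plane as several disconnected patches,
# each of which must be located (and then filled) separately.
from collections import deque

def patches_of_part(grid, part_id):
    """Return each 4-connected patch of cells labeled part_id."""
    h, w = len(grid), len(grid[0])
    seen, result = set(), []
    for r in range(h):
        for c in range(w):
            if grid[r][c] == part_id and (r, c) not in seen:
                patch, q = [], deque([(r, c)])
                seen.add((r, c))
                while q:
                    pr, pc = q.popleft()
                    patch.append((pr, pc))
                    for nr, nc in [(pr+1, pc), (pr-1, pc), (pr, pc+1), (pr, pc-1)]:
                        if (0 <= nr < h and 0 <= nc < w and
                                grid[nr][nc] == part_id and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            q.append((nr, nc))
                result.append(patch)
    return result

# a leaf (id 2) split into two patches by an occluding leaf (id 1)
grid = [[2, 1, 2],
        [2, 1, 2]]
assert len(patches_of_part(grid, 2)) == 2
```

Categorizing the resulting shapes, and filling each while keeping a wet edge moving forward, is the genuinely hard part the text describes; none of that is attempted here.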
********************
On those rare occasions when I make the time to write, I tend to focus, as I have in this piece, upon what the program does: the series of events that can be shown to have occurred, and what needed to be done to enable them to occur. In part, that is because I recognize that I am uniquely placed in being able to give such an account, and because such accounts are a good deal more rare than I think they should be. It is also because, without such an account being given, one is reduced to talking in abstractions. Abstractions are properly challengeable for their appropriateness to the events, but if the events are not available the discourse becomes meaningless. The reader will note
▲ Figure 9. Liberty and Friends, 1985. Ink and dyes on paper, 22x30 inches. Photo: Becky Cohen.
▼ Figure 10. Two Women with Decorated Background, October 1994. Image taken from screen. The variations in the colored areas indicate the path taken by the simulated brush in the filling algorithm.
that I did not say that AARON was exhibiting intelligence with respect to the potted-plant example; I said that it "has to be able to isolate and deal separately with an arbitrary number of patches... to cope with the filling of arbitrarily complex shapes..." Does that capability constitute intelligence? It does not constitute HUMAN intelligence. It is easy, in short, to assert that machines think, and equally easy to assert that they do not. If you do not know exactly what the machine did, both assertions are equally fruitless in carrying our knowledge — including our self-knowledge — forward. I am quite sure that if it had been proposed to Dreyfus twenty years ago that computers might do what AARON is doing today he would have produced something resembling an argument to deny the possibility. We know how it would have gone: art is an activity requiring self-awareness; computer programs cannot be aware of themselves; therefore computer programs cannot make art. That is a definition, not an argument. If Dreyfus, Searle, Penrose, whoever, believe that art is something only human beings can make, then for them, obviously, what AARON makes cannot be art. That is nice and tidy, but it sidesteps a question that cannot be answered with a simple binary: it is art or it is not. AARON exists; it generates objects that hold their own more than adequately, in human terms, in any gathering of similar, but human-produced, objects, and it does so with a stylistic consistency that reveals an identity as clearly as any human artist's does. It does these things, moreover, without my own intervention. I do not believe that AARON constitutes an existence proof of the power of machines to think, or to be creative, or to be self-aware; or to display any of those attributes coined specifically to explain something about ourselves.
It constitutes an existence proof of the power of machines to do some of the things we had assumed required thought, and which we still suppose would require thought — and creativity, and self-awareness — of a human being. If what AARON is making is not art, what is it exactly, and in what ways, other than its origin, does it differ from the "real thing?" If it is not thinking, what exactly is it doing?
Colouring Without Seeing: a Problem in Machine Creativity
Professor Harold Cohen
Department of Visual Arts, University of California at San Diego

I have a very creative family; a fourteen-year-old who draws with unusual skill and has been playing the piano since she was six, a twelve-year-old who is equally at home in drawing, painting, sculpture and mathematics, and Zana, my three-year-old daughter, who speaks two languages and has been drawing like a five-year-old since she was two-and-a-half. The mother of these three is a distinguished Japanese poet and writer. As for the only male in this feminist stronghold, I've been making my life and my living in art for fifty years, for about thirty of which I've been writing computer programs of sufficient ... something ... to explain why I'm writing this paper. Which brings me to the last member of the household. When I don't keep it busy with email, designing machines, writing conference papers and doing my tax returns, AARON is sitting quietly on my desk, generating original images at the rate of about one every two minutes (fig 1). And, most particularly when I'm watching it, I am aware of a couple of questions that need to be addressed: if I say that I have a creative family, and then I were to say that I have a creative computer program, would I mean the same thing by the word "creative"? And how far could I justify the claim that my computer program - or any other computer program - is, in fact, creative? I'd try to address those questions if I knew what the word "creative" meant: or if I thought I knew what anyone else meant by it. Back in the days of flower-power, the streets of Berkeley, California were lined with romantically attired flower-children selling tie-dye scarves and badly made silver jewelry, resulting, we were led to suppose, from the expression of their creativity.
There's a lady in the United States who has built a media empire on showing people how to be creative by putting wreaths of rosebuds on their front doors and painting their dining rooms pink and yellow. Every week my mail offers me software which is guaranteed, for twenty-five pounds, or fifty, or a hundred, to increase my personal creativity. What can it possibly mean? Where does intelligence end and this, presumably higher, faculty begin? For reasons that I hope to make clear, I have never, in fact, claimed that AARON is a creative program, and while we think all our kids are pretty smart, and the three-year-old will probably get her PhD by the time she's six, we'd never use a word like "creative" to describe what all little children do if they're given enough room to develop. "Creative" is a word I do my very best never to use if it can be avoided. Having said all of which, I have to confess that when I think about the way Bonnard used colour, or the invention of anything as bizarre as superstring theory, or Faulkner's four ground-breaking novels written in the space of two years, or the astonishing prolificity of a Bach or a Mozart, I do find it hard to avoid the impression that there is indeed some "behaviour X" that appears to be distinct from intelligence, whether we choose to call it creativity or not, and whether I can find the line that separates it from intelligence or not. So, if I can't talk about creativity, and before I can question what it might mean to say that a computer program exhibits, or might eventually be able to exhibit, creativity, I can at least try to identify the essential elements of behaviour X in human beings.
Well: any behaviour in mature human beings is pretty complex, and I don't have the knowledge or the professional skills that would be required to sort out the mess of genetic factors, environmental factors, personal experiences, ambitions and heaven knows what else that cause people to do the ordinary things they do, much less the extraordinary things. Fortunately I have access to a rather new human being (fig 2), in whom the determinants to behaviour are relatively uncluttered by unresolved experience. So I'd like to begin by looking to see whether part of her behaviour doesn't have many of the characteristics, albeit manifested on a very small scale, that we would hope to find in fully developed adult behaviour X.

I'm going to look specifically at her drawing behaviour, which isn't meant to imply that drawing is by definition an example of behaviour X, but simply that behaviour X has to be manifested in activity, and drawing is one of those activities in which we can see not only what has been done but - if the drawer is a child and if we watch and listen carefully - track what happened while it was being done.
Early drawing usually proceeds out of the purely motor activity we call scribbling. At some point a round-and-round scribble will migrate outwards and become an enclosing form, and at some time rather later, the scribble will be omitted, leaving closed forms that are then available for the building of a variety of representational elements. Figure 3, made by the three-year-old child of a friend, illustrates these stages quite clearly and is characteristic of what we would expect of a child of that age. The ability to distinguish between closed forms and open forms is one of the earliest cognitive skills to develop in the young child, so its appearance as a factor in drawing presumably waits upon the development of adequate motor skills.

By the time Zana was two she was constantly asking me to draw the people in her family. I obliged, somewhat reluctantly, with cartoonlike drawings which I couldn't show you without ruining my reputation. Zana bypassed the scribbling stage entirely, and by the time she was about two-and-a-half she was drawing faces (a, b, c). These were clearly done in imitation of the drawings I made for her, but while I would repeat the same drawing of each person over and over, her own drawings, many of them of me, showed a quite surprising degree of variation in how they were put together, reflecting an increasingly important role for her own perceptions. One day she was busily drawing me when she stopped, stared at me very hard for a minute, then stuffed two fingers into her own nostrils and said "you got two noses!". What a discovery! She then proceeded to draw in the nose for the very first time (d). But what a strange configuration for a nose! Strange, that is, until one starts actually to examine the complicated formal structures around the nostrils, and then it seems that Zana got it very nearly right.
Here's another example of an adopted practice modified by personal experience. Zana developed a fondness for drawing her hand by putting her hand on the paper and tracing around it. She didn't invent the strategy; children at her school learn the simple geometric shapes by tracing around plywood cutouts, and I'm told that at Thanksgiving in the US every child is shown how to make a drawing of a turkey from the outline of a hand. In this case, then, it is a drawing strategy, rather than a form, that has been adapted from a different context. Two things are noteworthy in these drawings, one having to do with how she made them and the other concerning the drawings themselves (e). The first, which illustrates the cognitive importance of closed forms in early development, is that Zana would always close the outline when she took her hand away and before she continued with the drawing. The second is the appearance of fingernails and joint-creases in the fingers. I know she didn't get that from anyone in the family, whose suggestions were always politely ignored, and it almost certainly didn't come from her school, since all of the other children of her age were still at the scribbling stage.
In fact, it is clear from the subject matter of these drawings that they reflect her own personal experience. Zana won't walk upstairs if she sees an ant on one of the steps, but she likes to pick up baby snails, known to Japanese children as den-den-moshi-moshi, and let them crawl on her palm (f). Again, no one to my knowledge ever showed her how to draw a snail, and if they had, I'm sure they wouldn't have drawn them with legs. Some of Zana's snails have legs! This is not a mistake, you understand, though you may regard it as a failure to observe correctly, but clear evidence that when we draw, adults and children alike, we represent our internal models of the outside world, not the outside world "itself" - whatever that might mean. In Zana's experience, fairly recent for a three-year-old, you need legs for walking, and den-den-moshi-moshi unquestionably walked across her hand. I asked her about the legs, and she assured me that snails really do have legs.
Adaptation - that is, the recycling of forms and production technologies into contexts other than those in which the forms and technologies originated - is clearly characteristic of the young child's development, as it is of any manifestation of the adult's behaviour X. Thus, for example, the short straight lines which began as the folds in hands began to make their appearance in other contexts (g, h, i). A long line attached to a closed shape became a lollipop. Short lines radiating from a closed form became either a sun - I'm sure she was shown that one - or a lion. She had seen pictures of lions in one of her books, though, of course, not in a rendering that could have served as a model for her own.
Books are an important part of Zana's life. Mummy reads the Japanese books and Daddy reads the English books. Zana knows all her English letters, and she can recognize the difference between English and Japanese texts. A couple of months before she was three she started to introduce "texts" into her drawings (j), which she would then ask me to read to her. None of that is very surprising if one reflects that pictures in children's books are always accompanied by texts, and parents are always able to read them.
Figure 1
Figure 2 Zana drawing.
Figure 3 Drawing by Noah Macilwaine.
Figure 4 Painting by Zana.
Figure 5
Figure 6
Figure 7 Screen Image 1992.
Figure 8 Painting from fig. 9. Oil on Canvas 1993.
Figure 9 Screen Image 1993.
Figure 10 Machine painting. Dyes on paper, 1995. Collection: Dr. Thomas Gruber.
Figure 11 Machine painting. Dyes on paper, 1995.
Figure 12 “Mother and Daughter” Oil on canvas, 1998.
Figure 13
Figure 14
Figure 15
Figure 16
Figure 17
Figure 18
Now if Zana had been a mature adult artist and not a three-year-old, she would perhaps have noticed that her preoccupation with text-and-image had generated what was for her a quite new kind of composition, quite different from the centralized faces. Of course, she wasn't at all interested in composition, but what of that? Every artist has known the experience of finding something happening in his work that he hadn't intended to happen, but which nevertheless causes a change in direction. As earnest observers, we would say that Zana's new kind of composition emerged, unbidden and unintended, from her preoccupation with text. And Zana, who probably couldn't care less about emergent properties, might be on her way to New York with a show of new paintings.

And that observation leads us directly to what I am sure is a critical element of behaviour X. The individual has to find something in his work that he never consciously put there. Whether these emergent properties result from the juxtaposition of disparate conceptual elements, or whether they result from the technological complexity of the mode of production, the individual must find properties in the work that were neither consciously sought for nor intended, but which may nevertheless lead subsequently to new conceptual structures, new ways of using the technology, or even suggest a modification of existing, consciously-held, goals.

At three, Zana has neither the adult's rich conceptual framework, nor the mature artist's formidable array of technological know-how. As I've tried to show, properties emerge in her drawing in part in an attempt to incorporate her own perceptions of the world, but principally through the adaptation of simple strategies to satisfy new purposes. Also, I should add, from her sensitivity to the physical properties of the materials she's been given.
Like any other child, Zana will behave quite differently if she's given a brush and a pot of paint from the way she behaves with a pencil or a coloured marker (fig 4).

So I will argue that emergence is a necessary component of behaviour X, but not that it is a sufficient component. There is a further requirement, which becomes clear when we consider the difference between a three-year-old and a mature adult. Zana hasn't noticed that she has made a new kind of composition. For behaviour X to manifest itself for the mature artist, the individual must also notice that something has emerged, and be prepared to act upon what that something suggests. Behaviour X is thus not manifested in a single unexpected outcome but rather in a capacity for continuous self-modification.

To summarise, then, we have three elements, all of which have to be present in the performance of some activity:

First, emergence, which implies a technology complex enough to guarantee it. And I want to stress that while for present purposes I am using terms like "technologies" and "materials" in relation to the physical materials and production strategies of the plastic arts, I mean them in the broadest sense to include language and thought itself.

Second, awareness of what has emerged. And I have to put aside what necessarily would be a lengthy discussion on the nature of awareness. If by awareness we mean conscious awareness, then the truth is that mature artists are often not aware of what emerges in their work, and consequently they often don't know why they act. But that is to be expected; only a very tiny part of all the data presented to the senses is ever advanced to consciousness - which isn't to say that it disappears without a trace - and there is ample evidence that most of what goes on in the brain, including decision-making, is not conscious at all. The second element is awareness, then, but not necessarily conscious awareness, of what has emerged.
Third — without which, in fact, we could hardly know that awareness had occurred — a willingness to act upon the implications of what has emerged; and this surely implies a distinctive psychology and motivation in the individual.
And to these three I'll add a fourth, which hasn't been mentioned so far because my three-year-old mini-subject has so little of it: knowledge. Having a great deal of knowledge about one's chosen technologies doesn't guarantee that behaviour X will kick in for the individual - consider how knowledgeable many thoroughly academic artists are - but behaviour X will not kick in without it. My behaviour X is not to be equated with, or confused with, either innocence or ignorance.

How much of this - and I have no doubt there is more - can we expect to find manifested in a computer program? Let me begin with a short and distinctly provisional answer concerning these four elements. On the first: I can see no problem about the increasing complexity of computer programs other than the programmer's ability to keep track of what the programs are doing. On the third: of course a program can be written that will act upon anything we want, provided we can put into code what that is. On the fourth: I can see no reason in principle why a computer program should not be given an arbitrarily large body of knowledge about a particular technology, even a technology as complex as painting. As to the second element, the program's awareness of properties that emerge, unbidden and unanticipated, from its actions... well, that's a problem.

But of course it's not the only problem. It should be clear from the way I've answered the question that the answers I give for the computer program don't mean just what they would mean if I were talking about a human being. It may be true that we can give a program an arbitrarily large body of knowledge about physical properties, of oil paint for example, but it can't be the kind of knowledge I acquire viscerally as I mix the paint with a palette knife. It may be true that the program can be written to act upon anything the programmer wants, but surely that's not the same as the individual human acting upon what he wants himself.
Isn't free will of the essence when we're talking about the appearance of behaviour X in people? Perhaps: if we assume that free will emerges, when it emerges at all, as a rather unusual property of the nervous system; in human beings, that is. What else could it emerge from: in human beings, that is? I don't doubt the existence of free will, but I do reject the assumption of exclusivity, and my provisional answer to the question - how much of my four elements of behaviour X can we expect to find in a computer program - was intended precisely to focus attention upon the fact that a computer program is not a person. AARON is an entity, not a person; and its unmistakable artistic style is a product of its entitality, if I may coin a term, not its personality. In any computer program that purports to parallel human behaviour there is a line, above which the claim that the program is doing what human beings do can be verified in straightforwardly functional terms: either the program produces equivalent results or it doesn't. Below the line we're dealing with implementation, and we can rarely claim that the program does things in the same way that human beings do them. In functional terms AARON does what human artists do: it paints pictures. But if, as I've suggested, behaviour X is manifested in a capacity for continuous self-modification rather than in the resultant objects, then we need to look for it below the line: largely in the non-conscious processes of the human being on the one hand, and, on the other, in the program structures which can be devised for the machine's intrinsic and distinctly nonhuman capabilities. With some notion, then, of what we should be looking for, and something about the terms in which we should be looking, I'd like to turn to particulars. 
AARON began as a drawing program almost thirty years ago, and it has evolved, thanks to my own continuing involvement, from something resembling late Paleolithic cave painting to figurative painting: portraits, as it were, of "imagined" people (fig 5). For most of its history I've used some part of AARON's output to make paintings for which I supplied the colouring. The possibility of having AARON do its own colouring grew for me over the years from a casual curiosity about what would be possible and what wouldn't, to an unavoidable imperative, which took several years from inception to its current partial satisfaction. Since AARON as a whole is rather too complex to see how it shapes up with respect to behaviour X - it is currently about a megabyte and a quarter of LISP code - I'd like instead to examine the implementation of this most recently developed part of the program - colour - in some detail.

How does one write a program to perform as an expert with respect to colour? Well, not the way the human colourist does it, obviously. The human colourist has an extremely refined visual system, and whatever theories he may come up with, he is unavoidably dependent upon visual feedback. The fact that the computer has no such system doesn't mean that colour expertise is impossible for a program; it means simply that one has to build a system on the capabilities the program has. And which capabilities, we should remember, the human colourist doesn't have. No human artist could compose an entire colour scheme in his head, with his eyes closed, and then write down the mixing instructions for someone else to follow; which is, roughly speaking, what the program needs to be able to do. But one has to build a system to do what, exactly? We can't expect to get very far in modeling expertise in any area
unless we have some notion about what the expertise entails. Unfortunately, and for very familiar reasons, very few experts know consciously what their own expertise entails. As a painter, I'm just your average expert in this regard. I've never subscribed to any particular theory of colour, and I've never been one of those whose colouring procedures have been either systematic or consciously manipulated, certainly not to the point where, like Albers or Seurat, I could have formulated a theory of my own. Colouring for me has always involved a lot of sitting and staring; a little thinking about structural considerations that might influence how to make choices, but in the main not consciously deciding what to do next. Sometimes I might find a colour name popping into my head - usually the name of a colour I haven't used in a while - and when that happens it may act as a trigger to action. But that's the exception; for the most part I don't know what causes me eventually to stop staring and get to work. I go to my paint table, squeeze some amount of paint from each of two or three tubes, mix them up, and recognize that the result is the colour I want. Evidently my non-consciousness has been pretty busy while I've been staring at the painting; it has not only provided the colour that I want, but also the program for generating it.

("Hold on," says the skeptic; "how do you know it's what you want? How do you know you're not fooling yourself into believing you want what you get?" That's obvious, isn't it? What I want relates to the needs of the painting. Do you imagine those needs can be satisfied by squeezing random amounts of paint from randomly selected paint-tubes?)

For those of you whose experience of colour is limited to the computer display, colour is an abstraction with components like wavelength, amplitude and phosphorescence.
The painter doesn't paint with abstractions, he paints with paint; and any painter will recognize that my non-conscious deliberations must rest upon an extensive body of knowledge; about colour-as-abstraction, certainly, but also about the stuff I squeeze from the various tubes, which gets its colour from widely differing physical materials, so that no two of them have quite the same physical characteristics. It would take a tube-full of cerulean blue to have the same colouring effect on a mixture as a thimble-full of ultramarine, for example, and nothing will give alizarin crimson the same opacity as any of the iron oxide colours. So knowing just how much of each colour to squeeze to make a mixture must rest upon a great deal of experience of mixing and using these particular physical materials. I'm almost inclined to think there's a difference between knowledge that has been internalized to the point where the expert no longer knows he has it, and knowledge that was not consciously acquired in the first place, so that the expert never knew he had it. If that is the case, then expertise in colour is surely of the second kind. What I do know is that it took me a painfully long time after I started to fuss about having AARON do its own colouring before I was able to see any light at all, before I was able to isolate a single element in my own knowledge. And I know that this was a piece of internalized, rather than intrinsically non-conscious, knowledge, because it was something I'd been passing on to my painting students for most of my adult life. It was that the most important single element in colouring is not colour at all, but brightness. Let me explain that in a little more detail. I referred earlier to colour-as-abstraction, as opposed to colour as a property of physical material. The primary abstractions, whether we're talking about computer displays or paint, are hue, brightness and purity. 
The curve in the graph shown in Figure 6 plots the energy of the component frequencies of a colour sample. Hue refers to where the sample falls on a spectrum - the horizontal axis; brightness is the total energy, and is thus represented by the area under the curve; and purity refers to the bandpass characteristics of the sample - that is to say, how much of the available energy falls within how narrow a spectral band. I'm using a simple physical model here so that I can attach clear meanings to the terms I use, because these three properties go under different names in different colour theories. Words like brilliance, intensity and saturation are often used, either instead of what I've called purity, or for some hard-to-measure combination of brightness and purity. And I'm using the word "brightness" as a more expressive term for what is called "tone" in the UK and "value" in the US; independent of its hue and its purity, how light or dark is the sample?

It isn't hard to see why brightness is more important in controlling a complex colour scheme than hue is. We have colour vision, yet the eye functions predominantly as a brightness discriminator. Consequently, and as anyone who has looked at a photographic negative knows, it's very hard to read an image when the brightness structure is distorted, while we have no difficulty with colour reproductions in which the colour bears little relationship to the original. Once when I was teaching a course on colour at the Slade School, I had the students take turns in repainting parts of one of their paintings, using arbitrary colours, but with the single constraint that the replacement had to be of the same brightness as the colour being replaced. And after a couple of hours' work everyone except the student whose painting we used - and whose soul was a little the worse for wear - agreed that nothing significant had changed very much.
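The three abstractions defined above lend themselves to a simple numerical sketch. The following Python fragment is my own illustration, not anything from AARON's LISP; the five-point spectrum and the band definition are invented for the example:

```python
# Hue, brightness and purity derived from a crude spectral energy curve.

def brightness(energies):
    """Total energy: the area under the spectral curve."""
    return sum(energies)

def hue(wavelengths, energies):
    """Where the sample falls on the spectrum: here, the peak wavelength."""
    peak = max(range(len(energies)), key=lambda i: energies[i])
    return wavelengths[peak]

def purity(energies, band):
    """How much of the available energy falls within a narrow band."""
    total = sum(energies)
    return sum(energies[i] for i in band) / total if total else 0.0

# An illustrative 5-point spectrum from violet to red.
wl = [400, 475, 550, 625, 700]        # nanometres
sample = [0.1, 0.2, 1.0, 0.3, 0.1]    # relative energies

print(hue(wl, sample))                    # 550 - a green
print(round(brightness(sample), 2))       # 1.7
print(round(purity(sample, [2]), 2))      # 0.59 - most energy in one band
```

A photographic negative, in these terms, inverts the brightness values while leaving the hues roughly complementary, which is exactly why it defeats the eye's brightness discriminator.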
Based upon this single scrap of expert knowledge, it didn't take long before AARON was able to generate simple but quite acceptable colour schemes (fig 7). By the end of a few weeks I was actually using AARON's colours as well as its drawings to make paintings (figs 8&9), and the days of staring and head-scratching were over, at least
temporarily. If I wanted to know what colour to paint something, I just looked at a slide of how AARON had done it. That marked the beginning of a rather difficult period for me, however, because my goal had always been to have AARON exercise colour expertise in the real world, with real materials, not simply on a computer screen.

There's a small industry now devoted to resolving the fundamental difference between colour mixing on a screen and mixing with physical material. Everyone who uses Photoshop or one of its cousins wants to be sure that what finally gets printed on paper is just what was done on the screen. My own problem is the reverse: I want to be sure that what I see on the screen represents what I can get on the paper. But in either case the translation from one to the other is not so much extremely difficult as literally impossible; the best one can hope to do is to produce a convincing equivalent. The cathode ray tube display uses only three primary colours - red, green and blue - and they mix additively, which is to say that each new colour component added to the mix increases the energy - and hence the brightness - of the mixture. Physical materials - paint, or dyes - act as filters, and each new added component reduces the energy of the mix. The classic demonstration of the difference between additive and subtractive mixing, which always seems to astonish people who have never seen it before, is that if we were to put a green filter in one projector and a red one in another, we would get yellow on the screen where the two projections overlapped. Almost the same is true with any electronic display - a computer display or a television set - except that the tiny spots of red and green mix in the eye rather than on the screen itself.
Based only on experience of physical materials, one would correctly anticipate that the pigment we call red would filter out everything except the red light, the green pigment would filter out everything except the green light, and we'd be left with a sort of dirty brown. Which is just what happens. The best kept secret of this little party trick is that actually the two methods of mixing produce the same hue; it is just that the one produces an energy-enhanced sample while the other produces an energy-degraded sample. And that points to the major difficulty in translating from one to the other, which is predicting the brightness of the result. While I was doing the initial work on colour on the screen I was also building AARON's first painting machine and selecting a palette of the dyes I proposed to have it use. I made about twelve hundred samples of mixtures of the dyes, borrowed a rather expensive gadget from my university's optical department, and measured the samples for their red, green and blue components and for their reflected brightness under standard lighting. Lots of data; but when the painting machine was finally finished and I tried to translate the mixing rules I had generated for the red, green and blue components on the screen into the red, green and blue components that identified the dyes, it became clear that the two had very little in common. Even worse, and because I had not seen my way clear to making a dozen or more dilution samples for each of my original twelve hundred mixtures, I really had no reliable data on dilution and thus no adequate control over brightness. The paintings generated during an exhibition at Boston's Computer Museum (figs 10 & 11) were consequently quite varied in quality - when they were good they were very good, but when they were bad they were ... well, not horrid exactly, but not what I'd hoped for. 
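The additive-versus-subtractive demonstration above can be sketched in a few lines. This is a hedged illustration of the principle only: the transmittance triples are invented, and real filters are of course wavelength-dependent rather than three-component:

```python
# Additive mixing (projected light) versus subtractive mixing (filters).
# Colours are (r, g, b) energies/transmittances in the range 0..1.

def mix_additive(a, b):
    # Projected light: each added component increases the energy,
    # clipped at the display's maximum.
    return tuple(min(1.0, x + y) for x, y in zip(a, b))

def mix_subtractive(a, b):
    # Pigments and dyes act as filters on white light: each component
    # multiplies, and so reduces, the energy passing through.
    return tuple(x * y for x, y in zip(a, b))

red   = (1.0, 0.2, 0.1)   # assumed transmittances of a red filter
green = (0.2, 1.0, 0.1)   # assumed transmittances of a green filter

print(mix_additive(red, green))     # energy-enhanced: reads as yellow
print(mix_subtractive(red, green))  # energy-degraded: the dirty brown
```

Note that both results carry red and green in roughly equal measure, which is the "best kept secret" of the trick: the same hue, delivered at enhanced energy in one case and degraded energy in the other.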
Eventually it occurred to me that rather than making a large set of dilution samples for each colour, I need only determine what dilution would be required to produce the lightest version that was still identifiably the same colour. If the dilution-brightness function proved to be acceptably linear, which it did, then AARON could calculate the dilution for any required brightness of any colour. I was able also to establish a coherent and reliable mapping between what I saw on the screen and what I would see on the paper: I simply scanned in the dye samples and let the Macintosh software tell me what it thought the component values were. The result of these two strategies is that in making colour specifications, AARON is limited now to the samples it scanned, with appropriate dilution, and I can have some confidence that the colours it uses on the screen are actually mixable from the dyes.

I've identified brightness control as the most important key to colouring, regardless of whether the implementation is done by the human colourist or by the program, and we might pause for a moment to consider how each goes about his - or its - business. As I noted, the human colourist relies heavily upon a sophisticated visual system. The possibility it affords of continuous feedback from the work in progress encourages the development of ad-hoc strategies so effective that they become essentially the norm; it's comparatively rare for the painter concerned with colour to "get it right" first time. For a program without a visual system, on the other hand, visual feedback isn't an option, and consequently the program has to work from a priori plans robust enough to produce satisfactory results without subsequent modification, no matter what chance juxtapositions arise in the course of their implementation. In short, the program can't do what the human colourist does, and the human colourist can't do what the program does; below the line, that is.
But even though they are doing it in entirely different ways, they are, in fact, doing the same thing - that is, controlling the brightness structure of the image.
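The dilution calculation described above reduces to simple linear interpolation between two measured endpoints per dye. A sketch under assumed numbers - the dye data here is invented, where AARON's actual tables came from measured samples:

```python
def dilution_for(target, undiluted, lightest, max_dilution):
    """Dilution factor (0 = neat dye) that should yield a target
    brightness, assuming the dilution-brightness function is linear,
    as it proved to be in practice."""
    if not undiluted <= target <= lightest:
        # Dyes can only be lightened by adding water, never darkened,
        # so targets darker than the neat dye are unreachable.
        raise ValueError("brightness unreachable for this dye")
    t = (target - undiluted) / (lightest - undiluted)
    return t * max_dilution

# Hypothetical dye: brightness 0.30 neat, 0.85 at its maximum usable
# dilution of 8 parts water to 1 part dye.
print(dilution_for(0.575, 0.30, 0.85, 8.0))
```

The one-sided range check matters: it is the programmatic form of the limitation, discussed below, that there is no way to make a dye darker than its undiluted state.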
Brightness control is a key factor in colouring and, as we've seen, it was alone enough to make AARON a competent colourist; not enough, however, to sustain an expert level of performance. Something else was needed, presumably involving the other two components, hue and purity. That something else was not dug out from my internalized expert knowledge; in fact, I doubt it would have occurred to me at all had I not been engaged in this work. It is that while one's overall response to an image - one's emotional response, if you will - is substantially determined by the interrelationships and the juxtapositions of masses of colour, the legibility of the image is determined primarily by what happens at the edges of forms.

It is very clear, of course, that we don't need colour to build a legible image; brightness control alone will do that, as any black-and-white photograph will demonstrate. But if we want to use colour - for its ability to extend the range of the responses it can evoke in the viewer, for example - then the use of colour has to be subject to a complex set of constraints, most notably the need to retain adequate separation between forms. And we have not just brightness alone, but also hue and purity at our disposal in maintaining adequate separation. We can see now why it is so difficult for the painter to get the colours right first time, why he needs feedback from the image in progress to adjust the hue, the brightness and the purity of the colour areas until the proper separation has been achieved. And finally, as often as not, making use of localized darkening around the edges of forms - a kind of simulated Mach-band effect - to get what he was unable to get from the adjusted colour areas alone (fig 12). AARON gets no feedback, is unable to make adjustments, and at the present time is unable to handle localized darkening.
An ideal dead-reckoning strategy would involve a combining function to generate suitable simultaneous values for the three components, but it doesn't look as though such a combining function can exist. We simply don't know how to weight the three components in arriving at an overall score, and it's pretty clear that weighting couldn't be constant across the spectrum. We might expect, for example, that increasing the separation on any of the three axes would result in increased overall separation. But we find, in fact, that increasing the difference in hue between two colours only increases legibility if there is adequate separation in brightness. If the two colours are adjusted to near-equal brightness, the eye has more difficulty sorting out where the edge is as the hue difference increases, not less (fig 13). As a first approximation, then - and I should say that I've not yet moved on to a better one - I decided to have the program provide adequate separation on each of the three axes independently. This is the sort of thing a program should be able to do in one go, if at all, while the human being can only do it by continuous adjustment. Let me explain how the program's implementation proceeded. My twelve hundred samples had been made in the first instance by mixing pairs of dyes in seven equally spaced samples from one pure dye to the other. About a third of them I considered to be too drab to be worth using and they were discarded. The remaining eight hundred or so I distributed by eye around a circle divided arbitrarily into a hundred and eighty positions. These dyes are not manufactured to cover the spectrum evenly, unfortunately; there are big gaps in the greens, for example, and no strong blues, while there are several colours in the red-to-yellow part of the spectrum. The result was that in some cases there were no samples to occupy one of the hundred and eighty positions, while in others there were anything up to six samples. 
These were then sorted, also by eye, according to decreasing purity, and without any regard to their undiluted brightness. All of the data representing mixing instructions on these eight hundred samples - measured brightness, estimated purity and dilution factors - is available to the program, and it can now associate these instructions with physical locations. We might think of this as a crude three-space model, in which brightness would be an independently variable third dimension controlled by dilution. There is an important limitation here, however, which directly affected the design of the implementation, as you will see. It is, simply, that dyes are made lighter by adding water, but there's no way to make them darker than their undiluted original state. Within this limitation, AARON is able to produce any brightness it requires of any of its eight hundred available colours. The point of the exercise is not to deal in terms of individual colours, obviously, but in terms of sets of colours that satisfy the separation criteria as I've described them: to function as a colour chord generator, and to assign the components of its chords to particular elements within a painting. The first pass involved using equal spacing of the chord components around the circle, with only the purest sample taken from each position. By setting the spacing differently, the program could generate a colour scheme all within a narrow part of the spectrum or spread across the entire spectrum, but obviously the relative brightness and the relative purity of the elements of the chord were quite arbitrary; they depended simply upon what particular set of samples happened to be chosen. A variation on this strategy (fig 14) used an additive series for the distances between samples rather than constant spacing, with slightly more interesting results: close grouping of parts of the chord along with greater distances between other parts. But, again, purity and brightness remained uncontrolled.
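The two spacing strategies, constant spacing and an additive series, can be sketched as follows. Only the hundred-and-eighty-position circle comes from the text; the function names and the particular additive increments are hypothetical.

```python
# Sketch of chord-position generation on the 180-position hue circle.
# Equal spacing spreads the components evenly; an additive series
# clusters some components while pushing others apart (cf. fig 14).

POSITIONS = 180  # hue circle resolution given in the text

def equal_spacing_chord(start, n_components, spacing):
    """Chord positions at a constant interval around the circle."""
    return [(start + i * spacing) % POSITIONS for i in range(n_components)]

def additive_series_chord(start, n_components, first_step=2, increment=3):
    """Chord positions separated by an arithmetically growing interval."""
    positions, pos, step = [start % POSITIONS], start, first_step
    for _ in range(n_components - 1):
        pos += step
        positions.append(pos % POSITIONS)
        step += increment
    return positions

print(equal_spacing_chord(0, 4, 45))   # [0, 45, 90, 135]
print(additive_series_chord(0, 4))     # [0, 2, 7, 15]
```

A small spacing keeps the whole chord within one narrow region of the spectrum; a spacing near POSITIONS divided by the number of components spreads it across the entire circle, exactly the range of schemes described above.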
In order to assert the essential control of brightness, the next step was to sort the components of the chord by brightness, then divide the range from lightest to darkest into equal steps and try to adjust each component to the brightness required by its position in the sequence (fig 15). That worked to some extent, but of course a colour couldn't be adjusted if it was already lighter than what was required, and in that case it was simply left unchanged. That led to the final step (fig 16). The program would scan through the lower purity levels at the position in question, and then around its neighbours, until it found a colour dark enough to be diluted to the required brightness. The result was a guaranteed minimum perceptual separation between any two components of the chord, which the program was able to demonstrate with simple diagrammatic renderings of a couple of figures in a room (fig 17). I've left out most of the details of what remains, even with the details included, a fairly simple system, one of the most interesting features of which is the astonishing proliferation of options that presented themselves at every stage of development. For example, I mentioned that in the first step it would only use the purest colours available, but obviously it could also choose to use the least pure, which then yielded entirely different effects; or to choose colours which varied as much as possible in purity, again with different results. Or it could select a starting point for the additive distribution in order to generate chords with strong groupings in particular parts of the spectrum. Or, I said that in the final stage it would look for colours that could be diluted, while actually it has the option of giving preference to brightness or to purity as the controlling factor. And finally, once the chord is applied to a representational framework, it has to deal with semantic issues.
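The final step, scanning the lower purity levels at a position and then its neighbours for a sample dark enough to be diluted to the target brightness, might look roughly like this. The sample table and the search order are invented for illustration; only the one-way constraint, that dilution can lighten a dye but never darken it, comes from the text.

```python
# Sketch of the brightness-assignment step: dilution can only lighten
# a dye, so if every sample at a position is already lighter than the
# target we search the lower purity levels there, then neighbouring
# positions, for a darker sample that can be diluted down to the target.

# samples[position] is a list of (brightness, purity) pairs, sorted by
# decreasing purity, as in the text. The data here is invented.
samples = {
    10: [(0.9, 1.0), (0.6, 0.7), (0.4, 0.5)],
    11: [(0.5, 0.9)],
    9:  [(0.8, 0.8), (0.3, 0.6)],
}

def find_dilutable(position, target_brightness, max_radius=2):
    """Return (position, sample) whose undiluted brightness is at or
    below the target, searching outward from the given position."""
    for radius in range(max_radius + 1):
        candidates = ([position] if radius == 0
                      else [(position - radius) % 180,
                            (position + radius) % 180])
        for pos in candidates:
            for sample in samples.get(pos, []):
                if sample[0] <= target_brightness:
                    return pos, sample
    return None  # no sample dark enough anywhere nearby

print(find_dilutable(10, 0.5))   # (10, (0.4, 0.5))
print(find_dilutable(10, 0.35))  # (9, (0.3, 0.6))
```

Because each returned sample is at or below its target, a single dilution pass can then raise every component to its assigned brightness step, which is what guarantees the minimum separation between any two components of the chord.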
With respect to the composition as a whole, for example, the program has the option of placing dark figures against a light ground or light figures against a dark ground. And since some colour choices are mandated by subject matter - AARON will never choose to paint faces green or purple, for example - it may choose to generate a separate chord to deal exclusively with flesh tones. At one point I estimated that AARON had no fewer than seventy-two distinct paths to follow, and by that time, even though I could follow the history, I could no longer keep track of how an individual image had come about. Which condition is, of course, exactly what one would look for to satisfy the requirement of emergence. And so we come to the current state of the program (fig 18) and the need to assess its success, with respect both to the presence of behaviour X and more generally to the quality of its work. Let me begin by reflecting that AARON is able to do what expert human beings do, and do it to a significant level of expertise, without the visual system upon which human beings rely and without the full range of experiential knowledge which they bring to bear, in this case on colouring. It is noteworthy also that the response its work is capable of evoking in the viewer appears not to be too badly constrained by the program's own lack of an emotional life. At the very least, these facts should raise interesting questions that are clearly not being addressed by those who concentrate only on whether the machine can be said to have intelligence, or consciousness, or awareness, or, in the final fallback position, a soul. However, my goal in this paper was not simply to demonstrate AARON's prowess as an image maker, but specifically to look for evidence of behaviour X within the program. I have, in fact, demonstrated a colour technology complex enough to guarantee emergence.
And, to the degree that this technological complexity rides on a body of expert knowledge, we should conclude that AARON has expert knowledge of colour and of this particular range of colouring materials. I don't doubt that it's far short of what I have myself, and it's unlikely ever to cover quite the same range, simply because it has been acquired quite differently from the way I acquired mine. Nevertheless, I think it gives reason enough to infer the possibility of much greater knowledge in the future. Now we come to the hard part. AARON is not a prescription for picture-making; the complexity of its structure guarantees that the program itself cannot predict exactly what will happen as a painting proceeds - that is, after all, what we mean by emergence - and a high proportion of its decisions have to respond to the state of the painting at the time the decisions are made. That level of response sounds promising, but it doesn't satisfy the second criterion for behaviour X, which requires that AARON is aware of what emerges, unbidden, from the exercise of its own production technology. AARON does respond to some part of what emerges, but that's because there is a range of properties AARON is always watching out for - they define what the human artist would call the problematics of current art-making - the things worth paying attention to. In that sense they define what AARON is. But AARON can't notice something outside that range; it couldn't notice, for example, if the foreground elements of the painting occupied exactly half the overall space, or if the colour of a background turned out to be precisely complementary to the colour of the subject's shirt. And, by the way, humans wouldn't notice those things consciously, either. Or, more correctly, AARON could notice those things, but only if they were included in the range of issues it was written to watch out for.
Certainly, we can anticipate that its performance will become increasingly varied and sophisticated as its range of noteworthy events increases, but the criterion for behaviour X is satisfied only if it can notice something that wasn't included. Catch-22. We can always include more, and what we want always requires something we didn't
include. But of course! I did say at the outset that behaviour X is not manifested in objects, but in the capacity for continuous self-modification. And if, as I suspect, most human beings don't do either better or differently in this regard, we should probably conclude that their performance doesn't satisfy the requirements for behaviour X either. So, finally, AARON fails on the third criterion as well: the ability to act upon the significance of emergent properties. AARON's biggest single limitation is the fact that it has no capacity for self-modification. And why, one may ask, after more than twenty-five years of work, is that still the case? Because I've always considered self-modification to be a pointless programming exercise unless it could be driven by the existence in the program of higher-level criteria and the program's ability to judge to what degree it was satisfying them. You must understand that I'm not talking about rules for good composition; what I mean by higher level is criteria pertaining to painting-as-verb, not to painting-as-object. After fifty years of painting I am still unable to externalize the criteria for painting-as-verb which I must obviously exercise myself, much less see how to build that capacity into a program. So, as the author of a program capable of generating a quarter-million original, museum-quality images every year into perpetuity, I will award myself an A for effort, but no cigar. I don't regard AARON as being creative; and I won't, until I see the program doing things it couldn't have done as a direct result of what I had put into it. That isn't currently possible, and I am unable to offer myself any assurances that it will be possible in the future. On the other hand, I don't think I've said anything to indicate definitively that it isn't possible. Many of the things we see computer programs doing today would have been regarded as impossible a couple of decades ago; AARON is surely one of them.
In the final analysis we follow our dreams, not the all-too-obvious, but ultimately proven-wrong, reasons why our dreams won't work. If you want the machine to be a device for enhancing your own personal creativity, whatever that might mean, it's yours for twenty-five pounds or whatever, if you believe the hype; until the next release, at least. If I want autonomy for the program, if that notion is at the core of everything I do, then perhaps the day will come when I can proclaim AARON to be the most creative program in history. Or someone else can make the claim on behalf of some other AARON. Stay tuned.
Biography of Harold Cohen, Creator of AARON

Harold Cohen studied painting at the Slade School of Fine Art in London, and taught there for several years before joining the Visual Arts Department at the University of California, San Diego in 1968. His work as a painter has been exhibited widely, both in galleries and in major museums. During the 'sixties he represented Great Britain in the Venice Biennale, Documenta 3, the Paris Biennale, the Carnegie International and many other important international shows. He exhibited regularly at the Robert Fraser Gallery in London and the Allan Stone Gallery in New York.
After moving to San Diego, Cohen became interested in computer programming and particularly in the field of artificial intelligence. On the basis of his early research he was invited, in 1971, to spend two years at the Artificial Intelligence Laboratory of Stanford University as a Guest Scholar. Much of his work since that time has been concerned with building a machine-based simulation of the cognitive processes underlying the human act of drawing. The resulting ongoing program, AARON, has by now been seen producing original "freehand" drawings in museums and science centers in the US, Europe and Japan: the Los Angeles County Museum, Documenta 6, the San Francisco Museum of Modern Art, the Stedelijk Museum in Amsterdam, the Brooklyn Museum, the Tate Gallery in London and the IBM Gallery in New York among others. He has also exhibited in a number of science centers, including the Ontario Science Center, Pittsburgh's Buhl Center, the Science Museum in Boston and the California Museum of Science and Technology. He has a permanent exhibit in the Computer Museum in Boston, and he represented the US in the Japan World Fair in Tsukuba in 1985.
Having been away from painting since the early '70s, Cohen marked his return with a 100-foot painting for the San Francisco Museum show in 1979. Since then he has executed a number of murals from AARON's drawings: one for the Capitol Children's Museum in Washington, DC, three for the Digital Equipment Corporation, a mosaic mural for the Computer Science Department at Stanford, and one each for the Buhl Science Center and the Ontario Science Center. His recent work has extended AARON's
capabilities from drawing to painting; the first of his painting machines was used in an exhibition at the Computer Museum in Boston in 1995.
Cohen has delivered invited papers at a number of conferences, including those of the College Art Association, the American Association for the Advancement of Science, the International Joint Conference on Artificial Intelligence, the American Association for Artificial Intelligence, the European Conference on Artificial Intelligence and the Tokyo Nicograph Conference. He has lectured about his work at the National Bureau of Standards, the School of the Art Institute of Chicago, the Computer Museum and at many other schools and universities both in the US and the UK. In 1999, he was keynote speaker at two conferences in the UK on Cognition and Creativity, one at Edinburgh University and the other at Loughborough University.
His published writings include "What is an Image?" (Conference Proceedings, IJCAI 1979), "The First Artificial Intelligence Coloring Book," "Off the Shelf," (The Visual Computer, 1986), "Can Computers Make Art?" (proceedings, NICOGRAPH-85), "How to Draw Three People in a Botanical Garden," (proceedings, AAAI-88), "The Further Exploits of AARON, Painter," (Stanford Humanities Review, 1997) and a piece for children in MUSE Magazine, 1998.
Writings about his work include AARON's CODE: Meta-Art, Artificial Intelligence and the Work of Harold Cohen (Pamela McCorduck, Freeman, 1991), full chapters in The Creative Mind: Myths and Mechanisms (Margaret A. Boden, Weidenfeld & Nicolson, 1990), Digital Mantras (Steven Holtzman, MIT Press, 1994), The Universal Machine (Pamela McCorduck, McGraw Hill, 1985), another chapter in Machinery of the Mind (George Johnson, Times Books, 1986) and articles in Discover Magazine (Special Edition on Creativity, 1998), Daedalus (Winter 1988), Insight Magazine (March 1988), The Chronicle of Higher Education (January 1988), Discover Magazine (October 1987), and The Whole Earth Review (Summer 1987).
In recent years, Cohen's work has attracted increasing media attention. Discovery OnLine broadcast the painting machine in action directly from his studio onto the World Wide Web in 1996, and AARON has
been featured on a number of TV programs including Beyond 2000, Scientific American Frontiers, a program on programming for the BBC's Open University Series and It'll Never Work, a BBC program for children.