Advance Praise for Blondie24
David Fogel's "Blondie24" argues convincingly that the future of artificial intelligence lies not in programs that act like people but rather in programs that can automatically improve themselves over time--without the bias of human knowledge. This book describes an impressive success story of the evolutionary approach to AI in an engaging and flowing style. I wholeheartedly recommend it to the general public and AI experts alike.
Dr. Peter Stone, Artificial Intelligence Principles Research Department, AT&T Labs Research

An absorbing and enchanting tale of a personal quest for the deeper meaning of AI: the discovery of how intelligence itself arises. Fogel seizes the challenge by capturing the evolutionary process and shaping it to breed a checkers expert from an artificial neural net. Scientists, humanists, and artists will appreciate his inspiring wit and clarity of thought in narrating the growth of Blondie24, a synthetic sentience born inside a desktop PC.
Nicholas Gessler, Director, UCLA Center for Computational Social Science

This enthralling book will fascinate those with an interest in artificial intelligence, machine learning, and evolutionary computing. Moreover, it leads us to question the accepted vision for attaining true AI. The book is also to be commended for the thoroughness and ingenuity of the experimental methodology used to evaluate Blondie24. But, perhaps above all, it's a great read.
Dr. Ian Watson, Department of Computer Science, University of Auckland

David has written an important and influential book. Not only is the discussion of Blondie24, the cute checkers-playing heroine of the book, a lively romp through the ins and outs of evolutionary programming but it sets the stage for David's more serious and far-reaching discussion of what is right and wrong in our quest for a companion intelligence.
Earl Cox, Vice President and Chief Scientist, Panacya, Inc.

Blondie24 is a fascinating and informative book that will be absolutely engrossing for anyone with an interest in artificial intelligence and computers. Although masters-level checkers programs have been around for a while, they have all used brute force to achieve their goals. The Blondie24 project represents the first serious attempt since Samuel's experiments in the 1950s to do something much more interesting and elegant: create a checkers program that can learn on its own. This book is easily accessible for the uninitiated, and I guarantee that you'll be swept along.
Gil Dodgen, creator of the computer program World Championship Checkers

My AI students will love this engaging and instructive book, and it will fit perfectly into my HAL-based course on AI programming. This book will do much in establishing the connection between artificial evolution and artificial intelligence. The ice of the "AI winter" is at last beginning to melt and it feels good!
Dr. Julian Miller, School of Computer Science, University of Birmingham
BLONDIE24
David B. Fogel, Ph.D.
Playing at the Edge of AI
MORGAN KAUFMANN PUBLISHERS
An Imprint of Academic Press
A Division of Harcourt, Inc.
San Francisco San Diego New York Boston London Sydney Tokyo
Senior Editor: Denise Penrose
Publishing Services Manager: Scott Norton
Editorial Coordinator: Emilia Thiuri
Cover Design: Janet Wood
Cover Image: René Rutledge
Text Design: Janet Wood
Illustration: Bill Nelson
Composition: Integrated Composition Systems
Copyeditor: Mimi Kusch
Proofreader: Janet Reed
Indexer: Pat Deminna
Printer: Courier Corporation

Permission to reprint the following images is gratefully acknowledged: figure 12, walking stick, courtesy Mark Newman / Image State; figure 13, leafy seahorse, courtesy ANT Photo Library; figures 14 and 15, bees, courtesy Edward S. Ross; figures 36, 37, 61, and 62, screenshots, courtesy Microsoft Corporation; figures 66, 67, 68, 72, screenshots, courtesy Professor Jonathan Schaeffer.

Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding trademarks and registration.

Morgan Kaufmann Publishers
340 Pine Street, Sixth Floor, San Francisco, CA 94104-3205, USA
http://www.mkp.com

Academic Press, a Division of Harcourt, Inc.
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com

Academic Press
Harcourt Place, 32 Jamestown Road, London, NW1 7BY, United Kingdom
http://www.academicpress.com

© 2002 by Academic Press
All rights reserved
Printed in the United States of America
06 05 04 03 02    5 4 3 2 1

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means--electronic, mechanical, photocopying, or otherwise--without the prior written permission of the publisher.

Library of Congress Control Number: 2001094348
ISBN: 1-55860-783-8
This book is printed on acid-free paper.

For Kumar and for Arthur Samuel (1901-1990) and Allen Newell (1927-1992)
Everything must wait until its time; science is the art of the possible. --Allen Newell
Contents
ACKNOWLEDGMENTS
INTRODUCTION

PART 1: SETTING THE STAGE
1 INTELLIGENT MACHINES: IMITATING LIFE
2 DEEP BLUE: A TRIUMPH FOR AI?
3 BUILDING AN ARTIFICIAL BRAIN
4 EVOLUTIONARY COMPUTATION: PUTTING NATURE TO WORK
5 BLUE HAWAII: A NATURAL SELECTION
6 CHECKERS
7 CHINOOK: THE MAN-MACHINE CHECKERS CHAMPION
8 SAMUEL'S LEARNING MACHINE
9 THE SAMUEL-NEWELL CHALLENGE

PART 2: THE MAKING OF BLONDIE
10 EVOLVING IN THE CHECKERS ENVIRONMENT
11 IN THE ZONE
12 A REPEAT PERFORMANCE
13 A NEW DIMENSION
14 LETTING THE GENIE OUT OF THE BOTTLE
15 BLONDIE24

EPILOGUE: THE FUTURE OF ARTIFICIAL INTELLIGENCE
APPENDIX: YOUR HONOR, I OBJECT!
NOTES
INDEX
ABOUT THE AUTHOR
Acknowledgments
I have many people to thank for their efforts in helping me write this book. Peter Bentley, Mike Boughton, Monica Buonincontri, Gary Fogel, Larry Fogel, Doug Johnson, Jim Kennedy, and Ian Watson offered technical and editorial comments about the manuscript. Tom English was especially helpful, reviewing successive revisions. Pete Angeline, Bill Porto, and I shared many interesting discussions about Blondie24 along the way. Hans Moravec kindly provided data on the historical ratings of chess programs, which I've used in chapter 2. Jonathan Schaeffer, the author of the world's best checkers program, Chinook, offered comments and corrections to chapter 7. I thank Jonathan for his encouragement as well. Gil Dodgen, the author of Checkers 3.0 and World Championship Checkers, also offered comments and corrections to the book. I thank Gil for his enthusiasm and generosity.

Denise Penrose of Morgan Kaufmann deserves special thanks, not only for publishing the book but also for enlisting her first-rate staff--including Emilia Thiuri, Scott Norton, Georgina Edwards, and Sheri Dean--and for arranging for me to collaborate with Jeannine Drew on the final version of the manuscript. Jeannine's keen insights about revising and restructuring the text were invaluable. Thanks also go to the IEEE and Elsevier Science Publishers for permission to reprint portions of my technical publications.

Finally, I would like to thank Kumar Chellapilla for all the time he spent with me on this research. Kumar not only helped make the effort a success, he also helped make it fun. Of course, my wife, Jacquelyn, also deserves thanks, not only for her support but also for letting me play checkers on the weekends, month after month after month.

David B. Fogel
La Jolla, California
Introduction
Meet Blondie. She's a twenty-four-year-old graduate student in mathematics at the University of California at San Diego. She was named after the comic strip character, and she's a natural blonde, if you're curious. Blondie's very athletic, she skis and surfs, and she's an ace at math. But her real claim to fame is her ability to play checkers. She's really good. She isn't good enough to defeat grand masters, but she did earn a spot in the top five hundred of an international checkers website. She even won an online checkers tournament once. Considering that Blondie taught herself how to play, without reading any books or relying on anyone for instruction, that's pretty impressive. And considering that Blondie is just a computer program, and her persona is only a product of my imagination, you might say that's really impressive. This book tells the story of how Blondie24 (her official Internet name) came to play checkers much better than her creators do. More important, this book describes a new approach to designing intelligent machines. It's borrowed from an old approach, one that nature has used for billions of years: evolution.
Blondie is the product of a computer program that Kumar Chellapilla and I developed. The program emulates the principles of Darwinian evolution--random variation and selection. Through this two-step evolutionary process, Blondie discovered innovative ways to play the game and rose to the level of a human checkers expert, even performing well against masters on occasion. She's the result of hundreds of generations of evolution, all simulated on a computer. Blondie had to outwit, outplay, and outlast hundreds of programmed cybercompetitors. She's a real survivor.

What really sets Blondie apart from other artificial intelligence programs is that Kumar and I didn't provide her with many hints about how to play the game. Historically, computer chess and checkers programs have relied on human expertise, and lots of it. The computer is programmed with rules based on what's already known about how to play the game, and it is expected to execute those rules faster and more accurately than humans can. The machine uses databases to look up specific moves, either from recorded openings from games played by grand masters or from endgame tables in which the computer enumerates every possibility and pinpoints the perfect play. Blondie, by contrast, didn't have access to such human expertise. We didn't transcribe opening moves from previous grand master games for her. We didn't download a database of endgame situations into her memory. And Blondie didn't learn from us--Kumar and I are relatively weak checkers players. No, Blondie had to learn how to play on her own, using the information contained in the number, location, and types of pieces on the board. Even with this rudimentary knowledge, Blondie evolved into an expert.

Navigating This Book
This book covers some of artificial intelligence's history and also explores its future. To make my story complete, I've provided a little background on artificial neural networks, evolutionary computation, and a mathematical concept called a "landscape." I've also included an overview of how others have designed programs to play intellectual games, such as chess and checkers. As you'll discover, the Darwinian approach that Kumar and I used stands in marked contrast to these other efforts. This book is structured so that it can be read without referring to the notes. However, for readers who want to gain a deeper understanding of the subject matter, I've placed a great deal of reference information and discussions on related points in the notes.
The Road to HAL

Recently, we witnessed the turn of the new millennium. With the passing of the year 2000, it's natural to speculate on the prospects for creating an intelligent machine, such as Arthur C. Clarke's HAL from the movie 2001: A Space Odyssey. Some have predicted that HAL waits just a few decades in the future. Computer speeds are increasing rapidly. Before long, they argue, desktop machines will compute more operations per second than our brains can. Intelligent machines such as HAL are an inevitable consequence of this progression.

It's time for a reality check. If we don't change the focus of our efforts in artificial intelligence, we may never create HAL. To date, artificial intelligence has focused mainly on creating machines that emulate us. We capture what we already know and inscribe that knowledge in a computer program. We program computers to do things--and they do those things, such as play chess, but they only do what they were programmed to do. They are inherently "brittle."

But HAL is a learning machine, a machine that can adapt its behavior to different environments. To make HAL, we'll need computer programs that can teach themselves how to solve problems, perhaps without our help. To date, traditional artificial intelligence methods haven't been very successful in this regard. Our challenge is to create a machine that is itself creative, that learns to behave in circumstances we can't possibly anticipate. Natural evolution has designed carbon-based machines that meet this challenge. My experience with evolutionary computation has convinced me that evolutionary algorithms will meet this challenge eventually as well. In some regards, they already do. Come read my story about Blondie and see for yourself.
Intelligent Machines: Imitating Life

"I'm sorry, Frank, I think you missed it," said the computer HAL to Dr. Frank Poole, its human challenger seated at the chess table. "Queen to bishop three, bishop takes queen, knight takes bishop, mate."

"Uh. Yeah, looks like you're right. I resign," replied the frustrated astronaut, obviously tired from his long trip to the planet Jupiter.

"Thank you for a very enjoyable game," said HAL.

"Yeah, thank you," Poole replied.

When HAL exchanged these words with its human opponent in the 1968 movie 2001: A Space Odyssey, it was more than just the end of a friendly game of chess. It was Arthur C. Clarke's prediction about where humanity would stand relative to machines at the turn of the century. Clarke's intuition about the technological future had often been prescient. He'd even described the possibility of geosynchronous satellites orbiting the earth before the advance of modern rocketry. Stanley Kubrick, who directed the movie based on Clarke's novel, brought Clarke's vision to life on the screen. Everything appeared plausible: artificial gravity on spaceships, commercial travel in space, and computers serving as automatic pilots. Nevertheless, Clarke's vision of 2001 erred in three important details.
The first error was his prediction that we would have a continuous presence on the moon, a lunar base. The second error was Clarke's conviction that we would have spacecraft transporting us to other planets (such as Jupiter). But the third and most significant error of foresight was that computers would be as smart as we are, fully conversant in English or another language, with signs of emotion, intelligence, and consciousness. I claim this last error to be the most significant one, because such an achievement is currently beyond science; given twentieth-century technology, it's purely science fiction. The other two erroneous predictions are really just a matter of time. It's inevitable that we'll return to the moon, and we could send people to Mars, Jupiter, or beyond, even today, if we chose to take up the challenge. But we have no idea of how to create HAL, a machine that is indistinguishable in its processes from those we associate with our own cognition, our own intelligence.
Mirror, Mirror

All our attempts to generate artificially intelligent machines have failed to realize the dreams of the pioneers in computer science who envisioned machines that could "think" faster than us, without error, that never grew old, never got sick, and never got tired. Our best minds have worked on building smart machines for more than five decades. Yet we've not only failed to create a machine as intelligent as HAL, we've also failed to recognize a pathway that would lead to its production. Why?
The main cause of our failure stems from a decision made fifty years ago, when computer science was in its infancy. Rather than carefully define what we meant by an intelligent machine so that we could build such a device from first principles, we set aside such a definition and tried to program a computer to "act like us" in some limited ways. This approach often meant getting computers to perform a discrete human activity, such as playing chess, by feeding enormous amounts of our knowledge into the computer and having the machine regurgitate that knowledge in the hopes that it would behave as we do. Unfortunately, programming computers to behave like human beings became the central concept in artificial intelligence, setting back our efforts to design truly creative machines for decades. To help you understand why this seeming contradiction is in fact the truth, I need to take you back fifty years, to the creation of what has become the most famous criterion for artificial intelligence (AI), and at once also the death of artificial intelligence in the twentieth century.
Turing's Test: The Holy Grail of AI

In the early stages of computer science, many researchers had high hopes about the prospect of getting machines to think. But the fundamental question, What is thinking? was often overlooked or skirted. In 1950 the famous mathematician Alan Turing wrote a seminal paper in the technical journal Mind called "Computing Machinery and Intelligence." He too skirted the question by introducing a new question. In his words: "If the meaning of the words 'machine' and 'think' are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question 'Can machines think?' is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words."1

Turing's question was centered on the following thought experiment. Imagine two rooms, as shown in figure 1. Three people are involved, a man (A), a woman (B), and an interrogator (C). The interrogator is separated from the man and woman, in a closed room, but gets to ask questions of them both using a terminal connection, just as we would do in an Internet chat room. The interrogator's objective is to determine which one is the woman. The man's objective is to fool the interrogator into thinking that he is the woman. The man can lie, if he so desires. In contrast, the woman's objective is to help the interrogator. Note that Turing's restriction allowing for only electronic communication eliminates any possibility that the interrogator could come to a decision based on the pitch of the respondent's voice or on any other sensory clues.

Turing then replaced the original question, Can machines think? with the following: "We now ask the question, 'What will happen when a machine takes the part of [the man] in this game?' Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?" Presumably, if the interrogator weren't able to show any increased ability to decide between the woman and the man, as opposed to between the woman and a machine, then the machine would be declared to have passed the test.2

Turing's test (also known as the Turing Test) became the Holy Grail of artificial intelligence. Perhaps the test gained this notoriety not just because of its intuitive appeal and because of Turing's reputation as a mathematician, but also because it was most likely the first formal criterion offered for judging machine intelligence. Over decades, however, the test took on a slightly different character. Today, when most people speak about the Turing Test, they imply that the test involves a machine fooling us into thinking that it's human. Some people corrupt the test even further and speak about the machine fooling us into thinking that it's a man. The subtle difference between Turing's proposal and this corrupted version is important: The point of Turing's test isn't to determine if a machine can fool you into thinking that it's human, or that it's a man. The point is to determine if the machine can fool you into thinking that it's a woman as often as a man can fool you into thinking that he's a woman.3 As we've found out in the last decade of the Internet, it's fairly easy for a man to trick an interrogator into believing that he's a woman! Any machine that "seeks" to pass the Turing Test will have a very high hurdle to clear.
FIGURE 1
In the Turing Test, an interrogator (C) asks questions of a man (A) and a woman (B) in an attempt to identify which one is the woman. Turing suggested replacing the man with a computer and asking whether the computer can do as well as the man at fooling the interrogator into thinking that it's the woman.

Turning to Games

When Turing proposed his test in 1950, no contemporary computer could come close to passing it. Even trying to pass the test was an effort in futility: Computers were simply too slow, and at any rate, there were no viable algorithms to test. So instead the focus on artificial intelligence drifted toward more limited domains. Rather than address the problem of interacting with an interrogator who can ask questions like How does it feel to be pregnant? it was deemed more reasonable to treat aspects of human behavior presumed to require intelligence. Games made for a good test bed because the rules are well defined, the actions are limited in scope, the outcome is clear, and in games like chess and checkers, intellect, as opposed to pure luck, plays a large role.

The new, and mostly unstated, derivative of the Turing Test then became Can a machine play a game of skill as well as a human being, so well that a person playing against the machine can't discern the difference? From there, it was a simple course to try to generate the best game-playing program possible. Whatever the means for creating the program, all that mattered was whether the result could compete with people. If the program happened to compute solutions to problems the same way that a human expert would, then that would be so much the better.4
The AI Winter

In our attempt to coax computers to act like we do, we spent years learning how to interview human experts and transcribe how they make decisions in their own domains of application into if-then rules or other mathematical formulas that a computer could interpret. Much of this effort is now viewed as an instructive failure, the "AI winter," a testament to the difficulty of translating human expertise into independent useful chunks of knowledge. Incremental improvements in a typical AI program's performance often required unmanageable order-of-magnitude increases in the number of rules. There were even rules that told you when to ignore other rules. Processing all the required rules was time consuming, to say the least.

Some research groups spent great sums of money designing hardware dedicated specifically to these knowledge-based computer systems. If only the systems would compute faster, then they'd be smarter, we were told. Unfortunately, this dictum was a red herring. No doubt you own a handheld calculator. Imagine speeding it up by a factor of one thousand. Would it be any "smarter," any more "intelligent"? No, it would just be faster. Speeding it up might make it fast enough to compute a solution to some heretofore difficult problem, but it wouldn't fundamentally change its character.
It's Time for a Change

Now that we've created programs that defeat the best human beings in chess, checkers, and other games, there's a renewed sense of disappointment. These programs, like the celebrated Deep Blue, which bested chess champion Garry Kasparov, really haven't made any contribution to understanding intelligence or to understanding how intelligence developed in the first place.
The vague and undefined quality of intelligence has been a major source of difficulty, a situation that continues to this day. Instead of focusing on the fundamental processes of intelligent behavior, we've expended considerable effort in the attempt to write programs that simply demonstrate a level of competence that is on par with people's, thereby satisfying some version of a Turing Test, at least in a limited sense. Some simple examples, however, indicate that this goal alone is insufficient: (1) a calculator can find the square root of 29,241 much faster and more accurately than a person can; (2) a watch can keep time more accurately than a person can; and (3) a radar gun can determine the speed of a baseball pitcher's fastball more accurately than a person can. (Note that all these tasks require only mental, not physical, ability.) Many similar pattern-recognition tasks can be performed better by machines than by people. Devising tools that can assist us with these tasks, or even outperform us, makes those tools useful; it does not by consequence make them intelligent.

The Turing Test is no more a test for intelligence than it is a test for femininity. Certainly, even if a man could imitate a woman perfectly, he would still be a man. Imitations are just that: imitations. A man doesn't become a woman because he can fool you into thinking that he's a woman. By the same token, a machine doesn't become a thinking machine, an intelligent machine, just because it can fool you into thinking that it's thinking. Maintaining the illusion of intelligence requires that someone be fooled. If artificial intelligence is defined operationally around fooling people, then it makes the quest for thinking machines nothing more than a mirage on the horizon of computer science, a sorcerer's trick. That's not what the dream of artificial intelligence is all about.

No research framed around a definition of artificial intelligence that relies on illusion will ever be able to explain the origin of intelligence in living systems. Nor will it be able to form the basis of a theory for how a similar intelligence can be achieved in machines. Nature also provides the best hint at how to design and construct machines that really can think for themselves, computers that can create their own solutions to problems--even to problems for which we don't have the solutions. The key to creating truly creative computers lies in mimicking nature's process of evolution.
Back to Nature
Having raised the issue of AI's poor track record at defining intelligence, I'm obliged to offer an alternative. Rather than focus on our own intelligence and the cognitive processes of the human mind, let's begin by asking what it takes for a system--any system--to be intelligent.

One fundamental characteristic of any intelligent system is that it makes decisions. We can define a decision as the choice of how to allocate available resources (energy, time, or food, for example). A decision maker must respond to its environment by choosing one particular course of action over another. The environment provides a stimulus, and the decision maker generates a response. (We'll see in later chapters how a computer can optimize its own stimulus-response behavior.)

To favor one response over another, the intelligent decision maker needs a goal. A goal provides a criterion for measuring the decision maker's degree of success, a means for assessing which option to favor. Without a goal, decision making is pointless, because there's no basis for choosing one action over another.5 (We'll also see in later chapters how we can program a computer to teach itself to favor certain actions over others.)

Where do goals come from? In nature, evolution instills the primary goal of survival into the genes of each extant species. Living creatures exist in a bounded world of finite resources that are required for survival, and they compete for those resources. Natural selection, in turn, eliminates those organisms that don't acquire sufficient resources. The genes of those ineffective individuals are removed from the evolutionary process by means of the rule "the dead shall not breed." In contrast, the genes of the survivors are passed along to progeny. To the extent that the behaviors favoring survival have a genetic basis, the goal of survival is reinforced in those genes in every generation through intense competition.6

Survival is both the primary goal and the ultimate measure of successful decision making in the natural world. Organisms that can adapt their behavior to survive the demands of an antagonistic environment demonstrate their success. It's fitting, then, to define intelligence in terms of adaptive behavior. Intelligence is the property that allows living organisms to sense, react to, and learn from their environment in order to adapt their behavior to better promote their survival.7
We shouldn't, however, restrict the definition of intelligence to living organisms, for it's possible to construct mechanical decision makers. More generally, then: Intelligence is the capability of a decision-making system to adapt its behavior to meet its goals in a range of environments.8 This definition of intelligence can be applied to humans, dogs, a colony of ants, an evolving species of insects, or even an evolving population of robots or computer programs.
The Repeated Pattern of Adaptation

Definitions are neither right nor wrong, but some are more useful than others. I argue that this definition of intelligence is more useful in the context of constructing intelligent machines than is a definition that focuses on our own uniquely human intelligence. Defining intelligence in terms of adaptive behavior is more useful because it reveals a repeated pattern of adaptation that can be exploited in a simple algorithm.

Adaptation, the process of fitting behavior to meet goals in light of environmental demands, requires a reservoir of learned knowledge and a means for varying that knowledge. Knowledge must be stored or it becomes ephemeral and the adaptive organism can't hold on to the gains that it makes. Knowledge must also be subjected to variation, however, or else the organism can't change its behavior to meet new demands.

At a high level of abstraction, every learning system employs the same general process for adapting behavior. The system gleans a reservoir of knowledge from interactions with an environment, the sieve of selection (natural or artificial) culls out incorrect "knowledge," and the learning system invents new variants of its old ideas that are tested against environmental demands. This process of random variation and selection--the very core of evolution--occurs not just in evolving lineages but also in individuals and social groups.9 Each system uses a different substrate (individuals use neural circuitry, social groups rely on "culture," evolving lineages rely on genetics), but the pattern is the same.
Exploiting the Pattern
We face a choice in designing intelligent machines. One choice is to continue emulating specific manifestations of intelligent behavior as we observe them in ourselves (or even in ants, dogs, or other living creatures). This approach seeks to model the end products of a long process of evolution without asking why or how those products emerged. Details become important for their own sake. This "bottom-up" approach poses a great potential for reversing cause and effect. For example, suppose we wanted to design a flying machine. We might look to nature for inspiration and see a vast array of feathered birds flapping their wings. But in emulating those specific manifestations of flight, we'd be led astray. Neither feathers nor flapping wings is a cause, but rather an effect. It's no surprise that we've failed to build a practical man-carrying ornithopter.
Alternatively, we can adopt a higher-level and more abstract perspective that exploits the common ground found across all learning systems. This "top-down" approach seeks out repeated patterns in systems and does not lead us astray. Considering my example of aerodynamics, the top-down approach focuses on the countering forces of lift and gravity, thrust and drag, and air flowing over an airfoil. We induce the causative nature of these factors by examining repeated patterns found across all "lifting surfaces," not just the flapping feathered wings of birds. In so doing, we can now design aircraft completely in simulation, without a single flight test. This top-down approach of searching for repeated patterns offers a superior prospect for identifying causative factors. The pattern repeated in all natural learning systems is an evolutionary process of adaptation by variation and selection. Evolution, then, provides a simple yet complete prescription for programming a learning machine, a computer that can adapt its behavior to meet goals in a range of environments and to generate solutions to problems that we don't yet know how to solve.
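To make the pattern of variation and selection concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method used in the experiments described later: the function names, the Gaussian mutation, the population size, and the toy goal (getting close to the point (3, -2)) are all hypothetical choices made for the example.

```python
import random

def evolve(fitness, parent, generations=200, offspring=10, step=0.1, seed=0):
    """Adapt a list of numbers by random variation and selection."""
    rng = random.Random(seed)
    best = list(parent)
    for _ in range(generations):
        # Variation: each child is a randomly perturbed copy of the current best.
        children = [[x + rng.gauss(0.0, step) for x in best]
                    for _ in range(offspring)]
        # Selection: keep whichever candidate (parent or child) scores highest.
        best = max(children + [best], key=fitness)
    return best

def fitness(candidate):
    # Hypothetical goal: the closer to the point (3, -2), the higher the score.
    return -((candidate[0] - 3.0) ** 2 + (candidate[1] + 2.0) ** 2)

solution = evolve(fitness, [0.0, 0.0])
print(solution, fitness(solution))
```

Because the parent competes with its own children, the stored "knowledge" can never get worse from one generation to the next; the random perturbation is the only source of new ideas.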
The Evolutionary Approach to Artificial Intelligence
We can continue to pour the facts we know into a computer program, looking to our own intelligence, but facts alone can't create HAL. HAL isn't a mere collection of facts, but rather a learning machine. We've programmed thousands of facts into the common scientific calculator, but it doesn't learn. We've programmed thousands of facts into "expert systems" for medical diagnosis and other applications, and yet these programs often fail to handle simple situations, and more important, fail to learn how to avoid such failures.
We've programmed knowledge into Deep Blue, and yet Deep Blue can't do anything but play chess. It can't even make the first move in a game of checkers and, most important, it has no provision for teaching itself how to do this. Knowledge is wonderful, but learning is the key element missing from the majority of efforts in what's routinely called artificial intelligence. Programs that are incapable of learning will never solve the problem of how to solve problems. To build HAL, we'll need programs that can adapt to new situations and garner their own knowledge. As we'll see, nature's design principle of evolution points the way.
Deep Blue: A Triumph for AI?
The average person will tell you that artificial intelligence is about getting machines to think. We've all seen "smart" computers in the movies, like 2001: A Space Odyssey, Star Trek, Tron, The Matrix, War Games, Terminator, Bicentennial Man, and many others. But they're just tinsel from Tinseltown. Hollywood can make machines that seem to come alive with intellect, even consciousness, simply by playing with pixels projected on the silver screen. But what about real artificial intelligence? What about a machine that can adapt its behavior to solve problems on its own? Truly brilliant people have been working for more than fifty years on the problem of getting machines to "think" for themselves. Yet despite all the time, money, and intellectual horsepower that science has devoted to the problem of creating artificially intelligent machines, we haven't made much progress. Certainly, the early pioneers of AI thought we'd be farther along by now. Alan Turing offered in 1950 that by the year 2000, computers would be able to pass his test about 70 percent of the time. 1 The fact is, we're nowhere near creating a program that can fool you into believing that it's a woman as often as a man can. The Nobel laureate Herbert Simon predicted in 1958 that a computer would become the world champion at chess within just ten years. 2 And why not? Computers could process the same rules that chess experts rely on, only faster and without "human error." Still, we had to wait not ten years, but forty, before Deep Blue, IBM's supercomputer, defeated Garry Kasparov, the world's greatest chess player, in a head-to-head match that finished on May 11, 1997. When you ask people to point to an example of a real artificially intelligent machine, most will point to Deep Blue. The chess program is so well known that it enjoys as much name recognition as celebrities Howard Stern and Carmen Electra. 3 The Guinness Book of World Records lists the supercomputer as "the most advanced form of artificial intelligence." 4 But if Deep Blue is truly the triumph of artificial intelligence, then we've had very little success in getting machines to think, for Deep Blue has next to nothing to do with computers that "think." It's mainly a very fast calculator designed for one task: playing chess.
The More Things Change...
Deep Blue is the culmination of a long line of efforts, begun five decades ago, to program computers to play chess. Certainly, computer technology has progressed tremendously during the last fifty years. Yet the basic design for chess programs has remained almost constant. It's fitting, even revealing, to return to the earliest efforts at designing chess programs and to understand their essential elements. In 1950 Claude Shannon, one of the founders of the branch of mathematics known as information theory, was also one of the first researchers to propose an automatic method of playing chess (although he didn't actually implement a chess program). 5 Shannon suggested that a mechanical algorithm could play the game if that algorithm involved two facets: an evaluation function, a mathematical formula that assessed alternative features about different board positions, and a rationale, which he called "minimax," that sought to minimize the maximum damage that the opponent could do in any situation. 6 Shannon proposed that the evaluation function could generate a value for any arrangement of pieces on the board. The function would quantify how good or bad each arrangement was. Working off the evaluation function, Shannon's minimax procedure then provided a way to evaluate the possible alternative positions in light of their associated values. Minimax favored positions that provided the least advantage for the opponent. During the five decades since Shannon proposed his method, these two ingredients were staples of almost every recipe for programming computers to play chess. Even Deep Blue works on the same principles. The differences between Deep Blue and Shannon's prescription lie mainly in the specific components of Deep Blue's evaluation function and the incredible speed of its hardware. As we'll see later in the chapter, the speed of Deep Blue's machinery is really the primary factor in its success.
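The two ingredients can be sketched in a few lines of Python. This is a toy illustration, not Shannon's or Deep Blue's actual program: the game tree is written out by hand as nested lists, each leaf is a hypothetical score that an evaluation function might assign to a board position, and the function names are invented for the example.

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree.

    A leaf is a number: the evaluation function's score for that position,
    from the point of view of the player to move at the root. An internal
    node is a list of the positions reachable in one move.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    # We pick our best option; the opponent is assumed to pick our worst.
    return max(values) if maximizing else min(values)

def best_move(tree):
    """Index of the root move whose worst-case outcome is best for us."""
    scores = [minimax(child, maximizing=False) for child in tree]
    return scores.index(max(scores))

# Three candidate moves, each answered by one of three opponent replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True), best_move(tree))
```

The third move offers the largest possible payoff (14), but the opponent can hold it to 2. The first move guarantees at least 3 no matter what the opponent does, so minimax prefers it.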
The Precursors to Deep Blue
The progression of programs that led to Deep Blue began shortly after Shannon's seminal contribution in 1950. Turing wrote one of the first algorithms for actually playing chess, which he published in 1953. He never completed programming the algorithm on a computer, but he played one game by hand against a "weak human opponent" and lost. Turing, a relatively poor chess player, attributed the weakness of his program to its "caricature of his own play." 7 The first documented working chess program, created by scientists at Los Alamos, came just three years later, in 1956. 8 It took another ten years before Machack VI, a program written by Greenblatt, Eastlake, and Crocker, racked up the first defeat of a human in tournament play. 9 After that, the performance of the best chess programs increased consistently. In 1966 Machack was rated at about 1,250, far below the grandmaster rating needed to challenge the world's best players. But by 1978, Northwestern University's Chess 4.7, the best available chess program, was rated just below 2,100. The program defeated Scottish chess champion David Levy in a tournament game. Just five years later, AT&T's Bell Laboratories' program, called Belle, was the first to qualify for a title of U.S. Master, with a rating of 2,300. Figure 2 indicates the ratings of various programs from the 1960s all the way up to Deep Blue.
Computing Chess Ratings
To understand figure 2, we need to review the chess rating system. Chess programs, just like chess players, are evaluated using a universal rating system that assigns points based on the level of their competition and their resulting performance. You'll find the basic formulas for computing a player's rating in the notes at the back of the book. 10 The concepts underlying the formulas are simple, and to assist those of you who don't want to delve into the mathematics, I'll explain the concepts in words.

FIGURE 2. The graph shows the increase in the rating of chess programs for the past thirty-five years. (Plot not reproduced: horizontal axis, Year, 1955 to 2000; vertical axis, 750 to 3,000; labeled programs: MacHack 6, Chess 4.7, Belle, Deep Blue.)

Suppose you have a rating of 1,500, which places you in the Class C category, and you play against a better opponent, who has a rating of 1,900, which places her in the Class A category.
(Table 1 provides the names for the different categories based on their associated ratings.) The odds are that your opponent, who is four hundred points better than you, is going to win the match. The rating system is built so that someone who is four hundred points better should win about 90 percent of the time. The formulas reward you for winning and penalize you for losing, but they do so in proportion to the presumed likelihood that you were going to win. If your 1,900-rated opponent defeats you, your opponent's rating will climb about three points. Your rating will simultaneously fall about three points. But if you're fortunate enough to win the match, your rating will increase twenty-nine points, and your opponent's rating will plummet by the same amount. Figure 3 shows how many points you could expect to earn if you had a 1,500 rating and defeated someone with a rating varying from 1,100 to 1,900.

TABLE 1. The names of categories based on their associated ratings.

Class                    Rating
Grand (Senior) Master    2400+
Master                   2200-2399
Expert                   2000-2199
Class A                  1800-1999
Class B                  1600-1799
Class C                  1400-1599
Class D                  1200-1399
Class E                  1000-1199
Class F                  800-999
Class G                  600-799
Class H                  400-599
Class I                  200-399
Class J                  below 200
The rating system is designed to make smaller adjustments to your rating once you climb above 2,100 points, and again even smaller adjustments once you climb above 2,400. It gets quite difficult to reach stratospheric ratings, because unless you're continually playing the very best opponents and defeating them, you'll earn only a few points for a win. It simply isn't worth the time for a master rated at 2,300 to play someone in Class G at 650; the easy win wouldn't earn the master even one full rating point. Looking back at figure 2, we see that Chess 4.7 outshined Machack by almost nine hundred points. There was only a decade of time between the two programs, but light years of distance; by the rating formulas, we'd expect Chess 4.7 to beat Machack more than 99 percent of the time. Belle, in turn, should beat Chess 4.7 more than 75 percent of the time. Yet, as we'll discuss shortly, in a hypothetical game between Deep Blue and Belle, the odds against Belle would be almost eight to one. Just how did Deep Blue get to be so good?
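For readers who do want to see the arithmetic, the sketch below reproduces the numbers quoted above. It is an assumption-laden illustration rather than the formulas from the notes: it uses the widely known logistic Elo expectation and a fixed multiplier k = 32, and the function names are hypothetical. A smaller k would model the smaller adjustments made at higher ratings.

```python
def expected_score(rating, opponent):
    """Presumed likelihood that `rating` beats `opponent` (logistic Elo form)."""
    return 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))

def rating_change(rating, opponent, score, k=32):
    """Points gained (or lost) after a game; score is 1 for a win, 0 for a loss."""
    return k * (score - expected_score(rating, opponent))

# A 1,900 player facing a 1,500 player should win about 90 percent of the time.
print(round(expected_score(1900, 1500), 2))
# The 1,500 player gains about 29 points for an upset win, loses about 3 for a loss.
print(round(rating_change(1500, 1900, 1)), round(rating_change(1500, 1900, 0)))
# Beating an equally rated opponent earns 16 points.
print(rating_change(1500, 1500, 1))
```

Under the same assumptions, a nine-hundred-point gap gives the stronger side an expected score above 0.99, and a two-hundred-point gap gives about 0.76, in line with the comparisons of Chess 4.7 with Machack and of Belle with Chess 4.7.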
FIGURE 3. If you have a rating of 1,500, then the curve shows how many points you'll add to your rating if you defeat an opponent rated anywhere from 1,100 to 1,900. You gain the most when you beat someone who is much better than you. You'd gain sixteen points if you defeated another player rated 1,500, but fewer than five points for defeating someone rated 1,100. (Plot not reproduced: horizontal axis, Rating of Opponent, 1,100 to 1,900.)

Hardware Makes It Happen
The 1980s saw a rise in efforts to make application-specific hardware that could evaluate large numbers of possible chessboards very quickly. Every new generation of computer hardware could process more chessboards in the same amount of time and therefore had an advantage over older, slower hardware. Time is of the essence in most championship chess matches, in which players must make their moves in a limited number of hours. 11 Using faster hardware to process more boards in the same amount of time meant that chess programs could look farther ahead and anticipate a greater possible range of moves and countermoves. The most direct route to making better chess machines came from optimizing their hardware rather than their software.
Deep Blue, the epitome of this optimization, emerged from the battle of two competing teams of computer experts at Carnegie Mellon University. Each team sought to take full advantage of tailoring computer circuitry to evaluate chess positions. One team, working on a system called Hitech, was led by Professor Hans Berliner. Hitech was the first computer to achieve a grand master (senior master) U.S. rating. The other team, working on a system called ChipTest, was led by doctoral students Feng-hsiung Hsu and Thomas Anantharaman. With IBM sponsorship, ChipTest evolved into Deep Thought and ultimately Deep Blue, which led a nearly linear progression in the rating of chess programs up toward 2,700 by 1995, in large part from the benefits of specialized hardware designs. 12
In fact, most of the gain in the rating of chess programs, from start to finish, can be attributed directly to the simultaneous gain in computer processing speed. Figure 4 shows that the progression of ratings was nearly linear in any decade, with signs of saturating in the late 1990s. Keep in mind that during the entire time depicted in the graph, from 1957 to 1997, computer speeds were doubling, on average, every eighteen months. Faster computers can "see" farther ahead in the same amount of time and therefore gain an advantage over slower computers. It's reasonable to claim that the gains in ratings came not so much as a result of better evaluation functions but simply by consequence of employing faster machines.

FIGURE 4. The increase in the rating of chess programs as a function of computer processor speed. (Plot not reproduced: horizontal axis, Processor Speed in millions of instructions per second, logarithmic scale from 1E-3 to 1E7; vertical axis, 750 to 3,000; labeled points: MacHack 6; Chess 4.7, 1977; Belle, 1983; Deep Blue, 1996; decades marked 1970s, 1980s, 1990s.)

The data in figure 4 are quite suggestive: The curve that models the increase in chess program ratings is independent of the evaluation function that each program used (despite considerable time and effort spent crafting each function). They all used reasonable variations on what to measure with regard to material (number and types of pieces) and position, but perhaps that didn't make much difference. What really mattered was the speed at which the computer cranked through the cornucopia of alternative positions, looking farther and farther into the future in the same amount of time.
The final result was a systematic bludgeoning of the game of chess. No longer was it an elegant game to be mastered by intellect, but rather just a big mathematical graph of moves and countermoves, searched at blindingly high speeds by a very fast machine named Deep Blue.
Deep Blue Beats Kasparov
In the mid-1990s, speculation grew about whether Deep Blue, the best chess machine in the world, could beat Garry Kasparov, perhaps the greatest human chess player ever. 13 Kasparov was rated 2,806 before playing Deep Blue. 14 A certain air of inevitability accompanied his defeat. The steadily increasing achievements of computer chess programs seemed like a rising tide that would invariably swallow up the world champion. The IBM team had assembled a monster machine to manipulate hundreds of millions of chess positions per second. Furthermore, they'd invested significant effort in adjusting Deep Blue's evaluation function, reportedly tuning the function to take advantage of Kasparov's particular style of play. 15 Would 1997 be the Year of the Machine, or would we have to wait a little longer? IBM's team had been expecting victory for many years, projecting a possible win as early as 1992. 16 They'd tried and failed in 1996, losing by three games to one with two draws. Going into the fateful rematch in May of 1997, Kasparov had never lost a match. The public didn't expect this to be his first either: In a CNET News.com poll, 66 percent of respondents thought that Kasparov would prevail over Deep Blue. 17
They were wrong. USA Today called the outcome a "shocking defeat of world champion Garry Kasparov." 18 It was unfortunate that before the competition, Kasparov commented that the match was "about the supremacy of human beings over machines in purely intellectual fields. It's about defending human superiority in an area that defines human beings." 19 The match certainly would have been spun that way if Kasparov had won. But of course it wasn't about human supremacy at all. It was simply about the ability of IBM's team to capture what people already knew about playing chess and couple that with extremely fast computers in a parallel architecture. It was about assembling a thirty-two node high-performance computer in which each node had eight very large-scale integrated (VLSI) processors designed especially for evaluating chess positions at the combined rate of two hundred million per second. Imagine someone pressing the buttons of a calculator at blinding speed. That's what this match was about.
Machine Learning in Deep Blue: Close but Not Quite
Let's dig a little deeper into how Deep Blue worked. Its program relied substantially on an evaluation function that comprised four basic elements that grand masters believe are important: material, position, king safety, and tempo. 20 Material referred to the point value assigned to each chess piece. Conventional wisdom starts by assigning one point for pawns, three points for knights and bishops, five points for rooks, and nine points for a queen.
Of course, these are just heuristics, and in certain cases it makes sense to give up more material for better position. Thus the position of the pieces on the board is crucially important. Position for Deep Blue also included a concept related to mobility, determining the number of squares that its pieces could attack. King safety was a measure of how safe the king was from attack. Finally, tempo related to how quickly the program was capitalizing on position relative to its opponent. The 1990 version of Deep Blue used about 120 parameters in its evaluation function. 21 Each of the basic elements comprised multiple parameters. Each parameter had to be weighted in terms of its importance relative to the other parameters. The traditional way to find appropriate settings of these parameters was by hand tuning: The Deep Blue team used this approach, playing games with various parameter settings and adjusting them through trial and error. But the team also tried an alternative to hand tuning. They gathered nine hundred games from chess masters, then used an algorithm to adjust the parameters so that the evaluation from their function would produce the greatest match between the moves that the machine thought were best and those actually played by the masters. In this way, they applied a form of machine learning, called supervised learning, whereby a program tries to fit its behavior to a list of correct answers to a set of problems.
Deep Blue's team, writing in the October 1990 issue of Scientific American, offered, "We believe ours is the only major program to tune its own weights automatically." Perhaps this was true, but the team's method for machine learning presumed that they already knew all the right answers for their test cases, evidenced by the play of previous masters. To be sure, this effort to incorporate learning into Deep Blue's development was an important step, but the best we can hope to achieve with this approach is a fine distillation of the play of the best chess players so far to appear. Downloading human knowledge into a machine can't help the computer to discover new and innovative ways of attacking the very same test cases for which the computer is trained. What about the possibility of going beyond what human beings already have learned about chess? The future of artificial intelligence doesn't rely on having computers learn what we already know, but rather on having them learn what we don't know. Only then will computers be able to solve new problems in new ways.
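A weighted evaluation function of the kind described above can be sketched as follows. This is a deliberately tiny illustration, not Deep Blue's function: only the conventional piece values come from the text, while the feature values, the weights, and the function names are hypothetical, and four features stand in for the roughly 120 parameters mentioned.

```python
# Conventional point values: pawn, knight, bishop, rook, queen.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material(pieces):
    """Total point value of a side's pieces, given as a string such as 'QRRPP'."""
    return sum(PIECE_VALUES.get(piece, 0) for piece in pieces)

def evaluate(features, weights):
    """Weighted sum of feature scores; higher means better for the side to move."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical position: up two points of material, slightly better placed,
# with a somewhat exposed king and no advantage in tempo.
features = {"material": 2, "position": 1, "king_safety": -1, "tempo": 0}
weights = {"material": 1.0, "position": 0.5, "king_safety": 0.25, "tempo": 0.1}
print(material("QRRBBNNPPPPPPPP"), evaluate(features, weights))
```

Hand tuning and supervised learning both amount to choosing the numbers in `weights`; in neither case does the program decide for itself which features are worth measuring.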
Artificial Intelligence, Emphasis on Artificial
Deep Blue was a significant engineering accomplishment, but from the perspective of designing intelligent machines it was more an admission of failure. After all, almost everything Deep Blue "knew" in its evaluation function was preprogrammed by people and/or based on the knowledge accumulated from chess grand masters. 22 Even with all the knowledge of previous games, endgame strategies, and gambits, it had taken IBM millions of dollars, specially designed hardware, and years of effort to capture that knowledge and finally process it fast enough to beat the world's greatest player. 23
Let's review the numbers. To beat Garry Kasparov, Deep Blue examined two hundred million different chessboards every second, each evaluated in light of those features that people have already come to believe are important. The average length of time per move in a championship match is three minutes. 24 This means that, on average, Deep Blue parsed through almost thirty-six billion different alternative future positions to assess which move to make at every point in the game. In contrast, how many positions did Kasparov consider per second? Nobody really knows for certain, but it's been estimated that he considered three moves per second. 25 (Some grand masters say that they only consider one move, the right move, but that claim has to be made more out of hubris than candor. It's certainly not much of a prescription for designing a computer program.) 26
Programming human knowledge into an algorithm isn't easy. Indeed, programming Deep Blue was a monumental effort. But once that knowledge is captured in a set of deterministic rules, the program's courses of action are simply canned routines. The program can't adapt its behavior to meet goals in a range of environments. So it is with Deep Blue: The program simply regurgitates what it's been told, how its human instructors told it to assess the alternative positions that it sees. The program becomes a useful tool to defeat a human expert, but with regard to that tool's intelligence, it might as well be a hammer.
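The comparison of numbers can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above (two hundred million positions per second, three minutes per move, an estimated three moves per second for Kasparov); the variable names are hypothetical, and treating one considered move as one position is a simplifying assumption.

```python
positions_per_second = 200_000_000   # Deep Blue's quoted search rate
seconds_per_move = 3 * 60            # three minutes per move, on average

deep_blue_per_move = positions_per_second * seconds_per_move
kasparov_per_move = 3 * seconds_per_move  # estimated three moves per second

print(deep_blue_per_move)                      # thirty-six billion positions
print(kasparov_per_move)                       # a few hundred moves
print(deep_blue_per_move / kasparov_per_move)  # ratio between the two
```

On these figures the machine examined tens of millions of positions for every one that its human opponent considered.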
Success for Deep Blue: Defeat for AI
Given the close competition between Kasparov and Deep Blue, despite the massive computer resources that IBM applied, I can't help but believe that the match was more of a defeat for artificial intelligence than it was a success for IBM. IBM seems, in part, to agree. The Guinness Book of World Records, and much of the public, may consider Deep Blue the apex of artificial intelligence, but IBM shies away from declaring Deep Blue to be even an example of artificial intelligence: "Deep Blue, as it stands today, is not a 'learning system.' It is therefore not capable of utilizing artificial intelligence to either learn from its opponent or 'think' about the current position of the chessboard." 27 IBM suggests, and quite rightly so, that an artificially intelligent computer should be a learning computer, able to teach itself without relying on our knowledge. Yet the road to much of what has been called "artificial intelligence" is paved with programs that were drawn up much like Deep Blue, with no ability to learn anything new on their own. These "artificial intelligence" programs rely on human beings for all the answers to their problems. Such programs have nothing to do with intelligence; they instead merely recapitulate things we already know, just like Deep Blue does. 28
Building an AI Program That Teaches Itself
Rather than distilling human knowledge into a computer program, and hoping that we have hardware that's fast enough to exploit that knowledge, a really significant step in artificial intelligence would be to devise a program that could learn how to play expert-level chess on its own. The program would require the basics of how to play the game and knowledge of the different types of pieces and how they move, but little or nothing else. Simply by playing against itself, it would adapt and improve as it played through different games. It would learn concepts about the game, understand how to craft a strategy, and use its own judgment to master its play.
Something similar was played out in the 1983 movie War Games, which was released during heightened tensions between the United States and the Soviet Union. In the film, a seventeen-year-old computer hacker named David unwittingly breaks into a Department of Defense computer named WOPR (pronounced "whopper," and standing for War Operations Plan and Response) that maintains all the incoming data for ballistic missile defense. The WOPR also serves to simulate alternative scenarios for nuclear war, as highlighted in the following dialogue between the machine's supervisors, Richter and McKittrick:
Richter: The WOPR spends all its time thinking about World War III. Twenty-four hours a day, 365 days a year, it plays an endless series of war games, using all available information on the state of the world. The WOPR has already fought World War III as a game, time and time again. It estimates Soviet responses to our responses to their responses, and so on. Estimates damage, counts the dead, then it looks for ways to improve its score.
McKittrick: But the point is, is that the key decisions for every conceivable nuclear crisis have already been made by the WOPR. 29

When David breaks into the system, he thinks he's hacking into an entertainment software company's computer and tries to play a game called "Global Thermonuclear War." His hacking starts the WOPR playing out World War III scenarios and feeding information to the military forces as if it were real.
Tensions escalate as the WOPR makes it seem as if we are coming closer and closer to the brink of war. But in the end, the WOPR explores the results of myriad nuclear exchanges and seems to make an analogy to tic-tac-toe, in which every properly played game ends without a winner. The WOPR's conclusion: "Strange game... the only winning move is not to play." The movie audiences of the 1980s had no problem envisioning a machine that was smart enough to convince itself that nuclear war wasn't winnable. That seemed plausible enough. The mechanisms underlying the WOPR were never hinted at in the movie. It was simply "intelligent," and the audience was left to accept that this machine could play games, understand the outcome, try out alternative strategies, compare the results, and iteratively improve its performance. Audiences are allowed to accept things on faith. Engineers have to draw the blueprints. How can we build a computer like the WOPR that would play games like chess, teach itself the right moves, and adapt to new circumstances as it progressed without relying on people to provide the right answers? How can we build such an intelligent machine? One promising solution can be found by turning to evolution, nature's design principle of random variation and selection. I'm going to tell you about some experiments that support this contention. These experiments involve the evolution of artificial neural networks, computer models of how simple brains might function. So before we discuss the experiments, let's examine the details of these brain models.
Building an Artificial Brain
Hollywood often inspires our imagination. Think of an intelligent machine, a machine capable of adapting its behavior to meet goals in a range of environments, and you'll likely envision something from a movie. You might remember Arthur C. Clarke's HAL or perhaps a Terminator from one of James Cameron's films. These made-up machines, and many others from the realm of science fiction, use computers to facilitate their own thought processes. More specifically, they use computers to emulate the functions of our own brains. HAL had circuit boards of memory, based loosely on our brain's ability to remember things. The more recent versions of artificially intelligent machines that we see in the movies rely on "neural nets," circuits that mimic the way the human brain works. As Arnold Schwarzenegger's character said in Terminator 2: Judgment Day: "My CPU is a neural net processor, a learning computer. The more contact I have with humans, the more I learn." The idea of capturing the essence of how our own neurons work and assembling a network of artificial neurons inside a computer is compelling. We could start by analyzing how individual neurons behave and program a computer to model them. We could then add more programming code to connect these individual artificial neurons, effectively letting them "talk to one another." Perhaps this endeavor would lead eventually to an artificial brain, a program that resides on a computer and works like a real brain.
Keeping It Simple

In fact, scientists have been pursuing this course of action for many decades. Warren McCulloch and Walter Pitts, of the University of Illinois and the University of Chicago, respectively, published a seminal article outlining a step in this direction more than half a century ago.1 McCulloch and Pitts understood that a real neuron "fires" when a stimulus excites it beyond a certain threshold. Thus they suggested a simple, highly idealized model whereby a neuron takes on one of two states. In its resting (or quiescent) state, the neuron yields no output; however, when the input activity to the neuron exceeds some limit, the neuron fires. Quite arbitrarily, when a neuron fires it's said to output a value of 1; it remains with an output of 0 when quiet. Figure 5 depicts the behavior of a single McCulloch-Pitts neuron. The neuron switches between quiescence and firing whenever its level of excitation switches from one side of the threshold to the other.

FIGURE 5
The McCulloch-Pitts model of a single neuron. The neuron fires when the incoming neuronal activity exceeds a specified threshold. (Horizontal axis: incoming neuronal activity. Below the critical threshold, the neuron doesn't fire; above the critical threshold, the neuron fires.)

You might think of this simple neuron as a "detector," firing whenever something excites it. That's not much different from the way a thermostat or a motion detector in a security system works. The secret to getting the neuron to detect what you want comes in choosing the right sensors that feed signals to it and in amplifying those signals appropriately. But there's only so much you can do with just one neuron. As a computing device, it's pretty limited. It's very easy to write a program that implements a single neuron, but that program wouldn't be very interesting. There's no way to make HAL out of just one neuron.
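To show just how little code a single neuron needs, here's a minimal sketch of the McCulloch-Pitts model in Python. The function name, the thermostat-style readings, and the threshold of 20.0 are illustrative assumptions of mine, not values taken from McCulloch and Pitts.

```python
def mcculloch_pitts(activity: float, threshold: float) -> int:
    """Return 1 (fire) if the incoming activity exceeds the threshold, else 0 (quiet)."""
    return 1 if activity > threshold else 0


# A thermostat-style "detector": it fires only when the sensed value
# rises above 20.0 (an arbitrary, illustrative threshold).
readings = [18.5, 19.9, 20.0, 20.1, 25.0]
outputs = [mcculloch_pitts(r, threshold=20.0) for r in readings]
print(outputs)
```

Note that a reading exactly equal to the threshold leaves the neuron quiet, because the model asks for activity that exceeds the limit.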
A Network of Neurons

In 1958 Frank Rosenblatt, of Cornell University, suggested that a network of neurons could produce a more versatile computing device.2
FIGURE 6
An extension of Rosenblatt's "perceptron" neuron network. The neurons are arranged into layers. The input neurons send sensed data to the hidden neurons, which in turn fire or fail to fire and send their signals on to the output neurons, which also fire or fail to fire. Notice that the hidden neurons do not connect to one another, nor do the input neurons refer to one another. The neural network processes data in a layered series of stages.
Rosenblatt arranged his artificial neural network in a series of neural layers, much like wafers. Neurons in one layer were connected to neurons in the next layer, in an analogy to the synapses that connect the axons and dendrites of our own neurons. In our brains, axons, dendrites, and synapses carry signals across a network of neurons, much like electricity is carried through a power grid. Similarly, in Rosenblatt's neural network, the connections between neurons carried the output of one neuron to be sent as input to another. Along the way, a signal could be amplified or reduced, depending on the associated "connection strength" between the neurons. You can see a simple extension of Rosenblatt's device, which he called a "perceptron," in figure 6. I've provided a detailed description in the notes of how such a neural network works, but here's a quick and illustrative example.3
Computing with a Neural Network

Suppose you wanted to create an artificial neural network that performs a simple function called "OR." The OR function works with two inputs that can be either 1 or 0, which might represent conditions like true or false, respectively. The OR function works just like it does in English. If I say, "I'm hungry or I'm happy," as long as either of these possibilities is true, the entire statement is true. If I'm not hungry and I'm not happy, then the statement's false. In numbers, if both inputs to the OR function are 0, then the function should output a 0; otherwise, it should output a 1. The neural network shown in figure 7 can perform the OR function when the firing thresholds for each neuron are set at 0. Explicitly, the neural network outputs a value of 1 if it "sees" input pairs of (1,1), (1,0), or (0,1), and outputs a value of 0 if it sees the input pair (0,0).

Here's how it works. The numbers on the links represent the connection weights between pairs of neurons. Suppose the neural network sees the input pair (1,0). The first input neuron sends the value 1 along to the first hidden neuron. Along the way, it's multiplied by the weighting factor of 1, which doesn't change the value. Simultaneously, the second input neuron sends the value 0 to the second hidden neuron. That value is multiplied by 1, which still leaves the incoming activity at 0. The incoming activity for the first hidden neuron is greater than 0, so it fires a value of 1 and sends it to the output neuron. Along the way, it's amplified by the weight 1, which doesn't change the activity. At the same time, the incoming activity for the second hidden node isn't greater than 0, so it stays quiet, passing a value of 0 along to the output node. That value is multiplied by the connection strength of 1, but this still leaves the signal with no strength.
FIGURE 7
A three-neuron network that computes a simple function called "OR."
The output node then adds up both incoming signals, a 1 from the first hidden neuron and a 0 from the second hidden neuron, for a total of 1. Since this total is greater than 0, the output neuron fires a value of 1, indicating that the OR function has been satisfied. You can verify that the other three possible instances, (1,1), (0,1), and (0,0), are also computed correctly.

This example shows how even a small number of neurons can be used to compute a mathematical function. It's not a complicated function, but it's easy to imagine that a neural network comprising scores of neurons would be able to handle complex functions, even though its basic neurons are simple. One such function might be used to assess the worth of alternative positions in a game of checkers. A neural network could sense the positions and types of pieces on the board as input and respond with an output value that indicates how much the network "likes" the position. We might bound the network's output to lie between +1 and -1, where +1 would mean that the neural network really likes the position and -1 would mean just the opposite. If we pick the right set of weights, then the neural network should tell us that it really likes checkerboards that put us in a winning position, and it should also tell us to avoid positions that lead to a loss.
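The OR walk-through can be checked with a few lines of Python. This is only a sketch of the network as the text describes it (each input feeds one hidden neuron, both hidden neurons feed the output, every weight is 1 and every threshold is 0); the function names are my own.

```python
def fires(activity, threshold=0.0):
    """A McCulloch-Pitts neuron: 1 if activity exceeds the threshold, else 0."""
    return 1 if activity > threshold else 0


def or_network(x1, x2, weight=1.0, threshold=0.0):
    # Each input is connected to its own hidden neuron.
    h1 = fires(x1 * weight, threshold)
    h2 = fires(x2 * weight, threshold)
    # The output neuron adds up the two weighted hidden signals.
    return fires(h1 * weight + h2 * weight, threshold)


for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, or_network(*pair))
```

Only the pair (0, 0) leaves the output neuron quiet; the other three pairs make it fire.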
The All-Important Links

We've seen that networks of artificial neurons can perform some interesting tasks. Now comes the hard part: How do you get an artificial neural network to do the task you want? Look back to figure 7. In that example, I wanted a neural network to perform the OR function. It was easy to construct a suitable network of three neurons by connecting one input value to a hidden neuron, the other input value to another hidden neuron, and finally by connecting the two hidden neurons to the output. But the real trick came in recognizing that the weights for connecting the neurons should all be equal to 1.0 and the threshold values should be 0.

What if we'd chosen some other weights and thresholds? Suppose that instead of having weights equal to 1.0 and thresholds of 0, we reversed these, setting the weights to 0 and the thresholds to 1.0. Let's see how the neural network would respond to (1,0). The first hidden neuron would have 0 stimulus, because the input value of 1 would be multiplied by 0. Likewise, the second hidden neuron would also have 0 stimulus, as its corresponding input value of 0 would be multiplied by 0. Since the activation at both hidden nodes would be less than their respective thresholds, neither neuron would fire. This would then send two 0 signals on to the output node. Each of these signals would again be multiplied by 0. In turn, the output neuron would have no incoming activation and would fail to fire. Overall, the neural network would respond to (1,0) by outputting a 0. That's not what we want.

We want the neural network to have an input-output or "stimulus-response" behavior that matches our intended function, like OR in this case.4 By changing the weights and the thresholds of the neurons, we can make a neural network that performs this task or fails miserably. The same principle holds for designing a neural network that assesses positions on a checkerboard. If a neural network has the right number of neurons and the right weights between those neurons, that network might do very well. But even if it has the right number of neurons, with the wrong weights, the neural network would play very poorly. Finding the right weights becomes the critical and vexing issue.
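One way to make "performs this task or fails miserably" concrete is to count how many of the four input pairs a given choice of weight and threshold gets right. The sketch below assumes, for simplicity, a single weight shared by every link and a single threshold shared by every neuron, as in the two settings discussed above; the scoring function is my own illustration.

```python
from itertools import product


def fires(activity, threshold):
    return 1 if activity > threshold else 0


def network(x1, x2, weight, threshold):
    h1 = fires(x1 * weight, threshold)
    h2 = fires(x2 * weight, threshold)
    return fires((h1 + h2) * weight, threshold)


def score(weight, threshold):
    """Count how many of the four input pairs receive the OR answer."""
    return sum(
        network(a, b, weight, threshold) == (a | b)
        for a, b in product((0, 1), repeat=2)
    )


good = score(weight=1.0, threshold=0.0)  # the settings from figure 7
bad = score(weight=0.0, threshold=1.0)   # the reversed settings
print(good, bad)
```

The reversed settings get only (0, 0) right, and only because that network never fires at all.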
A Sinister Task

OR is a simple function. We only need three neurons, and we can figure out the right values for the weights and thresholds simply by examining the neural network and doing a little scratch work with paper and pencil.
To compute more interesting complicated functions (like assessing the worth of a position in a game of checkers), we need more neurons. That means we need more weighted connections and more thresholds, and each has to be set appropriately. Suppose that the task you faced demanded a neural network with thousands of weighted connections, each needing to be set appropriately. Linking up all the neurons would become a significant challenge! The neural network's output depends on the weighted connections from all the hidden neurons, which in turn fire or remain quiet depending on the weighted connections from all the input neurons and the input data. Therefore, it's impossible to know which weighted connection is responsible for the output, because in essence they all are.

It would be a bit like adjusting the controls on a television made by a fictitious company called Sinister Electronics; imagine that the brightness and contrast knobs are connected in some mysterious way with the horizontal and vertical holds. All you want is a nice, clear picture that stays still and is centered on your screen. You move the dial to make the screen brighter, and the picture begins to flip slowly up and down. You reach for the vertical hold and manage to halt the flipping, but this enhances the contrast and also makes the picture shift a little to the left. Every attempt you make to control one knob at a time results in defeat. You have to move all the knobs simultaneously to get the desired effect.

Similarly, with an artificial neural network, you rarely can focus on a single weighted connection at a time. Instead, you have to consider changing all the weights on the connections simultaneously. The question is, How? Furthermore, the weights in the network might number in the hundreds or thousands, if not the millions or billions. It's like having a Sinister television with thousands of knobs. Unfortunately, you only have two hands.
A Not-So-Sinister Television

Suppose our Sinister television had two knobs for horizontal and vertical control. Suppose further that these controls were independent, and that we could measure how much we like the position of the picture on the screen for any pair of control settings. We'd then have the situation captured in figure 8. The x-axis is the position of the horizontal control knob, the y-axis is the position of the vertical control knob, and the z-axis marks how much we like the position of the picture. We'd like to find the settings that correspond to the maximum on the z-axis of the function, which is where we like the picture the most. Simply by looking at the function, we can tell that the best situation is when both the vertical and horizontal knobs are set at five, the midpoint of their range. Even if we couldn't see the entire function, we could adjust the knobs independently and find the best setting for each one. In this case, those best independent settings would correspond to the best setting overall as well.

FIGURE 8
A function showing how much we'd like the resulting picture on the screen of our Sinister Electronics television as we turn the control knobs. This is an easy situation, in which the effects of the two knobs aren't coupled. Even if we couldn't see the entire function, we could adjust the knobs independently and find the best setting for each one. In this case, that would correspond to the best setting overall as well.

Note that the function graphed in figure 8 has only one maximum. That maximum corresponds to a pair of settings of the horizontal and vertical controls that's optimal. What's more, we can find that maximum by first changing only the horizontal control knob and locating its best setting, then turning our attention to the vertical control knob and locating its best setting (or we could perform these tasks in reverse order). The function in figure 8 reflects the fact that the two knobs are decoupled. Unfortunately, neural networks (and especially networks with thousands of connections, as might be needed for playing checkers) don't work like this television.
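Here's a small sketch of that one-knob-at-a-time procedure. The liking function is a made-up stand-in for figure 8 (a single peak at five on both knobs, with no interaction between them), and the knob range of 0 to 10 is likewise an assumption.

```python
def liking(h, v):
    # Hypothetical single-peak function: the two terms don't interact,
    # so each knob can be judged without regard to the other.
    return -((h - 5) ** 2) - ((v - 5) ** 2)


settings = range(0, 11)  # assumed knob positions 0 through 10

# Start anywhere, then tune one knob at a time.
h, v = 0, 0
h = max(settings, key=lambda s: liking(s, v))  # best horizontal, vertical held fixed
v = max(settings, key=lambda s: liking(h, s))  # best vertical, horizontal held fixed

# For comparison, search every pair of settings exhaustively.
best = max(((a, b) for a in settings for b in settings), key=lambda p: liking(*p))
print((h, v), best)
```

Because the knobs are decoupled, two one-dimensional searches land on the same setting that the exhaustive search over all 121 pairs finds.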
A Really Sinister Television

Now let's look at the relationship depicted in figure 9. The axes of the graph are the same as before, but now the function has multiple peaks and valleys. This is what we might face if the two control knobs of the television weren't independent, but rather dependent. If we found the picture to be a little low, we'd adjust the vertical hold to raise it, but in so doing we might let the picture slip a little to the left. Overall, we might be a bit worse off than we were before. Then we'd adjust the horizontal control back to the right, but the picture might move a little farther up, and so forth. We'd have to adjust both knobs at the same time to have any chance of finding the best picture.

FIGURE 9
A new function showing how much we'd like the picture on the screen of our Sinister Electronics television if the effects of the control knobs were coupled. Higher values of the function indicate where we like the picture more. Finding the best settings of the knobs is difficult.

Remember, we never get to see the entire function that shows how much we like the picture for each pair of settings. All we can do is try different settings and see whether the picture gets better or worse. It would be easy to find the peak if we could see the whole function. We can just look at figure 9 and know that the best setting for the two knobs is at about eight on the horizontal control and four on the vertical control. But we don't get to see the entire function. We know that we have to change both knobs simultaneously, but the real question is, How? We face the same question when "tuning" a neural network, because the effects of its weights are coupled. And all we can do is measure the neural network's effectiveness based on the weights that we try. (When thinking about a neural network for playing checkers, that means we're going to need to play lots of games, testing out alternative sets of weights to find out which sets work better.)
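A toy example can show why coupled knobs defeat one-at-a-time adjustment. The liking function below is my own invention, not the function in figure 9: it has just one peak, but a strong penalty on the difference between the knobs ties them together. The sketch compares a search that may move only one knob per step with a search that may move both.

```python
def liking(h, v):
    # Hypothetical coupled function: the (h - v) term ties the knobs together.
    return -100 * (h - v) ** 2 - (h + v - 12) ** 2


def one_knob_moves(h, v):
    return [(h + 1, v), (h - 1, v), (h, v + 1), (h, v - 1)]


def both_knob_moves(h, v):
    return [(h + dh, v + dv)
            for dh in (-1, 0, 1) for dv in (-1, 0, 1)
            if (dh, dv) != (0, 0)]


def climb(start, moves):
    """Repeatedly take the best available move; stop when none improves the picture."""
    current = start
    while True:
        candidates = [p for p in moves(*current)
                      if 0 <= p[0] <= 10 and 0 <= p[1] <= 10]
        best = max(candidates, key=lambda p: liking(*p))
        if liking(*best) <= liking(*current):
            return current
        current = best


print(climb((0, 0), one_knob_moves))   # turning a single knob only makes things worse
print(climb((0, 0), both_knob_moves))  # turning both knobs together reaches the peak
```

Starting from (0, 0), every single-knob move lowers the score, so the first search never leaves its starting point, while the second walks along the diagonal to (6, 6).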
Hill Climbing to Optimize a Neural Network

We can imagine the function depicted in figure 9 describing how well our neural network performs some task, such as evaluating the likelihood of our winning a checkers game based on the current position of the pieces. We'd like to find the set of weights and thresholds that corresponds to the peak of that function; that's where the neural network is doing its best (for example, winning the most checkers games). Finding that proverbial pot of gold requires some searching.

To make an analogy, one that's common in computer science literature, the situation is a bit like trying to find the highest peak in the Rocky Mountains while hiking in dense fog. (Think about figure 9 as if you were the mountain climber trying to find the highest peak of the function.) If the fog were to lift, you might be able to see for miles around and easily spy the highest peak. But in the fog, you can only make out the terrain that lies under your next footstep. Under these conditions of limited visibility, you might assess the slope of the terrain that you're standing on and find the direction where the slope rises fastest.5 If you kept walking in the direction of where the slope was rising, and rising fastest, you'd eventually reach a peak. (In computer science, this technique is called "hill climbing.") That peak might not be the highest peak, but at least it would be a peak.
Let's see what the analogy with functions and climbing hills can offer to the problem of designing artificial neural networks. Essentially, we want the neural network to perform a task, like evaluating checkerboards. It senses some input, perhaps the position and types of pieces on the board, and responds with some output, such as how much the network likes the position it sees. Our control over the neural network's behavior is limited to adjusting the weights on the connections between the neurons, much like adjusting the control knobs on the Sinister Electronics television or hiking in the Rockies. Every setting of weights on the neural connections generates an input-output behavior for the neural network that we can measure and grade.

Our hope would be that by making small changes to the weights, we could affect the neural network's output and determine if that output were better or worse than what's offered by the current best weight settings. If it were better, that would be like our mountain climber finding a higher position on the terrain, and we could accept those new weights and search again. Otherwise, we could try a different set of weight changes, much like walking in a different direction, and continue until we found something useful or until we grew tired.6

Returning to the idea of a neural network that plays checkers, we might pit two different neural networks against each other, with each network using a different set of weights. One set might involve a small change from the other set, and just like the example of our mountain climber, we could try to hill climb toward a set of weights that allows the neural network to play well. Something similar to this idea was tried many years ago, although that effort didn't use neural networks. (I'll describe that attempt in greater detail in chapter 8.)
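The accept-it-if-it's-better loop described above can be sketched as follows. Everything specific here is an assumption for illustration: the two-weight quality function merely stands in for "how well the network performs," and the step size, number of steps, and random seed are arbitrary.

```python
import random


def quality(w1, w2):
    # Hypothetical stand-in for a measured grade of the network's behavior.
    return -((w1 - 1.0) ** 2) - ((w2 + 2.0) ** 2)


def hill_climb(start, steps=2000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    best = start
    best_q = quality(*best)
    history = [best_q]
    for _ in range(steps):
        # Make a small random change to every weight at once.
        candidate = tuple(w + rng.gauss(0.0, step_size) for w in best)
        q = quality(*candidate)
        if q > best_q:  # keep the change only if it's an improvement
            best, best_q = candidate, q
        history.append(best_q)
    return best, history


weights, history = hill_climb(start=(0.0, 0.0))
print(weights, history[-1])
```

Because a change is kept only when it scores higher, the recorded quality never goes down. On a landscape with many peaks, though, the same loop would simply stop at whichever peak happens to be nearest.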
Smoothing Out the Rough Spots

The preceding paragraphs present a prescription for optimizing the weighted connections and thresholds of the neurons in an artificial neural network. You start with some values for the weights and thresholds, much like parachuting the mountain climber somewhere into the Rockies. Then you make small changes to the values, much like having the mountain climber search around his or her immediate vicinity. If those new values seem to work better than the old values, you keep them. That's the equivalent of hiking uphill. Eventually, you'll get to a peak.

But now we face a new problem. Staying with our analogy, some mountains are easier to climb than others. Some offer nice, gradual slopes. Others offer steep, sharp cliffs. It would be nice if the function that we had to climb didn't look like the white cliffs of Dover. Unfortunately, the simple McCulloch-Pitts neurons that I introduced earlier often induce functions that have exactly this undesirable characteristic. Remember, the McCulloch-Pitts neurons are all-or-nothing. They either fire or they don't.

On the one hand, the sharp thresholds of the neurons can translate into cliffs on the surrounding functional landscape, because if you change a weight slightly, it might alter the output of the network from all to nothing, or vice versa. On the other hand, the change might make no difference at all in the neural network's output, because the alteration might not result in a neuron switching between firing or staying quiescent. It's difficult to walk uphill when you're staring at a landscape of large flat mesas bounded by cliffs. The all-or-nothing McCulloch-Pitts model of how neurons work doesn't leave much room for hill climbing.

To circumvent this conundrum, we can replace the McCulloch-Pitts threshold function with another function that's smooth, continuous, and shaped almost like a threshold. This approximating function is called a "sigmoid." Figure 10 shows a family of these functions.7 This function has a smoothing effect on the landscape, so our attempts to locate a peak may become easier. The cliffs become rounded, the mesas become sloped, and once again we can walk uphill (by adjusting the network's weights) and hope to find the closest peak. Our hope is that the closest peak corresponds to a good set of weights and threshold values for the neural network so that it can compute the function we want, such as a function to assess checkerboards.
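For readers who'd like to see the numbers, here's a short sketch of the sigmoid, using the expression given with figure 10, 1/(1 + e^(-sx)), where x is the incoming neuronal activity and s is the scaling factor. The particular scaling factors and inputs printed below are arbitrary choices of mine.

```python
import math


def sigmoid(x, s=1.0):
    """Smooth stand-in for a threshold at 0: 1 / (1 + e^(-s*x))."""
    return 1.0 / (1.0 + math.exp(-s * x))


for s in (0.5, 1.0, 5.0, 50.0):
    row = [round(sigmoid(x, s), 3) for x in (-1.0, -0.1, 0.0, 0.1, 1.0)]
    print(f"s = {s:>4}: {row}")
```

With a high scaling factor the output jumps from nearly 0 to nearly 1 around zero activity, much like a McCulloch-Pitts threshold; with a low scaling factor the curve flattens, so a small change in activity produces a small change in output.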
FIGURE 10
A family of "sigmoid" curves. Each curve looks a bit like the threshold function for a McCulloch-Pitts neuron. The specific mathematical expression that generates each of these curves isn't particularly important, but if you're interested, it's $1/(1 + e^{-sx})$, where x is the incoming neuronal activity and s is the scaling factor. The important point is that there is a simple formula for creating these curves and that by adjusting the scaling factor, we can make the curves look more like a threshold function or less like a threshold function. When the scaling factor's high, the sigmoid curve increases steeply, like a threshold. When the scaling factor's low, closer to zero, the sigmoid flattens out. The smoother sigmoid functions often generate landscapes that are easier to climb than the abrupt threshold functions that McCulloch and Pitts suggested.

Smoother Is Better, but Is It Good Enough?

Unfortunately, sometimes that closest peak simply isn't good enough. Look back to figure 9 and imagine that this graph represents the overall quality of a neural network based on two weights. (There might be many more weights, and threshold values too, but we can't show more than a three-dimensional graph, so you'll have to use your imagination.) There are lots of low-lying peaks. Suppose the level of performance that each one of those peaks represents simply isn't good enough for what we want the neural network to do. When using sigmoid neurons, we can at least expect to obviate the problems associated with facing steep cliffs and flat mesas on that landscape. But we still have no idea in general if the nearest peak to where we start searching will be the best one, or even acceptable. How can we prevent getting stuck on a neural molehill when we want to find the top of Pike's Peak?
Using Evolution to Avoid Getting Stuck

Evolution provides an answer to our dilemma. In evolution, individuals compete for survival and can be measured in terms of their overall quality, their fitness. That measure corresponds to the z-axis in our function shown in figure 8 or 9. Each individual's behavior is based in part on its genetic composition. Changing that genetic makeup may portend changes in behavior, and since some behaviors are better than others, similarly some combinations of genes are better than others in the context of the individual's particular environment. Just as we can turn the knobs of a television, nature can adjust the genetic knobs of the individuals competing for survival.8

The evolutionary process of random variation explores alternative genetic combinations. Random variation provides a means for escaping from low-lying hills and proceeding toward a higher peak. Most often, offspring are closely related to their parents (thus the saying, "The apple doesn't fall far from the tree"), but occasionally a random mutation can generate a "mutant," with significantly different genetic composition from its parents. If those combinations of mutated genes lead to improved performance, that is, better quality, they will be favored by selection.

Similarly, we can use something akin to "survival of the fittest" when exploring for the suitable weights and thresholds in a neural network. We'd like to adjust the weights of a neural network, just as nature adjusts the genes of an individual's offspring, and witness alternative neural networks compete for survival. Those that perform the desired task better than others would survive and pass their weights along to "offspring" neural networks through some form of random variation. Over time, we'd expect a series of improved, higher-quality neural networks. When applied to the game of checkers, maybe we'd even see some expert neural network players emerge from this evolution.
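To make the idea of competing networks concrete, here's a rough sketch that evolves the weights and thresholds of the small OR network from earlier in the chapter. It's one simple variant among many, and every specific in it is my own assumption: the population of 20, the Gaussian mutation with a spread of 0.3, the rule that parents and offspring compete and the better half survives, and the random seed.

```python
import random
from itertools import product

# The four OR cases: ((input1, input2), desired output).
CASES = [((a, b), a | b) for a, b in product((0, 1), repeat=2)]


def fires(activity, threshold):
    return 1 if activity > threshold else 0


def network(params, x1, x2):
    # Four connection weights and three thresholds, seven "genes" in all.
    w1, w2, v1, v2, t1, t2, t_out = params
    h1 = fires(x1 * w1, t1)
    h2 = fires(x2 * w2, t2)
    return fires(h1 * v1 + h2 * v2, t_out)


def fitness(params):
    """Number of OR cases (0 to 4) that the network answers correctly."""
    return sum(network(params, a, b) == target for (a, b), target in CASES)


def evolve(pop_size=20, generations=50, sigma=0.3, seed=1):
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(7)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Random variation: each parent produces one mutated offspring.
        offspring = [[p + rng.gauss(0, sigma) for p in parent] for parent in population]
        # Selection: parents and offspring compete; the fitter half survives.
        population = sorted(population + offspring, key=fitness, reverse=True)[:pop_size]
        best_per_generation.append(fitness(population[0]))
    return population[0], best_per_generation


champion, progress = evolve()
print(progress[0], progress[-1], fitness(champion))
```

Because the best parent always competes alongside the offspring, the best score in the population can never drop from one generation to the next. Whether and how quickly it reaches a perfect 4 depends on the random draws.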
Advantages of Evolution

We've seen that although hill climbing can help us to climb peaks, it also has some disadvantages. In particular, you might only be able to locate a nearby peak, and that peak might not be good enough. Evolution, in contrast, has some advantages over hill climbing. For instance, instead of relying on a single mountain climber in a dense fog, evolution operates on populations of individuals, all performing in parallel.

Imagine that, instead of parachuting a single mountain climber into the Rockies and telling him or her to find a peak, you could parachute in an entire airborne division. What's more, it would be as if each of these paratroopers had a walkie-talkie and was in constant communication with his or her comrades. Those who parachuted into low-lying foothills would hear about the higher locations that have been discovered by their teammates. The effect of natural selection is almost magical: The individuals who are searching in low-lying areas are instantly teleported to the vicinity of those other individuals who are searching in more promising areas. (In natural evolution, this is the process of selection eliminating those individuals with poor performance and making random variations of those with superior performance. The dead don't teleport or get reincarnated, but we can take some poetic license here.)

With an evolutionary search for a peak, there might be less chance of becoming stuck on a low peak, since other individuals are searching simultaneously. All it takes is for one of them to pinpoint a higher peak, and the reproductive attention of the evolving population shifts to that new winner.
A Tangled Web of Genes Leads to Complex Landscapes

If we're going to look to evolutionary optimization for inspiration, it might be reassuring to know that nature poses some complicated functions (known as adaptive landscapes, see note 8), like the ones we might expect when optimizing a neural network. What do natural adaptive landscapes look like? Are they simple, as in figure 8? Or are they complex, as in figure 9? The answer is predominantly more the latter than the former: Natural adaptive landscapes can be immensely complex. This complexity comes as a result of the intricate machinery and process that transfers DNA into proteins and in turn transfers proteins into individuals.
By the time you examine a behavioral trait, such as a tendency to be compulsive, the number of genes involved in affecting that trait can be enormous, and their individual degrees of influence on the trait aren't equal. Indeed, they may not even be separable. By analogy, a full house in poker consists of three cards of one denomination and two cards of another denomination, such as three eights and two fours. Which of these cards is most influential in creating the full house? This is meant as a rhetorical question. Only the entire full house has a measurable value; each of the cards is just a card. Similarly, genes are just genes.9 Only the final product of their combination interacting with the environment determines whether those genes are passed along to offspring (see figure 11).10 Think back to the Sinister television, with the tuning knobs that were coupled; the situation in natural genetics can be much the same.

Real-world adaptive landscapes must be very complex, yet evolution has done a remarkable job of creating machines, carbon-based machines, that survive in a wide array of environmental conditions. Certainly, the real world poses complexities that go far beyond those we could expect when trying to find the right weights and thresholds of a neural network, even when applying a network to a fairly complex game such as checkers. Evolution's success serves as our inspiration: We can harness the evolutionary processes of random variation and selection in an algorithm that can search for the right weights and thresholds for a neural network, using a computer to simulate generation after generation.

What remains is to convince ourselves that evolution is an effective means for finding extraordinary solutions. We'd like to see some organisms that have found the highest peaks on their adaptive landscapes, just as we'd like to find the peak on our neural network's adaptive landscape.
FIGURE 11
Real-world genetic systems are quite complex. It's rarely the case that a single gene, and only a single gene, controls any particular behavioral characteristic. More often, single genes affect more than one trait, and individual traits are affected by more than one gene. In the figure, the sequence of nucleotides that encodes the first gene generates the gene product (a protein) represented by the dark triangle. This product affects three different behavioral traits. Any adjustment to the first gene could change three behavioral characteristics simultaneously. Even when we consider the behavioral characteristic described as "g" in the figure, where all we have to do is adjust gene number seven, in so doing we can affect the traits "c" and "h" as well. It's easy to imagine that real-world adaptive landscapes can be very rugged when faced with a complicated genetic milieu. Here, we only have eight genes, but as we've learned from the Human Genome Project, there are about forty thousand genes in humans. Try to imagine what figure 9 might look like in forty thousand dimensions!

The Principal Problem: Avoid Being Someone Else's Lunch

Evolution generates and tests many possibilities. Individuals that are best suited for their environmental demands survive and have the possibility to pass along their genetic information to offspring. The behaviors that are crucial for survival are often heritable, wrapped inside the genetics. True, a great many trials are wasted (individuals that never survive to reproduce), yet evolution can generate many marvelously adapted creatures.

One such unmistakable adaptation is seen in the repeated instances of cryptic coloration and camouflage found in nature. For example, somewhere in figure 12 is an insect. It looks like a stick. Do you see it? This photograph inspires a certain awe, because there's no doubt that here is a creative solution to the principal problem facing all living organisms: how to avoid being someone else's lunch. The idea of making the insect look like a stick is captured in its genetics, and the ruse is executed with impressive precision.

FIGURE 12
The walking stick is an insect that has evolved to look like a twig. You can see it because you know it's there, but imagine walking casually through the woods. Do you think you'd notice this living stick as you strolled by it? Many generations of random variation and selection have perfected the insect's ability to mimic its natural setting. Photo courtesy Mark Newman/Image State.

Another example, and one of my favorites, is the leafy sea dragon (see figure 13), which has evolved an impressive mobile defense. Its limbs and back take on the form of the surrounding vegetation, concealing it from predators.

Another way to avoid becoming someone else's lunch is to be poisonous. And if you're going to be poisonous, it pays to advertise that fact so that predators will leave you alone. The yellow jacket (figure 14) is a case in point. The common yellow banding has evolved as a form of communication from insect to potential predators, such as birds, which often have superior visual acuity. The problem with advertising is that it invites imitation, and this is just as true in nature as it is in the marketplace. Figure 15 shows a flower fly. It's not poisonous, but it wants you to think that it is.
S E T T I N G T H E STAGE
Certainly, the flower fly didn't evolve its yellow-and-black banding by random chance alone. Natural selection provided the ultimate measure of success. Those flower flies that fooled predators into moving on for an easier meal were the flies that survived. The better the match between the fly's color scheme and the pattern used by truly dangerous insects, the better are its chances for survival. Take a minute to think about the creative nature of its solution to the problem of avoiding being eaten. In the absence of the effective communication to potential predators of a yellow-banded pattern, the fly's solution couldn't succeed. Evolution thus displays a series of unfolding inventions, each being put to the test in current environments, with the ability to take advantage of the ingenuity of other evolving species. For a time, each of these creatures sits on a peak in an adaptive landscape that measures the appropriateness of their behavioral traits, at least in some regard. Admittedly, not every trait of every individual is optimized.11 Still, the stringency of competition must often be severe, for evolution has generated ingenious solutions to the problem of survival in so many cases.
FIGURE 13
Like the walking stick, the leafy sea dragon also avoids predators by blending into the background, with the difference being that the sea dragon takes its background with it. Over time, the process of random variation and selection has optimized the limbs and back of the leafy sea dragon to take on the form of the surrounding vegetation. Photo courtesy ANT Photo Library.

FIGURE 14
The yellow jacket exhibits a common yellow-and-black banding that signals to predators that it packs a powerful sting. The insect's coloration is a symbol, no different from the symbols we use in our language, that conveys an important idea to birds and other potential enemies. The high degree of contrast between the yellow and black bands and the consistent association between this pattern and a poisonous stinger send a reliable warning. Photo courtesy Edward S. Ross.

FIGURE 15
Imitation is indeed the sincerest form of flattery. Here, the flower fly hitchhikes on the ingenuity of the yellow jacket, and many other stinging insects, by adopting the same coloration. The flower fly isn't dangerous, but it wants you to think that it is. Photo courtesy Edward S. Ross.

Harnessing Evolution in an Algorithm

We started this chapter with the hope of designing an artificial neural network, a computer that acts something like our own brains. We saw how a collection of even simple neurons can compute some functions, but we realized quickly that we'd need many neurons with potentially thousands of connections to compute anything truly complicated, such as assessing the prospects of winning a game of checkers from any candidate position of pieces on the board. Finding the right settings for those connections, and the threshold values for each neuron, became the primary task.

We drew an analogy between finding the best set of values and finding the location of the highest peak in the Rocky Mountains. Searching for that peak by having a lonely mountain climber stumble around in a fog posed problems. But now we've seen that evolution can help us to overcome these problems.

Suppose we could harness the fundamental processes of natural evolution inside a computer. We could generate many thousands, or maybe millions, of solutions to problems, test these solutions, keep the ones that are better, and use them as parents of future improved solutions. We could write a computer program that uses an evolutionary algorithm to breed solutions to problems and perfect them over time.

One such problem might be to design an artificial neural network that recognizes patterns we think are important (we'll see an example of this in chapter 5). Perhaps, using nature for inspiration, we could go further and have an evolving artificial neural network discover its own patterns, ones that we haven't recognized. As I'll describe shortly, this scenario is more than just a possibility. It's a reality, for a computer has used this evolutionary process to teach itself how to play checkers at a level that's competitive with human experts.
4
Evolutionary Computation: Putting Nature to Work
If we strip evolution down to its essence, we see that it's a never-ending two-step process of random variation and selection. Genetic changes that arise from mutations and recombination portend new behaviors, which are then tested in light of current environmental demands. Selection eliminates individuals that don't meet those demands, and the process iterates. The basic steps of evolution provide a prescription for writing an algorithm that follows a similar procedure, that is, an "evolutionary algorithm." I've suggested that evolutionary algorithms might be useful for designing artificial neural networks, computer models that emulate the functioning of simple brains. Later in the chapter, I'll illustrate this concept with specific examples, but first, let's consider a simpler case to indicate how evolutionary computation works.
Evolutionary Computation in Action: The Traveling Salesman Problem

Suppose you have to find the shortest path from Los Angeles that leads to San Francisco, Seattle, Las Vegas, Phoenix, San Diego, and then returns to Los Angeles (see figure 16). This task is known as a "traveling salesman problem" in the scientific literature. The salesman starts at his home and must visit each city in his area once and only once, then return home in minimum distance. Sometimes a traveling salesman problem is easy to solve. You can just look at a map and deduce the correct path to take. No doubt, that's the case here. You can take a quick glance at figure 16 and immediately know the best route to take. But computers can't "glance" at maps, at least not unless we program them to, so we have to think about how to solve the problem step by step, algorithmically.
FIGURE 16
A traveling salesman problem involving Los Angeles, San Francisco, Seattle, Las Vegas, Phoenix, and San Diego. The task is to start in Los Angeles and visit each city once and only once, then return home taking the shortest route.

Enumerating the Alternatives

How many alternative routes are there? Starting at Los Angeles, you have five possible choices for your first stop (San Francisco, Seattle, Las Vegas, Phoenix, or San Diego). After that, only four choices remain, then three, then two, and finally there is only one choice left before you have to head home to Los Angeles. All told, there are then five times four times three times two times one different paths that you could consider. That's 120 alternatives.

The most obvious way to find the best path is to calculate the distance required to complete each path and then choose the one that yields the minimum. This process is called enumeration, and it works fine when there aren't too many possibilities to consider, as in this case. But as we consider larger problems involving more cities, each new city increases the total number of possibilities as a product function. For six cities there are six times as many possibilities as there are when you have only five cities. For seven cities, there are seven times as many alternatives as when there are only six cities, and so forth. The number of different ways to solve the problem grows rapidly.

Figure 17 shows just how rapidly this product function grows. The numbers quickly go off the chart. For ten cities, you'll face 3,628,800 different ways to traverse them.1 For fifty cities, the number of possibilities is roughly 10^63, or written out in long form: 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000. This number is far larger than the number of seconds in the history of the universe, about 10^18. Enumerating and evaluating each possibility to find the best solution is now out of the question. We just don't have the time.
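To make the enumeration concrete, here is a minimal Python sketch. The city coordinates are hypothetical values chosen only for illustration (they are not real map distances), and the function names are assumptions of this sketch rather than anything from the original study. It tries every ordering of the five cities other than Los Angeles and counts how many tours it had to examine.

```python
import itertools
import math

# Hypothetical planar coordinates for the six cities. The values are
# illustrative only and are not real map distances.
CITIES = {
    "Los Angeles": (0.0, 0.0),
    "San Diego": (1.0, -2.0),
    "Phoenix": (6.0, -1.0),
    "Las Vegas": (4.0, 3.0),
    "San Francisco": (-4.0, 6.0),
    "Seattle": (-3.0, 17.0),
}


def tour_length(order, coords):
    """Length of the closed tour that visits `order` and returns to its start."""
    total = 0.0
    for here, there in zip(order, order[1:] + order[:1]):
        total += math.dist(coords[here], coords[there])
    return total


def enumerate_best(home, coords):
    """Try every ordering of the other cities and keep the shortest tour."""
    others = [name for name in coords if name != home]
    best_order, best_length, tried = None, math.inf, 0
    for perm in itertools.permutations(others):
        order = [home, *perm]
        length = tour_length(order, coords)
        tried += 1
        if length < best_length:
            best_order, best_length = order, length
    return best_order, best_length, tried


best_order, best_length, tried = enumerate_best("Los Angeles", CITIES)
print(tried)                # 120 orderings: 5 * 4 * 3 * 2 * 1
print(math.factorial(10))   # 3628800, the ten-city figure quoted above
```

The loop is fine for six cities, but the number of orderings grows as a factorial, so the same code becomes impractical long before fifty cities.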
Good Enough, Fast Enough to Be Useful

Finding the perfect solution to any sizable traveling salesman problem is usually a futile effort.2 But in practice, we rarely care about a "perfect" solution. What we're looking for is a solution that is good enough and that we can find in a reasonable amount of time.

Instead of exhaustive enumeration, let's consider evolution. We start with a collection, a population, of different potential solutions and evaluate the fitness of this small subset of all possibilities. Next we have the computer make random variations of these parent solutions and evaluate these offspring. We then apply the rule of survival of the fittest, mimicking natural selection by keeping those solutions that are better and "killing" those solutions that are worse. This completes one generation. Finally, we use the computer to reiterate the process of making variations of the survivors. Applying this process, in a short period of time, we can find nearly optimal paths through the cities we must visit.
Evolutionary Computation in Action

Let's take a look at this process in action. In an example that I presented in the February 2000 issue of IEEE Spectrum, I applied an evolutionary algorithm to find a good solution to a traveling salesman problem with one hundred cities.3

FIGURE 17
The total number of possible solutions to a traveling salesman problem as the number of cities in the problem increases. Note that the vertical axis is the logarithm of the total possibilities. A value of twenty on the vertical axis equals 100,000,000,000,000,000,000, or in words, a hundred billion billion. By the time you get to even fifty cities, the number of possibilities starts to approach the number of electrons in the universe. (Horizontal axis: number of cities.)

The first step in applying the algorithm was to create the initial population of candidate solutions. (Remember that by solution, I simply mean a possible path by which to traverse all the cities. Any such path is a solution, and some are better than others.) I started with one hundred "parent solutions." Each parent visited the cities in a random order. Figure 18 shows the best of these random parents at the first generation. Each parent then created one offspring by copying itself, then introducing some random variation, a reverse ordering of certain randomly selected cities. This increased the population's size to two hundred total solutions.

Next I applied the equivalent of natural selection. I calculated the total length of every solution in the population, culled out the one hundred worst solutions, and kept the one hundred best. These survivors then became the parents for the next generation.

After five hundred generations of this random variation and selection process, the computer discovered the solution shown in figure 19. After four thousand generations, it homed in on the solution depicted in figure 20, which looks pretty good. Actually, it's very good. It's not perfect, but evolution doesn't strive for perfection. The object here was to find a very good solution in a short amount of time. Completing four thousand generations typically takes less than three minutes using a 350 MHz Pentium II.

Keep in mind that the evolutionary algorithm doesn't contain any explicit concept of what a traveling salesman problem is, or what Euclidean geometry is all about. It just has lists of cities to visit, and it can determine the relative merits of alternative paths. When coupled with a suitable means for varying those paths, that information is all that an evolutionary algorithm needs to quickly generate superior solutions.4
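The procedure just described can be sketched in a few lines of Python. This is a hedged illustration, not the program behind the published example: the function names, the random city layout, and the exact way a segment is chosen for reversal are assumptions of the sketch. It does follow the steps in the text: each parent makes one offspring by reversing a randomly chosen stretch of its tour, and the better half of the combined population survives.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour through `cities` in the order `tour`."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def invert_segment(tour, rng):
    """Copy the parent, then reverse the order of a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(tour)), 2))
    child = tour[:]
    child[i:j + 1] = reversed(child[i:j + 1])
    return child


def evolve_tour(cities, parents=100, generations=500, seed=0):
    """Return the best tour found and the best length after each generation."""
    rng = random.Random(seed)
    n = len(cities)
    # Each initial parent visits the cities in a random order.
    population = [rng.sample(range(n), n) for _ in range(parents)]
    history = []
    for _ in range(generations):
        offspring = [invert_segment(p, rng) for p in population]
        everyone = population + offspring
        # Selection: keep the shorter half, cull the longer half.
        everyone.sort(key=lambda t: tour_length(t, cities))
        population = everyone[:parents]
        history.append(tour_length(population[0], cities))
    return population[0], history


# A small illustrative run on randomly placed cities (sizes are assumptions,
# chosen so that the sketch finishes quickly).
layout_rng = random.Random(1)
cities = [(layout_rng.uniform(0, 100), layout_rng.uniform(0, 100)) for _ in range(30)]
best, history = evolve_tour(cities, parents=100, generations=200)
print(f"best length after one generation: {history[0]:.1f}")
print(f"best length after 200 generations: {history[-1]:.1f}")
```

Because parents compete alongside their offspring, the best tour in the population can never get worse from one generation to the next, which is why the recorded best length only goes down.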
FIGURE 18
Generation 1: An example of using an evolutionary algorithm to optimize the solution of a one-hundred-city traveling salesman problem. Each dark circle represents a city that must be visited only once. Here the figure shows the best solution after the first generation of one hundred "parent" solutions, which is quite poor, as would be expected.

FIGURE 19
Generation 500: The best solution in the population after five hundred generations shows a marked improvement over the initial best solution. The evolutionary algorithm cut the total length of the best tour by more than 50 percent through simple random variation and selection. (Horizontal axis: unit distance, x.)

FIGURE 20
Generation 4000: After four thousand generations, the evolutionary algorithm discovered the solution shown here. It's not perfect, but its total length is within 10 percent of the expected best. Remember that there are over 10^150 different possible solutions. The evolutionary algorithm examined only four hundred thousand of these, an infinitesimally small fraction. (Horizontal axis: unit distance, x.)

Turning to Evolutionary Neural Networks

Having seen how evolutionary computation can generate excellent solutions to a traveling salesman problem, let's take up the problem of evolving neural networks to solve more complex problems. The basic approach is very similar. We start with a parent population of artificial neural networks and create offspring from these parents. We copy the parents, cloning them into their offspring, and then apply some random variation to the weights and thresholds. We score how well each of the neural networks accomplishes some task, such as generating the OR function (see chapter 3), or more likely something more interesting, such as evaluating the worth of a checkerboard position. We keep the neural networks that are better at performing the task, discard the ones that are worse, and repeat the process until either we're satisfied with the best neural network's performance or we give up.
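As a small illustration of that loop, the sketch below evolves the two weights and the threshold of a single artificial neuron until it computes the OR function. Everything specific here is an assumption made for the example: the population size, the Gaussian mutation with standard deviation 0.5, and the function names are not taken from the text.

```python
import random

# The four input/target pairs that define the OR function.
OR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]


def fires(params, inputs):
    """Threshold neuron: output 1 when the weighted sum reaches the threshold."""
    w1, w2, threshold = params
    return 1 if w1 * inputs[0] + w2 * inputs[1] >= threshold else 0


def score(params):
    """Number of OR cases (out of four) that the neuron gets right."""
    return sum(fires(params, x) == target for x, target in OR_CASES)


def evolve_or(parents=10, max_generations=200, seed=0):
    """Evolve [w1, w2, threshold]; stop early once all four cases are correct."""
    rng = random.Random(seed)
    population = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(parents)]
    for generation in range(max_generations):
        # Clone each parent and add random variation to every parameter.
        offspring = [[p + rng.gauss(0, 0.5) for p in parent] for parent in population]
        # Keep the better half of parents plus offspring.
        everyone = sorted(population + offspring, key=score, reverse=True)
        population = everyone[:parents]
        if score(population[0]) == len(OR_CASES):
            break
    return population[0], generation


best, generation = evolve_or()
print(best, score(best), generation)
```

The stopping rule mirrors the text: the loop ends either when the best neuron handles every case or when the generation budget runs out.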
The Roots of Evolutionary Computation

This evolutionary approach to producing a neural network sounds new. Actually, the roots of coupling evolutionary computation with models of neurons go back many decades. Hans Bremermann, a professor at the University of California at Berkeley and a pioneer of evolutionary computation, suggested this coupling of evolutionary algorithms and neural networks in 1966. Unfortunately, the computers that were around several decades ago weren't fast enough to really capitalize on his idea.

Evolutionary computation itself has been around for a while, since the early days of computer science. From 1953 to 1970, at least ten people invented this same basic concept of capturing evolution within a computer.5 Some even conducted actual computer experiments using evolutionary algorithms to solve engineering problems. Others used their evolutionary models to study genetics in simulations.

One of these pioneers was my father, Larry Fogel, who conceived of what he called "evolutionary programming" while serving at the National Science Foundation in 1960. His idea was to use evolution as a means for generating artificial intelligence. Instead of programming computers to model the complexities of human brains or the rules that people give to solve problems, he suggested modeling the process that creates organisms of increasing intellect over time: evolution.6
From Parent to Offspring

I'm fortunate to have a father who was a pioneer in evolutionary computation. There's no doubt that my exposure in 1981 to his evolutionary programming, at age seventeen, focused my own efforts to apply evolutionary algorithms to solve difficult problems.7 I'd found computers fascinating, and I was more than a little curious about the prospects of creating intelligent machines.8

I first attempted to program an "intelligent" computer when I was thirteen years old and in junior high school. All the students had access to the San Diego City school system's computer via a terminal that operated at a baud rate of 360 bits per second. (For comparison, consider that contemporary baud rates extend to 56K for conventional modems transmitting over phone lines.) We could play simple games, printed out on punch-holed computer paper using a dot matrix printer. We could even write our own programs in BASIC.

Once, after watching an episode of The Bionic Woman on television in which the heroine, Jaime Summers, had to outsmart an intelligent computer that controlled a large military installation, I decided to program my own intelligent computer. I set about writing an interface that would respond to English words and phrases. Day after day, I developed the program, but I quit after a little more than a week, realizing that my project just wasn't feasible. I'd never complete the program, even in my lifetime! There were simply too many individual cases to handle.

FIGURE 21
(A) The framework I used in my 1992 doctoral dissertation for a neural network that could learn to play tic-tac-toe. There are nine possible positions on the tic-tac-toe board. Each input neuron corresponds to one of these positions. The neural network had up to ten hidden neurons and nine output neurons. Each output neuron again corresponded to a position on the board. Whenever the neural network had to make a move, I would present the current position. The neural network coded each "X" as +1, each "O" as -1, and each blank square as 0. Based on those values, and the weighted connections of the neural network, each output neuron ended up with a different value. The neural network then moved in the open square that had the highest output value. Each connection, designated by an arrow in figure A, had its own evolvable weight. Moreover, each hidden neuron and output neuron had its own evolvable threshold value. Thus, in total, there were 199 parameters to adjust. You can easily imagine that adjusting the weights of this neural network is a lot like adjusting the knobs on a Sinister television. For instance, if you increase the weight that connects the first input neuron to the first hidden neuron, then you also increase the output from the first hidden neuron to all of the output neurons.
Evolving Neural Networks: A Tic-Tac-Toe Machine

Ten years after my father introduced me to the idea, I wrote my doctoral dissertation on the potential for applying evolutionary computation to machine learning, including the possibilities for evolving artificial neural networks to play games. The title of my work was "Evolving Artificial Intelligence," a continuation of the ideas that my father had originated.9

One of the applications I developed used an evolutionary algorithm to optimize the performance of a simple artificial neural network that played tic-tac-toe (also known as naughts and crosses). Writing a program that plays perfect tic-tac-toe is an elementary task in computer science. But the application here was different: Was it possible to start with randomly connected artificial neurons, much like the kind Rosenblatt proposed in the 1950s, and evolve them through random variation and selection so that they could play a good game of tic-tac-toe?

The neural networks would need an opponent, so I quickly programmed a proficient tic-tac-toe algorithm to serve as the adversary. My procedure was to put this adversary in competition against a population of one hundred different neural networks, each having nine input and output neurons. Each of these neurons corresponded to a square on the tic-tac-toe board. In addition, ten neurons connected the inputs to the outputs. Every possible board in the game could be fed into the neural network, and each output node would respond with a value. Arbitrarily, I decided that the neural network would move to the open square that had the highest associated output activation. Figures 21(a) and 21(b) present an example in greater detail.

Using the supercomputer at UCSD (a Cray-YMP), I pitted the random neural networks against the nearly perfect tic-tac-toe program that I'd coded by hand. I saved those neural networks that won more games than their competitors and used them to generate "offspring" neural networks through random variation of their weighted connections and neural thresholds. During a series of generations of variation and selection, the computer reliably and repeatedly discovered neural networks that could play a high-quality game of tic-tac-toe. The neural networks weren't perfect, but still, they were pretty good.

My brother, Gary, was the first victim, losing a game to one of the evolved neural networks. In fairness, it was late at night and he was playing over the telephone, so he had to imagine the board in front of him. Nevertheless, a win is a win. More important wins were forthcoming.
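The network layout described here (nine inputs, ten hidden neurons, nine outputs, with the move going to the open square that has the highest output) can be sketched as follows. This is an illustrative reconstruction, not the dissertation code: the sigmoid activation, the initial weight range, the mutation step size, and all names are assumptions of the sketch.

```python
import math
import random

N_IN, N_HIDDEN, N_OUT = 9, 10, 9
# Board encoding from the text: X is +1, O is -1, a blank square is 0.
ENCODING = {"X": 1.0, "O": -1.0, " ": 0.0}


def random_network(rng):
    """Random weights and thresholds; the storage layout is an assumption."""
    return {
        "w_in": [[rng.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_HIDDEN)],
        "t_hidden": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "w_out": [[rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)] for _ in range(N_OUT)],
        "t_out": [rng.uniform(-0.5, 0.5) for _ in range(N_OUT)],
    }


def count_parameters(net):
    """Total number of evolvable weights and thresholds."""
    return (sum(len(row) for row in net["w_in"]) + len(net["t_hidden"])
            + sum(len(row) for row in net["w_out"]) + len(net["t_out"]))


def sigmoid(x):
    """Assumed activation function; the text does not name one here."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(weights, thresholds, inputs):
    """One layer: weighted sum minus threshold, passed through the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) - t)
            for row, t in zip(weights, thresholds)]


def choose_move(net, board):
    """Index (0-8) of the open square with the highest output value.
    Assumes the board has at least one open square."""
    inputs = [ENCODING[square] for square in board]
    hidden = layer(net["w_in"], net["t_hidden"], inputs)
    outputs = layer(net["w_out"], net["t_out"], hidden)
    open_squares = [i for i, square in enumerate(board) if square == " "]
    return max(open_squares, key=lambda i: outputs[i])


def mutate(net, rng, sigma=0.1):
    """Offspring: a copy of the parent with Gaussian noise on every parameter."""
    def jitter(values):
        return [v + rng.gauss(0, sigma) for v in values]
    return {
        "w_in": [jitter(row) for row in net["w_in"]],
        "t_hidden": jitter(net["t_hidden"]),
        "w_out": [jitter(row) for row in net["w_out"]],
        "t_out": jitter(net["t_out"]),
    }


rng = random.Random(0)
net = random_network(rng)
board = ["X", "O", " ", " ", "X", " ", " ", " ", "O"]
print(count_parameters(net))   # 199, matching the count in figure 21
print(choose_move(net, board))
```

Scoring each network by its results against a fixed opponent, and keeping the better scorers as parents, would complete the evolutionary loop; the opponent itself is left out of this sketch.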
Blue Hawaii: A Natural Selection
One year after I'd developed my evolutionary tic-tac-toe program, my father and I started a new company, Natural Selection, Inc., which was dedicated to applying evolutionary algorithms and other techniques to real-world problems in industry, medicine, and defense. Computer processing speeds had doubled about twenty times since the time my father first proposed evolutionary programming. A typical desktop PC finally offered sufficient computing power to apply the evolutionary approach to solving problems that included doing scheduling for complex factories, predicting financial markets, and searching for new pharmaceuticals to fight HIV.

In canvassing for new business, I operated under the principle that you should try to find work somewhere that you enjoy, and since I enjoy Hawaii, where I'd spent a semester of graduate school, why not start there? After calling a few professors at the university and people who work at the Natural Energy Laboratory on the Big Island of Hawaii, I learned that a new supercomputer center was to be installed on the island of Maui. Mike Boughton was the man in charge. He'd taken a position as president of the Maui Economic Development Board with the objective of expanding Maui's economy without increasing manual labor or service jobs. The supercomputer was to be the focal point in attracting new high-technology businesses to Maui.

I phoned Mike and tried to explain the usefulness of the evolutionary approach to optimizing solutions to complex problems. Not knowing Mike's background (he went to Caltech and worked for TRW for more than thirty years), I watered down my language. I tried to explain how an evolutionary algorithm could avoid stagnating at solutions that were less than optimal, a problem that's intrinsic to many classic optimization techniques, but I had difficulty getting the point across without using specific technical terms. Finally, Mike interjected, obviously trying to help me: "Oh, so you mean this technique is good at escaping from local optima?" I knew then that Mike and I would get along just fine.
Evolving a Mammographer

I visited Mike in 1994, and he took me to see his friend Gene Wasson, a medical doctor specializing in radiology at Maui Memorial Hospital. We discussed the possibility of evolving neural networks for improving breast cancer detection. There were already some preliminary publications in scientific journals such as Radiology and Cancer suggesting that neural networks might be useful in this regard. The idea was to train a neural network to recognize different features from mammograms that would be associated with either malignancies or benign cases. The hope was that the neural network would be able to assist the radiologist in detecting more cancers and in more quickly discarding the benign cases.

Although prior publications showed significant potential, the methods used to train the networks were already obsolete. Evolutionary computation was the cutting edge for neural network training, posing an opportunity to make an advance, a chance to apply evolutionary algorithms to a problem of significance. I didn't really know just how significant at the time.

In January 1995, my mother was diagnosed with breast cancer. Fortunately, her doctors caught the disease early, before it had a chance to spread to her lymph nodes. We all hoped for the best as she underwent local radiation therapy. I focused more intently on the problem of using evolutionary algorithms to help with breast cancer detection. In reading the scientific literature, I reasoned that we could replace the training methods outlined in prior publications with an evolutionary algorithm and that we might thereby discover better neural networks. The next step was to obtain some funding.
Advancing the State of the Art

Three years earlier, Congress passed a budget that allowed the U.S. Army to maintain a multiyear broad agency announcement soliciting research in breast cancer.1 Gene, Mike, and I submitted a proposal to the army for consideration in the summer of 1996. Our plan was to evolve neural networks that would operate on mammographic features as interpreted by Gene. Our primary goal was to determine which features were really important and which were perhaps inconsequential. Eliminating those features that didn't contribute to an accurate assessment, or didn't contribute much, would save time and effort. Any automated mammogram screening system should be as efficient as possible. Knowing the relative importance of different features in a mammogram could be a significant help.

In February 1997 we learned that our proposal had been selected for an award of approximately $70,000. This was a small grant, relatively speaking, but it was enough to get us started, and knowing that I would be working on something that might someday help save someone's life, like my mom's, was worth much more than the money.

While I was preparing for the grant to start, I found a database on an ftp (file transfer protocol) site that I could download from the Internet. There a physician had rated different features of cells that were taken under surgical biopsy from patients who were suspected of having breast cancer. The physician had also indicated whether the cells were malignant. This case posed a different problem than the one that Gene faces: Here we were working with the cells from the breast, whereas Gene must work with just an image of the breast. But I quickly set about evolving a neural network to see if we could learn how to classify the relevant features of the cancerous cells.

I divided the available data into two sets. The first set was used for training neural networks, while the remaining data were held out for testing.2 I logged on to the Maui supercomputer and programmed an evolutionary algorithm that operated on a population of one thousand neural networks. The evolutionary program evaluated each neural network in terms of how well it classified malignancies based on the available input features. The best five hundred neural networks at each generation were saved and used to create five hundred new neural networks through random mutation, whereby "offspring" neural networks were created from each "parent" neural network by varying every one of the parent's connection weights at random. (I also varied the neural thresholds, but from this point forward, I'll talk simply about the weights. It's possible to design a neural network so that the thresholds are equivalent to an additional set of weighted connections.)

After four hundred generations, I evaluated the best-evolved neural network on the remaining data that I'd held out for testing. It classified more than 98 percent of the test cases correctly. This was better than the best results using other methods that were reported on the ftp site.

Along with Gene and Mike, I wrote up the results and submitted them for possible publication to the medical journal Cancer Letters. The paper was received on May 15, 1997, and accepted for publication on May 18, 1997. To this day, I remain amazed at this quick turnaround, but clearly our work and the technology that it represented was of critical interest. With a bit of serendipity, our efforts were about to lead me into an entirely different adventure.
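The train-then-test procedure above can be sketched in Python under some loud assumptions. The data here are synthetic stand-ins generated by the script (not the biopsy database from the ftp site), the "network" is reduced to a single threshold unit, and the default population size and generation count are far smaller than the one thousand networks and four hundred generations described in the text. The sketch does show two ideas from the passage: holding data out for testing, and folding the threshold into one extra weight on a constant input.

```python
import random


def make_synthetic_cases(n, rng):
    """Synthetic stand-in data: three feature ratings (1-10) and a 0/1 label."""
    cases = []
    for _ in range(n):
        features = [rng.randint(1, 10) for _ in range(3)]
        label = 1 if sum(features) > 16 else 0
        cases.append((features, label))
    return cases


def predict(weights, features):
    """Threshold unit. The last weight acts on a constant input of 1,
    so it plays the role of the (negated) threshold."""
    total = sum(w * x for w, x in zip(weights, features + [1.0]))
    return 1 if total > 0 else 0


def accuracy(weights, cases):
    """Fraction of cases classified correctly."""
    return sum(predict(weights, f) == y for f, y in cases) / len(cases)


def evolve_classifier(train, population_size=40, generations=50, seed=0):
    """Keep the better half each generation; every survivor makes one
    offspring by varying all of its weights at random."""
    rng = random.Random(seed)
    n_weights = len(train[0][0]) + 1
    survivors = population_size // 2
    population = [[rng.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda w: accuracy(w, train), reverse=True)
        parents = population[:survivors]
        offspring = [[w + rng.gauss(0, 0.3) for w in parent] for parent in parents]
        population = parents + offspring
    return max(population, key=lambda w: accuracy(w, train))


data_rng = random.Random(42)
cases = make_synthetic_cases(300, data_rng)
train, test = cases[:200], cases[200:]   # the test cases are never used in training
best = evolve_classifier(train)
print(f"training accuracy: {accuracy(best, train):.2f}")
print(f"held-out accuracy: {accuracy(best, test):.2f}")
```

Only the held-out accuracy says anything about how the evolved classifier would behave on cases it has not seen, which is why the test set stays out of the selection step.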
May 11, 199 7 I flew to Maui in May 1997 in support of our army breast cancer contract and worked for a week with Gene Wasson analyzing the data that he recorded. One day whe n Gene was unavailable, I decided to drive from my hotel in Kihei, in the southwest sector of the island, to Lahaina, in the northwest corner, and then on to the very end of the highway past Kapalua in West Maui. The northwest cliffs of Maui offer a spectacular vista that I really wanted to see. The main problem is that the northern road--the Kaupo road--from West Maui back to Wailuku, Maui's capital, remains mostly unimproved and is extremely narrow, curving around the cliffs. Travel on the stretch between Nakalele Point and Kahakuloa is recommended only to those with four-wheel-drive vehicles. I should have listened to the advice. Not paying any attention to the warnings or to the rental car agreement, which prohibits travel on this stretch of road, I proceeded. The road grew more precarious, and my speed was l o w - - a t least it wasn't raining, or nighttime, I thought. The cliffs were steep. One lapse in concentration, and I'd be over the edge. (Gene later told me that drivers who've had too much to drink and still try to negotiate the road, particularly on NewYear's Eve, may end up lost--literally. Later, sometimes years later, their cars would be spotted under the dense foliage that lines the cliffs below.) Off to my left was the island of Molokai, but my eyes were fixed ahead of me as the island moved gradually into my rearview mirror, then disappeared. The car radio could pull in only a few stations, and 90
SETTING
THE STAGE
I decided to leave it on the news. Most of the cars that passed me in the other direction were four-wheel drives. Was this a harbinger of bad things to come? The road narrowed further until it was really only one lane. At one point I was looking at what seemed to be a 3o percent grade up at least three hundred feet that appeared just wide enough for my small car. There was no turning around, no turning back. If another car came from the other direction I'd have to back my compact d o w n the hill. I didn't even want to go up the hill, let alone do it in reverse. As I inched cautiously up around the top of the grade and made it without any close encounters, it seemed that the worst was over. T h e n came the news bulletin on the radio: 1BM's computer, Deep Blue,
had defeated Garry Kasparov, the world chess champion.
Deep Blue 1, Public 0
Perhaps the worst wasn't over after all. IBM's win, while well deserved, would have a profound impact on the public, and one that I viewed as strongly negative. Media coverage of IBM's success would only reinforce the notion that Deep Blue was an intelligent machine, the epitome of a smart computer. There'd be no discussion about a machine adapting its behavior to solve new problems in new ways, or probing questions regarding how much (or how little) the computer had learned on its own. Attention would be focused solely on the end result, a win for the machine. The public would be misled into believing that Commander Data from Star Trek: Next Generation or HAL was just around the corner. I anticipated that there'd be no rematch between Deep Blue and Kasparov. IBM had little to gain in publicity or technological know-how by accepting a rematch, and Kasparov would be unlikely to challenge the machine again. No longer was there a compelling rationale for the competition. Both parties had too much to lose. The story was over. It would have been nice if the final result had been a draw, and we could have looked forward to a sequel. Now there would be no sequel, no trilogy. Kasparov wasn't the only loser that day. The public lost as well, for the Deep Blue-Kasparov challenge matches served as a valuable opportunity to discuss the impact that intelligent machines will have on society (even if Deep Blue isn't what I consider to be an example of AI). The media coverage of "man versus machine" focused our collective attention on our future relationship with computers, a serious issue that deserves more consideration. Instead, with Deep Blue writing the final chapter of the tale, a discourse on serious issues devolved into issues of mere fame and celebrity. We were left with a machine as pop star. 3 I drove on in silent acceptance that a landmark in computer science had come and gone, much like the odometer of my car measuring another mile. 4 Finally, I crept out of the winding road and onto Highway 340 heading toward Wailuku--more than forty miles and five hours later.
Taking the Road Less Traveled
Back at the hotel, I got to work correcting the galley proofs for the first issue of the IEEE Transactions on Evolutionary Computation, a new technical journal that I helped create with an international team of more than thirty-five other scientific colleagues acting as associate editors. 5 I'd been fortunate to be asked to be the founding editor-in-chief of this new venture. In my hands, I held the first issue that was to be published within the coming month. I diligently went over the text trying to find mistakes. I didn't want to have to publish any corrections to articles in our very first issue! But after reading each paper in the journal for the third time, my mind wandered back to Deep Blue and Kasparov. Deep Blue's win was a significant engineering accomplishment, but from the perspective of designing intelligent machines, it seemed more like an admission of failure. If this is what it takes--a monster machine like Deep Blue--just to play chess at the level of the human world champion, what hope do we have of creating HAL or Commander Data? We can't expect to create Commander Data by taking on millions of different programming challenges, each as monumental as Deep Blue, so that we'll have a computer that plays chess, flies airplanes, shoots straight, speaks Spanish, composes music, interprets literature from nineteenth-century Russia, and so forth. IBM's success with Deep Blue seemed, ironically, to illustrate the fundamental limitation of the brute-force approach. I took this recognition as motivation to explore an alternative pathway.
Instead of capturing human expertise in a computer program that plays chess, why not evolve a program that plays chess? Such an approach might have a better chance of moving us closer to creating Commander Data. In fact, the work I was doing on evolving neural networks for breast cancer detection pointed in a promising direction. Just as we could evolve neural networks that learned how to detect a malignancy in a mammogram, we could evolve neural networks that learn how to play chess. But the really alluring experiment would constrain those neural networks to learn how to play without relying on human expertise in chess. I was determined to try it. What I envisioned went well beyond what Gene and I were doing. In the breast cancer effort, Gene knew the results for every mammogram he examined. Our immediate challenge was to see if an evolutionary algorithm could recapitulate Gene's decisions based on his own interpretation of the image features in each mammogram. It was a bit like IBM's efforts to train Deep Blue using a learning algorithm coupled with nine hundred games from grand masters. Here Gene was the grand master. What I wanted to do in chess was the equivalent of removing Gene from the loop. I wanted neural networks to learn how to assess their environment--the chessboard--and to adapt their behavior without having an expert on hand to tell them how to do it. I thought back to the movie War Games, and to the machine that taught itself to play tic-tac-toe. Similarly, I could set up an evolving population of neural networks, each representing a strategy for playing chess. Each neural network would receive information about the
types and positions of pieces on the board, but no hints about which moves to make or formations to shape. The neural networks would need to discover patterns in the positions of the pieces on the chessboard and associate those patterns with good or bad outcomes. Those neural networks that played a better game of chess would be saved at each generation and used to create new offspring networks that would then compete with their parents. Over time, the process of evolution might create superior chess-playing neural networks. Furthermore, instead of being limited to one domain, as in chess, this evolutionary approach could in principle be extended to other domains. It could create its own solutions to problems for which we don't already know the answers. It was a risky approach, but it seemed like the right approach. I wanted to get started right away, but other obligations forced me to wait a full year before moving forward. 6
Two Heads Are Better Than One
About twelve months earlier, I'd met Kumar Chellapilla at the 1996 Annual Conference on Evolutionary Programming in San Diego, California. He was a master's degree student at Villanova University in Philadelphia, Pennsylvania, who was interested in evolutionary algorithms. His focus was on designing mathematical filters. 7 The idea was that an evolutionary algorithm could create and optimize those filters quickly. Kumar was clearly enthusiastic and motivated. He was eager to
learn, and he found the area of evolutionary computation enticing. As I later discovered, he was also one of the best programmers that I would ever meet. In the summer of 1998, Kumar and I discussed the possibility of evolving a chess-playing program. He'd thought about this possibility as well. The challenge was awesome. We kept thinking about the hardware that IBM brought to the task. The company had designed application-specific integrated circuits (ASICs) so that what would normally be performed in software could be accomplished in hardware at much greater speed. Certainly, Kumar and I would never be able to obtain similar hardware. Using what was then the state-of-the-art in desktop personal computers, such as the Pentium II running at 400 MHz, we figured we'd be fortunate to evaluate one hundred chessboards in a second. That would make Deep Blue two hundred thousand times faster than anything we could develop. We might be able to assemble a cluster of these desktop machines and take advantage of a parallel approach to evolving a chess player, but each machine would cost at least $1,500, and we simply had no funding for our idea. We had time, energy, enthusiasm, one 400 MHz computer, and no money. Chess would have to wait for another day. We set our sights a little lower.
Checkers
One step below chess is checkers, also known in much of the world as draughts (pronounced "drafts"). The standard game is played on an eight-by-eight board with squares of alternating colors, although some other versions use larger boards. You need two players, denoted as "red" and "white" or sometimes "black" and "white," but we'll stick with red and white here. (If you're already familiar with the basics of checkers, you might want to skip this brief introduction.) You and your opponent each start with twelve pieces (checkers) that begin in the twelve alternating squares of the same color that are closest to that player's side, with the rightmost square on the closest row to the player being left open (figure 22). Whoever plays as red moves first, then play alternates between sides. You can move checkers forward diagonally one square at a time, or, when your checker is next to an opposing checker and there is an open space directly behind that opposing checker, you can jump diagonally over your opponent's checker and remove it from play. If your jump would in turn place your jumping checker in position for another jump, that jump must also be played, and so forth, until no further jumps are available to that piece.
FIGURE 22
The starting position in the game of checkers. Each side begins with twelve pieces. The pieces can only move on the diagonal (numbered) squares. When a piece reaches the back row, it becomes a king and can move forward or backward. Jump moves are forced in checkers.
Whenever you have an available jump, you must play that jump in preference to a move that does not jump. However, when multiple jump moves are available, you have the choice of which jump to make, even if one offers you the chance to remove more of your opponent's pieces than the other (for example, a double jump versus a single jump). If you advance a checker to the last row of the board (which is also known as the "back rank") it becomes a king and can thereafter move diagonally forward or backward. The game ends when either player has no more available moves, which most often occurs when a player's last piece is removed from the board but can also occur when all existing pieces are trapped, resulting in a loss for the player who can't move and a win for the opponent. Depriving your opponent of any available moves is the object of the game. The game can also end when one player offers a draw and the other accepts. 1
The Challenge of Checkers
Checkers isn't as complicated as chess. The game has only two types of pieces, and those pieces can only visit thirty-two of the sixty-four squares on the checkerboard. Still, checkers poses a significant challenge. No one has been able to prove, mathematically, whether the person who moves first can force a win by playing flawlessly. The reason is that the number of possible combinations of board positions is greater than 5 x 10^20, which is too large to enumerate. 2 (As I mentioned earlier, there are only about 10^18 seconds in the history of the universe.) No computer would be able to consider every possible pathway through a labyrinth of five times 10^20 positions, at least not in our lifetimes, regardless of the program it uses. So rather than look to enumeration, programmers have adopted an alternative strategy for designing programs to play checkers. This strategy, like the approach that led to Deep Blue, relies on a great deal of human expertise and knowledge of the game gleaned over many years.
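To get a feel for why enumeration is out of reach, here is a rough back-of-the-envelope sketch. The position count comes from the text; the rate of one billion positions per second is purely an assumption chosen for illustration, and merely visiting each position once is far less work than exploring every pathway through them.

```python
# Back-of-the-envelope estimate only; RATE_PER_SECOND is a hypothetical figure.
POSITIONS = 5e20                        # board positions cited in the text
RATE_PER_SECOND = 1e9                   # assumed: one billion positions visited per second
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # average length of a year in seconds

seconds_needed = POSITIONS / RATE_PER_SECOND
years_needed = seconds_needed / SECONDS_PER_YEAR

print(f"Visiting every position once would take about {years_needed:,.0f} years")
```

Even under this generous assumption the answer comes out in the thousands of years, which is why checkers programmers turned to other strategies.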
The Recipe for a Checkers Program
Most checkers programs rely on a recipe of three essential ingredients. The first ingredient is a set of features about checkers formations that people have ascertained over time. For example, one feature is called "mobility," which is the measure of the number of alternative moves a player can make at any given point in the game. The thinking is that it's wise to increase your own mobility while simultaneously constricting your opponent's mobility.
Another feature is the presence of an unimpeded path to a king. This means that a player has a checker that can advance to the back rank without facing any opposition, thereby becoming a king. It's something like a prince just waiting to be a king. People have invented many such features over the years. In a checkers program, features are used to ascribe a numerical value to any given pattern of checkers on the board in order to compare two alternative positions and favor one over another. Although there's usually some variation in the specific features employed in each program, there's one constant: All the features are derived from human knowledge, from years and years of checkers history. The second ingredient in the checkers program recipe is a handcoded weighting function that combines the chosen features by emphasizing some more than others. Is it better to have a high degree of mobility without having an unimpeded path to a king, or would you be better off sacrificing some mobility to obtain an eventual king? Clearly, trade-offs are involved, and the goal of the weighting function is to determine the appropriate trade-offs. Each checkers program incorporates a mathematical formula that trades off the features in terms of their relative importance. These features come from the computer programmer who designs the program, or in some cases from the programmer in collaboration with master-level checkers players. 3 The programmer hopes to capture some sensible means for identifying the primary, secondary, and less important features of the game and to combine them so that the program favors one position over another, just as a human master might. Once again, the program
S E T T I N G T H E STAGE
relies on a human for the right answers--in this case, for the right weights. Being able to evaluate the worth of a specific configuration of pieces on a checkerboard is important, but it becomes truly valuable only when coupled with an ability to look into the future and assess the likely replies from an opponent. It's not the current position that's most pressing to the checkers player, it's the likely future positions. Thus the third ingredient for the typical checkers program is a procedure that enumerates all possible moves, and every potential reply from the opponent, up to some number of turns in the future. By enumerating all possible moves and replies, the program can evaluate the prospects of each potential future position, then determine which move is best in light of the possible exchange that would ensue.
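The first two ingredients, human-chosen features and hand-coded weights, can be sketched in a few lines. This is only an illustration of the idea: the `Features` fields follow the two features named above, but the specific weights and the plain weighted sum are assumptions made for the example, not the formula of any actual checkers program.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Human-designed measurements of a position, from one player's viewpoint."""
    mobility: int              # number of moves available to the player
    opponent_mobility: int     # number of moves available to the opponent
    clear_paths_to_king: int   # checkers with an unimpeded path to the back rank


# Hypothetical hand-picked weights: a programmer (or a master) decides these.
WEIGHTS = {
    "mobility": 0.1,
    "opponent_mobility": -0.1,
    "clear_paths_to_king": 0.5,
}


def evaluate(f: Features) -> float:
    """Combine the features into one number; higher means a better position."""
    return (
        WEIGHTS["mobility"] * f.mobility
        + WEIGHTS["opponent_mobility"] * f.opponent_mobility
        + WEIGHTS["clear_paths_to_king"] * f.clear_paths_to_king
    )


# Trade-off from the text: more mobility, or less mobility plus a future king?
mobile = Features(mobility=7, opponent_mobility=7, clear_paths_to_king=0)
future_king = Features(mobility=5, opponent_mobility=7, clear_paths_to_king=1)
print(evaluate(mobile), evaluate(future_king))
```

With these made-up weights the program prefers the position with the eventual king; a different programmer's weights could reverse that preference, which is exactly the dependence on human judgment described above.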
Using the Minimax Principle
Far and away the most common rationale for determining which move to make in checkers (as well as in chess and many other games) uses the minimax principle, which I touched on briefly in chapter 2. The goal of the minimax strategy is to minimize the maximum damage your opponent can inflict. Figure 23 illustrates how the strategy works. Suppose that you have three possible moves. You have to choose which one to play. You evaluate the resulting position of each move, then examine what your opponent is likely to do in that new position, assuming that she plays just as you do (meaning that she assesses alternative positions with the
same numeric value as you). Your opponent is trying to beat you, so you have to figure that she'll do whatever she can to minimize your score. You can see from the tree in figure 23 that there are eight possible results of your move and your opponent's response. If you pick the first move, corresponding to the leftmost branch of the tree, the intermediate result looks good--its value is +0.5--but your opponent can reply with a move that actually makes things worse for you, as shown by the negative scores. If you choose the second move, the intermediate result looks slightly worse than if you'd chosen the first move, but there's little your opponent can do to harm you after that. The third move seems bad by all considerations; the best you can obtain for the move, once your opponent responds, is -0.5. Under the
FIGURE 23
To illustrate the minimax principle, suppose it's your move and you have three options. The current position is denoted as the circle at the top of the graph, with each move designated by a link to a corresponding square, which represents the position after you make your move. In turn, your opponent has three possible replies, unless you choose the third option, in which your opponent has only two possible replies. Each of these replies is again designated by a link to a corresponding circle, which represents the position after your opponent's move. The values in each square or circle indicate the degree to which you favor the position. Positive numbers indicate a more favorable position. The minimax principle suggests that you make the move that allows your opponent the opportunity to do the least damage to you. Here, although your first option puts you in a position with a value of +0.5, your opponent could choose a reply that makes the next position worth -0.3. The opponent's other options do less damage to you, so we assume that he or she will choose the move that puts you in the circle marked -0.3. Thus the value of your first move, looking ahead two moves, is -0.3, not +0.5. Turning to your second move, we see that the worst case that your opponent can leave you in has a value of +0.05, in your favor. Examining the third option, we see that both your opponent's replies leave you in an unfavorable position, with a maximally bad value of -0.7. The minimax rule then states that you take your second option because it minimizes the harm that your opponent can do. If we looked farther into the future, your minimax move might change based on the values assigned to those future positions. The entire graph is called a game tree, with the links called branches.
minimax principle, you should choose the second move, because it allows your opponent the opportunity to inflict the least damage. The minimax principle works well under many circumstances, but it isn't a panacea. In some cases, minimax seems almost irrational. Consider the following: Suppose you had two possible moves. One engenders two possible opponent responses, each with values of, say, +0.05 and +0.2 points, respectively. The other move has two possible responses worth 0.0 and +0.9 points, respectively. Let's further say that +1.0 is defined as a winning position. The minimax principle demands that you play the first move, because the worst situation for you will be +0.05, which is better than the worst situation if you play the second move, which could be 0.0. But in making the first move, you give up the possibility that your opponent may make a mistake, and as a result you'd wind up in a situation worth +0.9, very close to a winning position. The difference between +0.05 and 0.0 seems small enough to perhaps take a chance, but minimax is risk averse and doesn't care how small the difference is. Clearly, the minimax strategy has its limitations, but so far it's been difficult to come up with a better strategy. Thus despite the drawback of being highly conservative, minimax is a standard procedure found in practically all checkers programs. Summarizing, the standard approach to designing a checkers program starts by capturing features that checkers experts, or even grand masters, believe are important and proceeds to weight those features in terms of their relative importance, using more human judgment. A search is then made to assess the possible alternative future positions that would result from every possible move and countermove, up to some prescribed limit of moves to look ahead. The result of this protocol can be impressive, as we'll see in chapter 7, but it is limited to the intelligence of the programmer or human master.
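The two-move example above can be worked through with a very small minimax sketch. The tree representation (nested lists whose leaves are position values) and the names `minimax` and `best_move` are choices made for this illustration; real checkers programs search actual board positions and usually add refinements not shown here.

```python
def minimax(node, maximizing):
    """Return the value of a game-tree node.

    A node is either a number (the value of a final position, from your
    point of view) or a list of child nodes reachable in one move.
    """
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def best_move(moves):
    """Pick the move whose worst-case outcome is best for you.

    `moves` maps a move name to the list of positions your opponent can
    reach in reply; the opponent is assumed to minimize your score.
    """
    scores = {name: minimax(replies, maximizing=False)
              for name, replies in moves.items()}
    choice = max(scores, key=scores.get)
    return choice, scores


# The example from the text: two moves, two possible replies each.
example = {
    "first": [0.05, 0.2],
    "second": [0.0, 0.9],
}
choice, scores = best_move(example)
print(choice, scores)
```

The sketch selects the first move, whose guaranteed value is +0.05, and never looks at the tempting +0.9 behind the second move. That is the risk-averse behavior described above.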
Why Not Use Everything You Know?
Human knowledge has always played an integral role in the design of good checkers programs. If my goal and Kumar's had been to construct a world-champion checkers program, we too would have used every available bit of knowledge about checkers, no matter what its source. People have been playing checkers for hundreds of years. It makes sense to take advantage of all that checkers knowledge, if the goal is to make the world's greatest checkers program. But that wasn't our goal. Instead, we wanted to explore the possibility of having a computer teach itself how to play a great game of checkers, not "merely" to create an expert checkers program. 4 We wanted to program a computer that would invent its own features and its own weights, even its own mathematical function for evaluating the features it created. We wanted to discover just how good a computer could become without relying on human checkers expertise. After all, artificial intelligence ultimately will have to face problems for which we don't already know the answers or have any resident experts.
Evolving Neural Networks to Play Checkers
Evolution provided the design principle we needed to embark on our experiment. But where to begin? We'd need a program that could learn on its own, and we'd need to give it a means for adapting its behavior to meet the goal of winning games (and also not losing games)
across environments that contained diverse competitors who relied on various strategies. We began by framing the problem of playing checkers in terms of pattern recognition, just as so many other checkers programs do. The checkers on the board constitute alternative patterns, and the program's task is to recognize those patterns--the stimuli--and then react--the response--so as to maximize its chances of winning. I thought back to my work on evolving neural networks for assisting in mammography. There our inputs were examples of different patterns of features that Gene Wasson interpreted. We also had a target output for each case, a verified result showing whether a malignancy was present. There the problem we needed to solve was devising a neural network that would match the existing diagnosis for as many of the examples as we could. Our hope was that a neural network that performed well in making the diagnoses for the data we'd acquired would also perform well on future data, when we wouldn't know whether the woman in question had cancer. A similar protocol could be applied to checkers, but what inputs would we feed into the neural network? The obvious answer was to use features that people think are important--but then we'd be right back to relying on human expertise. We could have the neural network learn how to weight the features in terms of relative importance, and in fact some of the earliest efforts to program computers to play checkers used a somewhat similar approach that I'll describe in chapter 8. But we wanted to avoid using features that human experts have already discovered and instead let the computer unearth the proper features in the same way that living organisms have learned
to solve the problems posed by nature. Furthermore, we wanted the evolutionary algorithm to operate without any immediate knowledge about whether a particular decision was correct. When evolving an "artificial mammographer," I had relied on all the available knowledge. Knowing whether a malignancy was present in a particular case was vital. But what if we hadn't possessed that information? What if all we could know was that, say, five out of ten sampled mammograms presented signs of malignancy? We might get feedback on how well we did in detecting cancer across the sample of ten mammograms, but suppose we didn't get any feedback on which specific cases were classified correctly. That was the sort of challenge that Kumar and I set up as we began to design our checkers program. Kumar and I hypothesized that by capturing the essential aspects of evolution in a checkers program, that program would itself be sufficient to create, improve on, and eventually discover neural networks that could play checkers as well as human experts. 5 Rather than rely primarily on human expertise from the start, we decided to eschew as much human expertise as we reasonably could and, in essence, start from the position outlined in the following thought experiment.
The Gedanken Experiment
Imagine yourself seated at a table. In front of you is an eight-by-eight board of squares that alternate colors. I'm seated across from you and tell you that we're going to play a game. We each start with twelve pieces placed on alternating squares, as shown in figure 22. You're playing "white" while I'm playing "red." The red player moves first.
Initially, you think I might be giving myself an advantage, but since you have no idea of what game we're about to play, you continue to listen. On each turn, you can only move your pieces diagonally forward one square at a time, unless they are next to my opposing piece and there is an empty square directly behind that piece, in which case you're forced to jump over my piece. In fact, you must continue jumping over my pieces in succession if possible. In doing so, you also remove my piece(s) from the board. If you have more than one possibility to jump my pieces, then you can choose which way you'd like to execute the jump. Any of your pieces that makes it to the back row of the board becomes a special piece called a "king," and this piece can move forward or backward diagonally, again one square at a time unless it is involved in a jump. "Let's play," I say. You protest naturally that I haven't told you the object of the game. "Right, let's play," I say again, and I make my first move. You counter with a move. We play for several more moves and eventually I declare the game to be over. "Let's play again," I suggest. "But wait," you say. "Did I win, or lose, draw, or what?" "I'm not telling yet. Let's play again." Now imagine that we play five such games, and only after the fifth game do I tell you that you earned, say, seven points for playing those games. I don't tell you which games earned you the points or even indeed if you might have started with, say, twenty points and lost points on each and every game. The only way for you to find out is to play another series of five games and compare the total points that you receive after that series. I will tell you, though, that seven points is better than five points. Here's the critical thought experiment: How long would it take you to become an expert at this game--better than, say, 99 percent of all other people who play it? How many games would you have to endure? What features of the game would you look for? One obvious feature might be the piece differential, that is, the difference in the number of pieces that you and I have. It would also be fairly easy to correlate whether it was good to be ahead on pieces or behind. Once you find that the game ends consistently when one player has no more pieces and that, during many trials, the player who ends with no moves receives fewer points, it would be clear that having more pieces is better. But how long would it take you to become really good at this game? The answer will remain unknown, because we can't conduct the experiment. By the time you play your first game of checkers you have so much knowledge about games that you have an inherent skill. Having even once played tic-tac-toe as a child, you know infinitely more about checkers than is presumed in our mental game above. You already know about board games, the spatial nature of each square on the board, that play takes place in turns, and that the patterns of pieces on the board may lead to a condition of victory or defeat. In our thought experiment, by contrast, you're a blank slate. Checkers is significantly more complex than tic-tac-toe, but how long would it take you to learn to play a game that's as simple as tic-tac-toe at the expert level if you weren't told the object of the game?
It might take a frustratingly long time. 6 We can only speculate on the leap in difficulty that checkers imposes as compared to the simple child's game of tic-tac-toe, but this is the experiment that Kumar and I sought to conduct on the computer. To carry out our experiment, we'd rely on an evolutionary algorithm to adapt the behavior of neural networks that evaluated alternative positions based on the information presented on the checkerboard. Initially, the neural networks would have completely random connection strengths and would certainly play very poorly. But selection would cull out the worst-playing neural networks, and the survivors would reproduce. Random variation would mutate the offspring neural networks, and the competition for survival would begin again. The computer would adapt the behavior of the competing neural networks to meet the goal of survival judged by how many points they could earn in a series of games. Each generation would represent a new collection of neural networks, effectively a new assortment of adversaries, a new environment, a new challenge. Our hypothesis was that, using this simple two-step process of random variation and selection, the computer would teach itself to play checkers at a very high level, even without relying on expert knowledge of the game.
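The two-step loop of random variation and selection described above can be sketched on a toy problem. Everything specific here is an assumption made to keep the example small and runnable: the "players" are plain lists of numbers rather than neural networks, `play_game` is a stand-in contest rather than checkers, and the scoring of +1 for a win, 0 for a draw, and -1 for a loss is hypothetical. What the sketch does preserve is the key constraint: each player is judged only by its total over a series of games, never told which individual games it won.

```python
import random

# Hidden goal used only inside the toy game; players never see it directly.
TARGET = [0.3, -0.8, 0.5]


def distance(weights):
    """Squared distance from the hidden target (smaller is stronger)."""
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))


def play_game(a, b):
    """Toy stand-in for a full game: +1 if a wins, -1 if a loses, 0 if drawn."""
    da, db = distance(a), distance(b)
    return (da < db) - (da > db)


def mutate(weights, rng, sigma=0.1):
    """Random variation: perturb every weight with Gaussian noise."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def evolve(pop_size=10, generations=50, games=5, seed=0):
    rng = random.Random(seed)
    # Start from completely random players.
    population = [[rng.uniform(-1, 1) for _ in TARGET] for _ in range(pop_size)]
    for _ in range(generations):
        # Survivors reproduce: each parent yields one mutated offspring.
        candidates = population + [mutate(p, rng) for p in population]
        totals = []
        for i, player in enumerate(candidates):
            others = candidates[:i] + candidates[i + 1:]
            opponents = [rng.choice(others) for _ in range(games)]
            # Only this aggregate score is ever used for selection.
            totals.append(sum(play_game(player, opp) for opp in opponents))
        # Selection: keep the half of the candidates with the most points.
        ranked = sorted(zip(totals, range(len(candidates))), reverse=True)
        population = [candidates[i] for _, i in ranked[:pop_size]]
    return population


survivors = evolve()
print(min(distance(p) for p in survivors))
```

Because opponents are drawn at random, a strong player can occasionally be unlucky and a weak one lucky, so the sketch makes no promise about any single generation. The hypothesis in the text is about what such a process achieves over many generations.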
Newell's Challenge
The challenge we faced was made all the more significant by an early quote from Allen Newell, a winner of the National Medal of Science and a pioneer of computer science. "It is extremely doubtful whether there is enough information in 'win, lose, or draw' when referred to
the whole play of the game [such as checkers] to permit any learning at all over available time scales." Marvin Minsky cited Newell's remark in a 1961 review of the state of artificial intelligence research. 7 Minsky noted being in complete agreement with Newell. And he went on to suggest that the way to treat the challenge of constructing good game-playing programs is by tackling what's called the "credit assignment problem." This involves assigning credit to individual moves that lead to good positions and blame to moves that lead to poor positions. The credit assignment problem still features prominently in many machine-learning efforts today. It's a difficult problem, because the effects of different moves often aren't felt until many moves later. A poor position might arise from a bad move made, say, six or more turns in the past. Nevertheless, Minsky and Newell viewed solving the credit assignment problem as the appropriate challenge to accept. Instead, attempting to conjure up game-playing programs based on feedback that was limited to nothing more than the final outcome of a long series of moves seemed simply impossible. I had a different intuition, so we decided to take Newell's challenge one step farther. Just as in the thought experiment above, Kumar and I would give feedback to our computer only after a series of games. The computer wouldn't even get to know which games it had won, lost, or drawn. This scenario raised Newell's hurdle considerably. The typical credit assignment problem involves assigning value to moves: You have to decide which moves were responsible for the outcome and to what degree. Now, instead, it involved entire games: Which games ended
in victory or defeat? What was the value of any particular entire game? And how could a machine teach itself to play a complex game like checkers without explicitly answering those questions, working with a single numeric value that represented the performance achieved across several games? How could a mere machine figure out which moves to make in a game when it wouldn't even get feedback on whether that game resulted in a win or a loss? We believed the answer was to be found in evolution, and we set out to create a "Darwin computer" that would evolve itself into an expert.
SETTING THE STAGE
Chinook: The Man-Machine Checkers Champion
Any good scientific endeavor starts with an assessment of what has come before. So as Kumar began work programming our first attempt, I spent some time reading up on the history of computer programs for checkers. Ignorance of history leads to repeated efforts and, often, repeated failures. I knew very little about what had been accomplished in programming computers to play checkers. What's more, much of what I believed to be true, based on my limited knowledge, was actually in error: incorrect information that had been passed down from others over many years. My initial search on the Internet showed that the current world champion at checkers is a computer program named Chinook. Professor Jonathan Schaeffer and his colleagues at the University of Alberta designed it. The program operates much like Deep Blue, relying on human expertise and brute force to play at the highest level. I had a lot to learn about checkers-playing programs, and Chinook was a good place to start.
Chinook
Schaeffer began designing Chinook in 1989 with two goals in mind. He wanted to build the world's best checkers player and also solve the game of checkers. When computer scientists speak about "solving" a game, they mean proving whether either player can, without exception, win the game by following a certain sequence of moves. Some games can be won; others can't. For example, you probably know that tic-tac-toe will play out to a draw if the player who moves second doesn't make any mistakes. No matter what the first player does, he or she can't force a win. The game is solved, and just about everyone knows the solution. Checkers remains unsolved. One reason we haven't resolved the mystery yet is the huge number of possible combinations of positions that might occur during a single game (recall, there are about five times 10^20 total individual positions). There are too many routes from the start to the end to consider each one individually. Solving the game, however, might not require enumerating all possible positions and pathways. If we could find even a single way to force a win for either player, that would suffice. All we need is one chain from the beginning to the end, in which one player can link together a series of positions that force the outcome to a win. So far that pathway has remained elusive, and the huge number of variations in possible play makes it unlikely that an answer to the question of whether checkers can be won by either side will be soon forthcoming. This challenge still stands before Schaeffer, but the challenge of making a world-champion checkers program does not, as we'll see.
From the beginning, Schaeffer focused intently on designing the strongest possible program, as opposed to exploring the possibilities for a machine to learn about checkers for itself. 1 As Schaeffer wrote in One Jump Ahead, a very interesting review of his efforts with Chinook, his viewpoint was in line with the premise of the Turing Test, in which artificial intelligence is about creating the "illusion of intelligence": a program is said to be artificially intelligent if it can demonstrate a skill that people usually consider to require intelligence and if it can do so with a reasonable level of competence. 2 Under this definition, the means for acquiring that skill are irrelevant. It doesn't matter if the computer teaches itself or relies on human expertise. All that matters is the final product: Does the computer win? Can it beat humans, even grand masters? In the end, Schaeffer was highly successful, achieving his goal of creating the world's greatest checkers program.
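The idea of "solving" a game, introduced earlier in this section with tic-tac-toe, can be made concrete with a small exhaustive search. The following is only an illustrative sketch: the board encoding (a nine-character string) and the function names are assumptions made for this example, and nothing here reflects how Schaeffer approached checkers, whose position count is far too large for this kind of brute-force enumeration.

```python
from functools import lru_cache

# The eight winning lines on a 3x3 board, with squares indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)
def solve(board, player):
    """Game value with best play: +1 if X can force a win,
    -1 if O can force a win, 0 if the game is a draw."""
    won = winner(board)
    if won == 'X':
        return 1
    if won == 'O':
        return -1
    if ' ' not in board:
        return 0
    other = 'O' if player == 'X' else 'X'
    outcomes = [solve(board[:i] + player + board[i + 1:], other)
                for i, square in enumerate(board) if square == ' ']
    # X picks the outcome best for X; O picks the outcome best for O.
    return max(outcomes) if player == 'X' else min(outcomes)


print(solve(' ' * 9, 'X'))  # value of the empty board with X to move
```

The search reports a value of 0 for the empty board, which is the draw described in the text: neither player can force a win against correct play.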
The Heart and Brain of Chinook
At its heart, Chinook began with an evaluation function that compared the relative merits of alternative checkers positions. Several features went into this function, all based on Schaeffer's instincts about what a good checkers evaluation function should contain. The primary feature was the piece count. Schaeffer used a baseline score of one hundred points for every checker. If Chinook had six checkers and its opponent had four, Chinook would be ahead by a score of +200. Next came the score for a king. The conventional wisdom in checkers is that a king is worth 50 percent more than a checker. Chinook started out with this value as well. After some experimentation, however, Schaeffer reduced the king's extra value to 30 percent, with considerable improvement in Chinook's play.
Many of the remaining features on which Chinook relied are less obvious, at least to the novice player. One involved "trapped kings." If you get a king, but you can't move it because your opponent would capture it, that king is less valuable. Schaeffer penalized Chinook by making trapped kings equivalent to regular checkers. Similarly, opponents' trapped kings were also viewed as regular checkers. Another feature was a "runaway checker." If Chinook could see a path for advancing a checker to the opponent's back rank without any opposition, then that piece was almost as good as a king already. Schaeffer scored this as +50 points for the eventual king, less a penalty of three points for each move that would be needed to advance the piece all the way to the back row. Other features included the "turn," "mobile kings," the "dog hole" (a special configuration of checkers on the board, shown in figure 24), and a few others. Each was assigned a different point value indicating its perceived importance. 3 Chinook's overall assessment of a board came from summing up these weighted features. Using the evaluation function, Chinook could then compare one board to another and determine which position appeared more favorable.
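The scoring rules just described can be sketched as a small function. This is a simplified illustration under stated assumptions, not Chinook's actual code: the `Side` structure and all names are hypothetical, the board is reduced to a handful of counts, and the remaining features (the "turn," "mobile kings," the "dog hole," and others) are left out because the text does not give their values.

```python
from dataclasses import dataclass

CHECKER_VALUE = 100        # baseline score for every piece
KING_BONUS = 30            # a free king is worth 30 percent more than a checker
RUNAWAY_BONUS = 50         # credit for a checker with a clear path to a king
RUNAWAY_MOVE_PENALTY = 3   # deducted per move still needed to reach the back row


@dataclass
class Side:
    """Hypothetical summary of one player's pieces."""
    checkers: int = 0
    kings: int = 0                 # all kings, including trapped ones
    trapped_kings: int = 0         # kings that cannot move safely
    runaway_distances: tuple = ()  # moves each runaway checker needs to crown


def side_score(side):
    """Sum the features described in the text for one player."""
    free_kings = side.kings - side.trapped_kings
    score = CHECKER_VALUE * (side.checkers + side.kings)
    score += KING_BONUS * free_kings  # trapped kings count only as checkers
    score += sum(RUNAWAY_BONUS - RUNAWAY_MOVE_PENALTY * moves
                 for moves in side.runaway_distances)
    return score


def evaluate(own, opponent):
    """Positive values favor `own`; negative values favor `opponent`."""
    return side_score(own) - side_score(opponent)


print(evaluate(Side(checkers=6), Side(checkers=4)))  # the six-versus-four example
```

With six checkers against four, the sketch returns +200, matching the example in the text.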
FIGURE 24
Illustration of the position known as the "dog hole." White's checker on square 5 is trapped by red's checker on square 1, placing white in the dog hole.
Weighting the Features
Where did the weights for these features come from? Initially, they came directly from Schaeffer himself, and a colleague, Norm Treloar, who was an expert checkers player. Over time, however, Schaeffer changed the weights on Chinook's features to correct for mistakes observed during match play. If Chinook made an apparent error or entered a needlessly difficult sequence of moves, Schaeffer would try, by hand, to revise the weights of the terms to make Chinook's decision "correct." Schaeffer admitted that doing this was often difficult because he wasn't an expert player and was unable, in many cases, to question Chinook's judgment reliably. In some situations, however, in which Chinook had entered a position thought to lead to a loss, Schaeffer simply instructed Chinook never to enter that position again, regardless of what the evaluation function indicated, essentially overriding the program's judgment. It was easier to rule out these troublesome positions than it was to refine the weights of Chinook's evaluation function to have the program handle the situation appropriately, especially since Schaeffer had to refine the weights by trial and error.
Tuning the Weights Automatically
At one point in Chinook's development, Schaeffer contacted the team from Deep Blue. He'd heard that they were using a machine-learning method to adjust the weights of Deep Blue's evaluation function. As I described in chapter 2, their strategy was to take games played by grand masters and to analyze hundreds of positions, comparing the move that the grand masters made to the moves suggested by Deep Blue's evaluation function. The machine-learning method was designed to tune the weights of the evaluation function to make the function's evaluation agree with the grand masters'. In essence, the grand masters' moves served as a training set of examples. The task was to adjust the evaluation function to best fit those examples. Thus this method still required previous human expertise, but perhaps here was a way to tune the weights of Chinook's evaluation function automatically. If the evaluation function would recommend the same moves that the grand masters made, maybe that function would capture something useful about how to treat new positions that it might encounter.
Schaeffer obtained the programming for this machine-learning algorithm from the Deep Blue team and applied it to Chinook, using a large volume of opening moves he had acquired. At first the method appeared to work, since it was able to more closely match the moves indicated in the literature, but the weights turned out to be drastically different. When Schaeffer played Chinook using his old version of weights against Chinook using the new ones, surprisingly, the old version won. 4 Even though the new weights generated moves that were in greater agreement with grand-master openings than Schaeffer's old hand-tuned weights, the new evaluation function was unable to handle the rest of the game effectively. Schaeffer refrained from further efforts to tune the weights of Chinook's evaluation function automatically and instead reverted to the previous hand-tuned set.
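The general idea of tuning weights so that an evaluation function agrees with expert moves can be sketched as follows. The text does not specify the Deep Blue team's actual algorithm, so this is a substitute technique chosen for illustration: a simple perceptron-style comparison update. All names, the learning rate, and the toy feature vectors are assumptions.

```python
def score(weights, features):
    """Weighted sum of a position's feature values."""
    return sum(w * f for w, f in zip(weights, features))


def tune(weights, examples, rate=0.1, epochs=20):
    """Nudge weights until each expert-chosen position outscores its rivals.

    `examples` is a list of (expert_features, [alternative_features, ...]),
    where each feature vector describes the board reached by one candidate move.
    """
    weights = list(weights)
    for _ in range(epochs):
        for expert, alternatives in examples:
            for alt in alternatives:
                # If a rejected move scores at least as well as the expert's,
                # shift the weights toward the expert's features.
                if score(weights, alt) >= score(weights, expert):
                    for i in range(len(weights)):
                        weights[i] += rate * (expert[i] - alt[i])
    return weights


# Toy data: the expert's move yields features (1, 0); the alternative, (0, 1).
examples = [((1.0, 0.0), [(0.0, 1.0)])]
tuned = tune([0.0, 1.0], examples)
print(score(tuned, (1.0, 0.0)) > score(tuned, (0.0, 1.0)))
```

Agreement with the training examples is all a procedure like this optimizes. As Schaeffer's experience shows, matching expert openings does not by itself guarantee stronger play in the rest of the game.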
The Opening
Rather than expend more effort adjusting the terms in Chinook's evaluation function, Schaeffer turned to the opening and endgame to bolster Chinook's performance. For the openings, he transcribed the known lines of play from Richard Fortman's Basic Checkers into the computer's memory, building an opening book of moves that incorporated many years of grand masters' expertise. 5 Chinook would then rely on those standard lines of play unless Schaeffer had discovered a plausible innovative move (called a "cook") to spring on an unsuspecting opponent. Anyone playing against Chinook would, in essence, be opening against all the grand masters recorded in Basic Checkers, acting out the movements in a play scripted many years earlier by checkers experts. 6 Initially, Schaeffer programmed Chinook to include about four thousand such openings, but by the time of its retirement in 1996 it could access more than forty thousand opening lines of play. The opening game database served as a guide for Chinook, flagging losing moves as well as winning moves and providing the basis for a reduced search when more than one move was recommended. Schaeffer wanted Chinook to take its opponent out of the known lines of play as soon as possible. In checkers, you can lose some games early by making one wrong move in the opening sequence. 7 Forcing opponents to play "on their own" meant a greater chance of capitalizing on potential mistakes. 8
The Endgame
For the endgame, Schaeffer and his colleagues and graduate students began a complete enumeration of all possible positions to determine whether each could be forced to a win, a loss, or a draw. They began with the positions that included just two pieces. Some of these positions can result in a draw, but others end in a win for one side and a loss for the other. Using a computer, they exhausted all the possibilities and determined the potential outcomes for every instance with complete certainty. They then leveraged the two-piece database to construct the three-piece database, and so forth.
The problem is that the number of possible positions increases rapidly with the number of remaining pieces. There are just under seven thousand positions with two pieces but over seven million positions with four pieces. This increases to about 2.5 billion positions with six pieces and four hundred billion positions with eight pieces. In August 1989, only two months after beginning work on Chinook, Schaeffer had already computed the four-piece endgame database. The implication was that once Chinook could see ahead far enough to reduce the number of pieces down to four, it would never make a mistake. This sort of perfection in the endgame is doubly powerful. First, unless there's a bug in your database, your program will never make an error. Second, failing to have an endgame database is a tremendous handicap, because it forces you to rely on an evaluation function to control the play in the endgame. Instead of simply knowing that a certain position can lead to a win or a loss, the program must search ahead, anticipating the opponent's responses to each possible move, one turn, two turns, and more into the future and then use the evaluation function to determine which position to favor, and therefore which move to make. Unfortunately, many endgame positions require an extensive series of moves to manipulate a winning position (or a draw), and these can cripple a program that relies on an evaluation function. For example, consider figure 25, which shows what's known as "first position" in checkers. With white to move, white can force a win. But if you were to rely on brute force to search far enough ahead to actually see the desired end state of the game, you'd need to look ahead ten moves on each side, which might seem impractical (although with programming tricks, this can be accomplished).
Let's examine this scenario a little further. As white, you have two kings and six possible moves. Two of these moves lead directly to a loss (look at figure 25 to see this for yourself), so let's say there are four alternatives to consider. In reply, your opponent will have one, two, or possibly three reasonable responses. For the sake of the example, let's assume that you have four reasonable moves on each turn and that your opponent has two reasonable replies. You must therefore consider eight (four times two) different possible outcomes for every complete turn. Looking two turns ahead for both you and your opponent means considering eight times eight, or sixty-four, possibilities. Extrapolating on this, peering ten moves into the future on each side requires examining 8^10, or 1,073,741,824, alternatives. No doubt this number overestimates the real number a bit, because we might find that, on average, there were fewer than eight possible alternatives for each move and countermove, and we might be able to discard more moves that were obviously foolish. But the point remains the same. If the goal is to develop a superior checkers program, it's much better to store perfect knowledge in an endgame look-up table than it is to search ahead and evaluate alternatives. What's more, by using an endgame database, you effectively move the end of the game back several moves. All you have to do is find a path into the endgame database that results in a known win, or at least
FIGURE 25
A standard checkers position known as "first position." White has two kings, while red has a single king and a checker. With white to move, white can win. With red to move, red can hold on to a draw. Students of checkers study positions like this carefully. Mastering the endgame is critical to superb play, but some of the required sequences of moves extend beyond ten or even twenty moves per side.
a draw. From that point, the database takes over and you can search for the next perfect move. All the pressure's on your opponent, who must contend with your perfect moves in next to no time at all.
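The look-ahead arithmetic in the "first position" example can be checked in a few lines. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the fixed branching factors (four moves, two replies) are the simplifying assumptions stated in the text.

```python
def lines_to_examine(my_moves, opponent_replies, turns_ahead):
    """Number of move sequences when every complete turn offers
    `my_moves` choices followed by `opponent_replies` responses."""
    return (my_moves * opponent_replies) ** turns_ahead


for turns in (1, 2, 10):
    print(turns, lines_to_examine(4, 2, turns))
```

One turn gives 8 sequences, two turns give 64, and ten turns give 1,073,741,824. A single table lookup in an endgame database replaces all of that search.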
Chinook Heads to the National Championship
With the four-piece endgame fixed in memory, Chinook won the Computer Olympiad in 1989. By the time of the U.S. National Checkers Championship, held August 13 to 18, 1990, Chinook's endgame database included all six-piece endings. During the tournament, Schaeffer and his colleague Treloar made an interesting modification to Chinook's evaluation function: They multiplied the value of a board by a "fudge factor" that favored boards with more pieces. The thought was that their human opponents would have a tougher time with more complex positions. All other things being equal, Chinook should favor those positions with more pieces remaining. The question was, How much more deference should Chinook give to those positions? Over dinner, Schaeffer and Treloar decided to tinker with Chinook's evaluation function by multiplying boards with twenty to twenty-four pieces by a factor of 1.34, boards with sixteen to nineteen pieces by a factor of 1.17, boards with twelve to fifteen pieces by 1.00, and boards with eleven or fewer pieces by 0.83. This meant that if Chinook compared two boards and its evaluation function measured the two positions as being equal, but one had eighteen pieces on the board and the other had only fourteen pieces, the former position would be rated 17 percent better than the latter. Their choice of the key value 0.17 was arbitrary, but they never sought to improve the value. 9 There was little impetus for making more modifications, because at the end of the tournament, Chinook finished in second place to the all-time great player and then-world champion Marion Tinsley. Along the way, Chinook had defeated Don Lafferty, regarded as the second-best player in the world.
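The piece-count "fudge factor" amounts to a small lookup applied on top of the evaluation score. The sketch below uses the four multipliers given in the text. The function names are hypothetical, and how Chinook actually combined the factor with its other terms is not described here.

```python
def piece_count_factor(total_pieces):
    """Multiplier chosen by the total number of pieces left on the board."""
    if total_pieces >= 20:
        return 1.34
    if total_pieces >= 16:
        return 1.17
    if total_pieces >= 12:
        return 1.00
    return 0.83


def adjusted_score(raw_score, total_pieces):
    """Scale a raw evaluation so that fuller boards look more attractive."""
    return raw_score * piece_count_factor(total_pieces)


# Two boards with the same raw evaluation but different piece counts.
print(adjusted_score(100, 18), adjusted_score(100, 14))
```

For equal raw scores, the eighteen-piece board comes out about 17 percent ahead of the fourteen-piece board, as in the text's example. One caveat on this sketch: a plain multiplier also makes a negative score more negative, and the text does not say how Chinook treated that case.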
The Tinsley Challenge
The winner of the U.S. Championship earns the right to challenge the world champion. By finishing second in the 1990 tournament, Chinook was the rightful challenger to Tinsley, but the American Checkers Federation (ACF) and the European Draughts Association (EDA) refused to sanction any eventual world championship match between a human being and a computer. In protest, Tinsley resigned his title and played a one-on-one match against Chinook in 1992 for a man-machine world championship. Before the contest, the ACF modified its position, creating a new world championship title of "man versus machine" and awarding Tinsley the title of world champion emeritus. 10 At the time, Tinsley was rated 2,812, with Chinook slightly more than one hundred points behind at 2,706. 11
The Tinsley challenge was held August 17 to 29, 1992, in London. Silicon Graphics sponsored the match and loaned Schaeffer a $300,000 Silicon Graphics 4D/480 computer to run Chinook. (Comparing it to today's machines, Schaeffer informed me that this machine ran Chinook at about half the speed of a 1 GHz PC.) 12 In the end, Tinsley edged out a victory in the forty-game match with four wins to two for Chinook and the remainder finishing in a draw. 13
A thirty-game rematch was held two years later in August 1994. By then, Chinook employed twice as many processors as its 1992 version, each of which was four times faster than what it had used before. 14 Chinook's endgame was upgraded to include all positions with eight or fewer pieces. The program would never make a mistake with eight or fewer pieces on the board. In addition, Schaeffer had obtained a huge opening book of forty thousand moves from the lead developer of a competing checkers program, called Colossus, from the United Kingdom. Chinook was playing at an apparently flawless pace. In preparation for the 1994 Tinsley rematch, Chinook played in the Southern States and U.S. Championships, as well as in head-to-head matches with grand masters Derek Oldbury and Don Lafferty, scoring no losses in ninety-four games. 15
Six games into the rematch with Tinsley, with each one ending in a draw, Tinsley resigned, citing health reasons. He had been experiencing stomach pains, which were later diagnosed as symptoms of pancreatic cancer. 16 With Tinsley's resignation, Chinook acquired the world man-machine checkers championship title, but the way in which Chinook had earned it generated heated debate over whether the title really meant anything. Don Lafferty, then the second-highest-rated checkers player behind Tinsley, followed with a twenty-game match against Chinook, resulting in a draw, with one win each. A rematch between Lafferty and Chinook in January 1995 earned Chinook an "uncontested" man-machine title with one win and thirty-one draws in thirty-two games. Unfortunately, Tinsley's condition degraded, and a rematch with Chinook became impossible. Tinsley died on April 3, 1995, at the age of sixty-eight, unarguably the best checkers player ever to have lived. 17
The Man-Machine Champion Retires
Today Chinook is effectively retired. In 1996 Schaeffer offered that Chinook was rated at 2,814, well beyond the closest human competitors, Ron King and Don Lafferty, rated 2,632 and 2,625, respectively. 18 You can play against the 1990 version of Chinook over the Internet at www.cs.ualberta.ca/~chinook. 19 The program includes the six-piece endgame database and plays at three different settings: novice, amateur, and intermediate, in increasing order of difficulty. Even the novice setting generates a formidable opponent that can sometimes defeat human masters. A "Wall of Honor" has been erected on the website to honor those who have defeated Chinook at any of these settings since December 1995. 20 The Chinook Wall of Honor is billed as "One of the highest honors in the Game of Checkers."
Chinook was the first computer program to become the world champion in any of the standard games of skill, such as checkers, chess, backgammon, and Othello, a remarkable achievement. 21 Chinook is a formidable opponent, and one that Kumar and I would keep in mind as an eventual challenger to our evolved checkers player.
Samuel's Learning Machine
Given that Chinook was the first checkers program to become a world champion, you might think that it would have received a great deal of attention in the computer science community. But I hadn't heard much about Chinook before I began my research. In contrast, the best-known effort in programming a computer to play checkers is practically a legend. It dates back to the late 1950s and the work of Arthur Samuel. There's a certain irony in this, which we'll see as we proceed, since Chinook's obscurity likely stems directly from Samuel's prominence. Conventional wisdom says that Samuel essentially solved the game of checkers by allowing a computer to teach itself to play while defeating a master-level human opponent in the process. In fact, this view of Samuel's pioneering efforts has been so pervasive that it undoubtedly served to focus attention away from checkers and onto chess during the 1970s and 1980s. After all, few people want to devote their time to a game that's already been "solved." As I learned while researching Chinook, however, conventional wisdom is mistaken. In truth, Samuel never came close to solving checkers, and his program's success was more mirage than reality.
Samuel's approach, nevertheless, was quite innovative and attractive. Let's review the details surrounding his effort.
Experiments in Machine Learning
In the 1950s, Samuel worked at IBM and had access to the state of the art in computing hardware. His interest in game playing served as a useful means for testing the new computers. As early as 1954, Samuel had programmed an IBM 704 to play a self-described "interesting" game of checkers based on a routine in which the computer could improve by playing against itself. Samuel's procedure pitted two programs against each other. Each program relied on a set of parameters (weighting numbers) that described the relative importance of a variety of possible features used to assess different positions on the board, much like the evaluation function used later in Chinook. Samuel, however, used a form of what we would now call "reinforcement learning" to adjust the parameters for one of the programs during a game rather than tune them by hand. This was one of the earliest, if not the first, uses of such an automatic programming procedure.
Starting Again with Human Expertise
Samuel began by identifying the features to incorporate in his evaluation function and assigning them point values for their incremental contribution in assessing a position. He started with the advantage in the number of pieces that one side has over the other, then cataloged an array of possible additional features. Some of these features represented very sophisticated concepts of checkers play. For example, consider the following three features:
Gap: This feature assigned one point for every empty square that separated two passive pieces along a diagonal or that separated a passive piece from the edge of the board. (A piece was considered active if it was that player's turn to move, otherwise it was passive.)
Dyke: This feature assigned one point for each string of passive pieces that occupied three adjacent diagonal squares.
Dia: This feature assigned one point for each passive piece located in the diagonal files that terminated in the double-corner squares.
The rationale behind these features isn't immediately apparent, at least not to someone untrained in checkers. Why are they important? What is the gap feature intended to measure? Why is it important to have three pieces in a row on a diagonal? Why are diagonals that connect to the double-corner squares important? Presumably, in answer to the last question, the diagonals that connect to the double-corner squares are important because if you have a king that's threatened and can reach a double corner, then you might be able to toggle back and forth between those squares and stave off defeat. I really can't answer the other two questions. I certainly can't say that the point score for each parameter is reasonable, but then again, I know very little about the fine points of checkers.
Rather than embark from a place of ignorance, as Kumar and I were doing, Samuel assimilated a total of thirty-eight features that were generally believed to be important in assessing the value of a checkerboard. Samuel had clearly done his homework. In fact, some of the features involved obviously advanced knowledge about checkers. For example, consider the following feature:
Guard: With the idea of trying to reward controlling the back row, this feature was credited with one point if there were no active kings and if either the "bridge" or the "triangle of oreo" was occupied by passive pieces.
I'm not ashamed to say that when I first read this passage, I had no idea what the "triangle of oreo" was. Then I saw that Samuel included another feature:
Oreo: This parameter was credited with one point if there were no passive kings and if squares 2, 3, and 7 for red (black) or squares 26, 30, and 31 for white were occupied by passive pieces.
Figure 26 shows the triangle of oreo for each player. Is this an important defensive position? It seems that it might be. If, say, red occupied squares 2, 3, and 7, then white might have a tough time trying to get a king. White's only pathway that couldn't be interdicted by red would be along the leftmost edge, following squares 13, 9, 5, and 1. Similarly, if white could crown a king on square 1, it would have to retreat along the same path on which it had arrived, retracing its steps. Even when reaching square 9, it would still need to proceed to square 13. Moving to square 14 would pose the risk of having the red player swap out the king for a single checker by playing 7-10.
FIGURE 26
Illustrating the "triangle of oreo" for both red and white. The triangular pattern offers a strong defense of the back rank, allowing the opponent only a single uncontested path in and out.
What about the "bridge"? I didn't know what that was either. Samuel described it by offering another feature:
Back: This feature was credited with one point if there were no active kings on the board and if squares 1 and 3 or 30 and 32 in the back row were occupied by passive pieces.
By examining the positions of these squares on the board, it seems that the bridge tries to accomplish much the same thing as the triangle of oreo. By having, say, red occupy squares 1 and 3, it becomes impossible for white to advance a checker freely to red's back row. Any white piece that reached square 5 would be stuck. You might recall that this position is called the dog hole (see chapter 7). Any white piece on square 5 would have to wait for red to move 1-6 before it could come back into play. Furthermore, any white piece that reached squares 6, 7, or 8 could be taken.
Of course, this play presumes that white leaves the squares 9, 10, 11, or 12 empty. If instead, white occupies, say, square 10, then white could advance its checkers to get a king on square 2 by moving 11-7-2 or 9-6-2 without red being able to do anything about it.
Evaluating Alternative Checkerboards
Having assembled almost forty such features of checker formations, Samuel next had to devise an evaluation function that placed the appropriate emphasis on each feature. Rather than attempt to weight each feature by hand using trial and error, Samuel sought to have the computer learn its own weights automatically. I'll describe his method shortly. Presumably because of computer memory limitations, Samuel restricted the evaluation function to use only sixteen features at a time and created a means for swapping features in and out of the evaluation function over the course of several games. Samuel called this "term replacement," which I'll detail below. Using the sixteen features that were included at any given time, Samuel's evaluation function had to weight each feature properly both in terms of its sign (positive or negative) and its magnitude. A large positive value for a feature meant that if that feature were found in the current checkerboard, the score for the board would increase by that same large positive value multiplied by the number of points assigned to the feature in question. A small negative value for a feature meant conversely that the score would be decremented by that same small amount multiplied by the number of points for the feature. For instance, suppose we were playing red, it was our turn, and we wanted to evaluate the board shown in figure 27 using the following three features:
Apex: This feature was debited by one point if there were no kings on the board, if either square 7 or square 26 was occupied by an active piece, and if neither of these squares was occupied by a passive piece.
Mob: This feature was used to assess a player's mobility. It was credited with one point for each square that the active side could move to, disregarding jumps.
Node: This feature was credited with one point for each passive piece that was surrounded by at least three empty squares.
Suppose further that the weights for these features were 2, 8, and -4. (I'm not asserting that these are good weights for these features. The values are chosen simply for the sake of the exercise of computing the evaluation.) By examining the board in figure 27, we see that apex is debited one point because there are no kings on the board, red has a checker on square 7, and white does not have a passive checker on square 26. Mob is credited with six points, because there are three active pieces for red and each can move into two possible squares. Finally, node is credited with two points, because white has two passive pieces that are each surrounded by at least three empty squares. Therefore, using these three features and their associated weights, the evaluation function scores the configuration as:
FIGURE 27
Suppose we're playing red and it's our move. Evaluating the position in light of the features apex, mob, and node, we find that apex is debited one point, mob is credited with six points, and node is credited with two points. If we were weighting these three features with values of 2, 8, and -4, respectively, then the overall score for the position would equate to 38 points.
2 × (-1) + 8 × 6 + (-4) × 2 = 38 points. The score for each feature is multiplied by its associated weight, and the total score is computed by adding up all the individual scores.
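For readers who like to see the arithmetic spelled out, here is a minimal sketch of that weighted sum in Python. The dictionary names and the `evaluate` function are my own illustrative choices, not anything taken from Samuel's program; the points and weights are simply the ones from the worked example.

```python
# Feature points for the figure 27 position, as read off in the text.
feature_points = {"apex": -1, "mob": 6, "node": 2}

# Illustrative weights from the worked example (not claimed to be good weights).
weights = {"apex": 2, "mob": 8, "node": -4}


def evaluate(points, weights):
    """Return the weighted sum: each feature's points times its weight."""
    return sum(weights[name] * points[name] for name in points)


score = evaluate(feature_points, weights)
print(score)  # 2*(-1) + 8*6 + (-4)*2 = 38
```

Changing any single weight changes the score in proportion to that feature's points, which is why finding good weights matters so much.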
The Nitty-Gritty of Samuel's Reinforcement Learning

Samuel's idea for having the computer determine its own weights was to let the computer learn the appropriate values by playing against itself. The method went as follows: Two versions of the evaluation function were used at the same time, each representing one of the players in the game. Samuel named these functions "Alpha" and "Beta."1 At the start of the procedure, Alpha and Beta used the same weights for every feature, but Alpha's weights were modified during each game. The idea was to have a position's current evaluation match the value that would be calculated for a board that would be seen several moves into the future. In essence, the evaluation function was no longer assessing the current position but rather making a prediction about what the situation would be like several moves ahead. Alpha stored its computed values and compared them with the values that were later assigned to positions encountered on successive moves. The difference between these values was called delta. If delta were positive, that meant that the initial assessment was too low and that those features that contributed a positive score should have been given more weight, whereas those that contributed a negative score should have been weighted less. Conversely, if delta were negative, then the initial assessment was too high, and those features that contributed a negative score should have been given more weight, while those features that contributed a positive score should have been weighted less.

Samuel adjusted the weights according to this scheme, but the actual procedure for making the adjustment was a bit "involved," as he put it. The essence of the routine was to examine the correlation between each individual term and the sign of delta. After each play, the term that had the greatest correlation between its weight and delta was set to a maximum preset weight, and all other terms were scaled in proportion to that weight. This meant that Alpha's weights were being updated continually, while Beta remained constant from the first move of the game to the last. What's more, based on the correlation between each feature's weight and delta, those terms for Alpha that had very low correlation
with delta consistently were removed and replaced by terms in the inactive list of twenty-two remaining features. In this way, Alpha could search continually for the best combination of features, and the weights of those features, in its attempt to improve on the play offered by Beta. At the end of a game, if Alpha defeated Beta, then Beta took on Alpha's weights. If, on the other hand, Beta defeated Alpha, Alpha received "black marks." After a number of these black marks, some radical changes were made to Alpha in an attempt to find a better combination of features and weights. To summarize, then, Samuel's program attempted to learn how to play a good game of checkers by starting with thirty-nine possible features of the game. The first of these features was the advantage in pieces that one side had over the other. This is basic information, and even a novice player would recognize its importance. The remaining thirty-eight features represented advanced human expertise about patterns believed to be important in playing competitively. Samuel's program was instructed to find an optimal subset of sixteen of these thirty-eight features and to learn how to weight them, so that it could improve its play continually. Alpha changed its weights and features during every game with the goal of finding an evaluation function that would be a good predictor of the value of future boards. If Alpha managed to beat Beta, Beta then became the repository for that learned information. If instead Beta defeated Alpha, after some number of setbacks, Alpha was essentially restarted, and the search was begun anew.
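The direction of Alpha's weight changes can be illustrated with a deliberately simplified sketch. To be clear, this is not Samuel's correlation-based routine, which he himself called "involved." It is a stand-in rule, assumed here only for illustration, that nudges each weight by delta times the feature's points. The function names and the `rate` parameter are hypothetical. The rule does reproduce the sign logic described above: when delta is positive, features that contributed positively gain weight and features that contributed negatively lose it.

```python
def evaluate(points, weights):
    """Weighted sum of feature points, as in the earlier worked example."""
    return sum(weights[name] * points[name] for name in points)


def adjust_weights(weights, points, delta, rate=0.1):
    """Simplified stand-in for Samuel's update (an assumption, not his method).

    Each weight moves by rate * delta * points, so a positive delta raises the
    score of the position that was assessed too low, and a negative delta
    lowers the score of a position that was assessed too high.
    """
    return {name: w + rate * delta * points[name] for name, w in weights.items()}


points = {"apex": -1, "mob": 6, "node": 2}
alpha = {"apex": 2, "mob": 8, "node": -4}

current_value = evaluate(points, alpha)  # Alpha's assessment now
later_value = current_value + 10         # hypothetical value seen several moves later
delta = later_value - current_value      # positive: the first assessment was too low

new_alpha = adjust_weights(alpha, points, delta)
print(new_alpha)
print(evaluate(points, new_alpha))
```

In this toy run, mob (a positive contributor) gains weight, while apex and node (negative contributors) shrink toward zero, so the revised assessment of the same position rises toward the later value.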
Look-Ahead and Memorization

Samuel's program included some other procedures that deserve mention. As Chinook would do later, Samuel's program searched ahead in the game using a minimax algorithm to choose the appropriate move. At this point, it would be useful to introduce a new term, called the ply. A ply is a move made by either person in the game. If you move and I respond, that's two ply.2 If you move, I respond, and then you respond, that's three ply. Samuel's program searched ahead a minimum of three ply, but possibly as far ahead as twenty, depending on the number of forced jumps encountered along the way. When the program encountered jumps, the ply was extended. The baseline case was to stop at four ply and to calculate the presumed value of the resulting board as long as the next move wasn't a jump and no exchange offer was possible.

Samuel also included a procedure in his program for what he called rote learning. As the program evaluated different boards, it stored the computed value in a look-up table so that it could more quickly compute more positions. Accomplishing this efficiently involves many details, which are best left for technical papers. Making a table for looking up the computed value of positions that you have already seen is reasonable, but it doesn't necessarily make you a better player. If your evaluation function is poor, all that the look-up table will enable you to do is to search even farther ahead with your poor evaluation function and to lose faster. When comparing the effects of rote learning versus learning by self-play, Samuel's innovative adjustment of the weighting terms in the evaluation function that occurred during self-play was much more important.
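A rough sketch may help make the pairing of look-ahead and a look-up table concrete. What follows is a plain fixed-depth minimax over a tiny made-up game tree, with a dictionary that remembers values already computed. It is an illustration under stated assumptions, not a reconstruction of Samuel's code: the tree, the scores, and every name are hypothetical, and the search does not extend the ply when jumps are pending, as Samuel's did.

```python
# Toy game tree (hypothetical positions, not checkers boards).
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}

# Hypothetical static evaluations for each position.
SCORES = {"root": 0, "a": 0, "b": 0, "a1": 3, "a2": 5, "b1": 2, "b2": 9}


def moves(position):
    """Positions reachable in one ply; an empty list means no moves remain."""
    return TREE.get(position, [])


def evaluate(position):
    """Static evaluation of a position (here just a table of made-up scores)."""
    return SCORES[position]


def minimax(position, ply, maximizing, table):
    """Search `ply` plies ahead, storing each computed value in `table`."""
    key = (position, ply, maximizing)
    if key in table:  # already seen: reuse the stored value
        return table[key]
    children = moves(position)
    if ply == 0 or not children:
        value = evaluate(position)
    else:
        values = [minimax(c, ply - 1, not maximizing, table) for c in children]
        value = max(values) if maximizing else min(values)
    table[key] = value
    return value


table = {}
best = minimax("root", 2, True, table)
print(best)        # the opponent holds us to min(3, 5) = 3 or min(2, 9) = 2; we pick 3
print(len(table))  # one stored value per position visited
```

Notice that the table only saves effort. The values it stores are exactly as good, or as poor, as the evaluation function that produced them.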
The Fine Line Between Legend and Myth

Many interesting anecdotes surround Samuel's program. Perhaps the best place to begin is back at IBM in 1956. By then, Samuel's program had learned to play well enough to defeat novice players.3 Early in the year, on February 24, 1956, Samuel demonstrated his checkers program on television. IBM seemed to take significant interest in promoting the event. Legend has it that Thomas Watson, president of IBM, arranged for the program to be shown to stockholders and predicted, correctly, that IBM's stock would rise fifteen points.4 Three years later, in 1959, Samuel published his first paper on the checkers program in IBM's own journal of research and development. The late 1950s were the formative years of artificial intelligence, and a lot of exaggeration about computers and what they would eventually be able to accomplish was in the air. Samuel, in contrast, was actually quite cautious, writing in the concluding section of his paper: "While it is believed that these tests have reached the stage of diminishing returns, some effort might well be expended in an attempt to get the program to generate its own parameters for the evaluation polynomial." Put more plainly, Samuel was asserting that a great deal of work would be needed to make relatively smaller gains in the program's
level of play. There would be no leapfrogging straight into a world championship. In his paper, Samuel was also acknowledging his reliance on human expertise and expressing his hope of finding a way to circumvent the need for such prior knowledge. But almost ten years later, when he published a follow-up paper in 1967 describing the advances he'd made in the intervening years, he wrote that "no progress has been made in overcoming [this defect]."5 (I noted this remark well, since it indicated an opportunity for Kumar and me to solve a problem that appeared beyond Samuel's reach.) Overall, Samuel seemed reticent to join in the hype that surrounded artificial intelligence research. This reluctance is all the more ironic, given the mythic status that Samuel's program was about to attain.
The Nealey Challenge Match

As the field of artificial intelligence blossomed in 1961, Edward Feigenbaum and Julian Feldman, two professors at the University of California at Berkeley, prepared an edited volume of papers in a now-classic text titled Computers and Thought. They asked Samuel to contribute to their book by reprinting his 1959 paper, along with an appendix of the best game that the program had ever played. This event marked the beginning of a fable. Samuel used this opportunity to play a challenge match against Robert W. Nealey, a checkers player from Stamford, Connecticut. The connection between Samuel and Nealey is unclear, but in the preface to Computers and Thought, published in 1963, Feigenbaum and Feldman credit Samuel "for service beyond the call of duty in arranging and running the 7090-Nealey checker game."6 It seems safe to assume that Samuel was the person primarily responsible for selecting Nealey.

Why did Samuel pick Nealey? How good was Nealey? Here is where the facts become most uncertain. At the time, IBM trumpeted the challenge match, describing Nealey as a "former Connecticut checkers champion, and one of the nation's foremost players."7 In contrast, in One Jump Ahead, Schaeffer writes that Nealey was not a former Connecticut champion as advertised at the time of the match, although he did earn this title four years later in 1966.8 The facts, however, didn't get in the way of a good story, and IBM's spin carried the day, building up Nealey's reputation and the credit that Samuel's program eventually received.

Samuel played Nealey in the summer of 1962. The record in Computers and Thought shows that the game took place on July 12 in Yorktown, New York, the location of IBM's headquarters. It's of some interest to note that Nealey was effectively blind, playing by feeling the pieces on the board and remembering their locations. It's not entirely clear if Nealey went to Yorktown for the match or if he played from his residence in Stamford, Connecticut, by phone or other communications device, but we can presume that Samuel's program met its adversary "face-to-face."9 Nealey received the option of which side to play and chose to defend with white. On the sixteenth move, Nealey failed to properly assess his position on the board and made a poor move. Samuel's program capitalized on Nealey's mistake, gaining a king. Just over ten moves later, Samuel's king was in position to capture one of Nealey's checkers uncontested. Sensing defeat, Nealey finally resigned on move twenty-seven.10 A computer program had trounced a state champion! This was big news!11
Did Samuel Win, or Did Nealey Lose?

Figure 28 shows the board after the fifteenth move, which sets up Nealey's misplay. The sixteenth move for Samuel is a forced jump, 12-19. A quick look at the position shows that both sides now had an equal number of checkers. Samuel's defensive position was shored up by his triangle of oreo, but his offensive position was somewhat weak. His checker on square 14 was trapped, and his checker on square 19 would be unable to advance any farther than square 28 (the dog hole). It looked as if Samuel's program would soon have to break up the defensive triangle of oreo, unless Nealey were to blunder, which he did by moving 32-27. Nealey's decision to move 32-27 is really very poor, so poor, in fact, that it raises a number of doubts about his level of expertise. By moving 32-27, Nealey allowed Samuel to retain the triangle of oreo and move his checker on 19 up to 24, guaranteeing that a king was forthcoming. There was no need for Nealey to surrender this position. Why did he do it? Was it a simple oversight? That seems unlikely for someone who was one of the nation's foremost players. You don't rise to that level if you make this kind of mistake.12

An apt analogy to Nealey's selection would be to imagine yourself in a tight competition against an average duffer in a Sunday game of golf. You end up in a bunker just short of the green and need to make par to stay alive. You look into your bag of clubs and pull out the driver, leaving the sand wedge untouched. With a mighty swing and a cloud of dust, you drive the ball deep into the sand trap, unable to extricate yourself even with a bulldozer. Game over.13

FIGURE 28
The position after the fifteenth move in Nealey's match against Samuel's program. The program (playing red) has a forced jump, 12-19. After the jump, each side had five checkers and an apparently equal position. Nealey's next move, 32-27, however, left him on the defensive. Samuel's program could then advance its checker on 19 to a king uncontested. This was the beginning of the end for Nealey.

After Samuel's program moved 19-24, Nealey evaded with 27-23, and the play continued. Figure 29 shows the position at move twenty-five after Samuel's program moved 2-6. Nealey's reply of 16-11 guaranteed his impending downfall. Schaeffer, using Chinook to analyze the position, suggested that if Nealey had instead moved 16-12 and followed up by advancing his piece 5-1, the position could have been played to a draw. Nealey wasn't lucky enough to happen into a draw that day. When Samuel's program moved its king 23-19, Nealey's checker on 11 was lost.
FIGURE 29
The position after Samuel's program made its twenty-fifth move, 2-6. Nealey replied with 16-11, sealing his fate. After the forced exchange, 7-16, 20-11, Samuel's program moved 23-19, guaranteeing that Nealey would lose his checker on 11. Nealey resigned.
Nealey could have played 5-1, which seems to threaten Samuel's piece on square 6. But rather than flee 6-10, Samuel's program would likely have countered with 19-16. Then after Nealey would jump 1-10, Samuel's program would double jump 16-7-14, and the victory would be complete. In retrospect then, Nealey made not one, but two, critical missteps in the match. His first gaffe gave Samuel's program a king unnecessarily, and his second snatched defeat from the jaws of a draw. Nealey, apparently, had a different perspective on his play in the endgame:

Our game . . . did have its points. Up to the [sixteenth move], all of our play had been previously published, except where I evaded "the book" several times in a vain effort to throw the computer's timing off. At the 32-27 loser and onwards, all the play is original with us, so far as I have been able to find. It is very interesting to me to note that the computer had to make several star moves in order to get the win, and that I had several opportunities to draw otherwise. That is why I kept the game going. The machine, therefore, played a perfect ending without one misstep. In the matter of the end game, I have not had such competition from any human being since 1954, when I lost my last game.14

If Nealey's assessment was completely forthcoming, then he truly didn't understand the endgame position he faced, and yet one of the nation's foremost players should have understood that position.15 An alternative view is that Nealey's assessment was some after-the-fact storytelling, the computer equivalent of a fishing tale about the one that got away. It helps if the ones that get away are big fish. Speculating won't help us find the answer, and unfortunately all that's left now is speculation. What's most disappointing is that we're left in a position in which speculation needs to come into play at all.
A Dose of Reality

So Samuel's program won the game. How good was his program? This early success seemed impressive, in the absence of rigorous analysis of the game. But like many early successes of artificial intelligence, Samuel's achievement was overstated and even now continues to be overstated.16 (These overstatements have unfortunately overshadowed other successful efforts to build world-class checkers programs, like Chinook.) Samuel and Nealey held a six-game rematch the next year. It was
played by mail and took more than five months. Nealey won with one win and five draws.17 Three years later, in 1966, IBM sponsored a world championship match between champion Walter Hellman and challenger Derek Oldbury on the condition that they play some games with Samuel's program.18 Hellman and Oldbury each played four games against the machine, winning every game. In 1967 Samuel reported that four games with Hellman were played in 1965 by mail, with Hellman winning all four as well. Samuel also reported that Pacific Coast Champion K. D. Hanson had defeated the program twice.19 Samuel's program, to the best of my knowledge, went without another win in every match of any stature that followed the victory over Nealey.

A decade later, Samuel's program lost a two-game match to a checkers program from Duke University named Paaslow. Burke Grandjean of the ACF remarked on the play of the two programs: "In annotating the 1977 Duke vs. Samuel programs--two games--ACF Games Editor Richard Fortman made this comment: 'The end-play, especially in Game 2, was terrible. I should say, at present, there are several thousand just average Class B players who could beat either computer without difficulty.'"20 Recall from chapter 7, note 5, that Richard Fortman is one of the foremost authorities on checkers, having compiled the seven-volume set of Basic Checkers. Fortman's assessment that both Samuel's program and Duke's Paaslow, developed by Eric Jensen, Tom Truscott, and Alan Bierman, were below the Class B level is a sobering reality check. As Schaeffer wrote: "The promise of the 1962 Nealey game was an illusion."21
It Makes Perfect Sense

From what we now know, with a baseline performance of searching ahead only four ply, it's almost inconceivable that Samuel's program would be able to compete with human grand masters, or even with strong opponents. Four ply is just two moves per player; even novices can often enumerate all possible boards that might be encountered in two moves, if they have the patience. Most don't, and that's why they remain poor players. But strong players see farther ahead. Grand masters can sometimes see sequences of moves that last twenty to thirty ply or more. It's as if Samuel's program were resting on a surfboard squinting to see the horizon, while its competitors, like Hellman and Oldbury, were sitting in a crow's nest on a whaling ship with a pair of binoculars, peering out many miles farther than Samuel's program could ever see. You have a tremendous advantage if you know what's coming before your opponent does. Samuel's learning method was intended to overcome this limitation by rewarding the case in which the evaluation function correlated with the value of future boards encountered during play. But with only a very limited quantity of practice games, it's not clear that the method was successful.22
Samuel's Unrealized Desire

Samuel wrote, "It should be noted that the emphasis throughout all of these studies has been on learning techniques. The temptation to
improve the machine's game by giving it standard openings or other man-generated knowledge of playing techniques has been consistently resisted."23 However, the recommendations that Samuel received for improving his program to legitimately challenge the world champion Hellman were to do exactly the opposite, namely, to include books of knowledge acquired by people over many years of experience about opening moves, endgame positions, and known losing positions.24 Samuel at first refused to do this, focusing steadfastly on the problem of having a machine learn to play checkers rather than be told how to play, although he later included books on opening moves.25 Samuel also relied on considerable human expertise when he selected the thirty-eight possible features to assess, in addition to the obvious feature of the piece advantage that one player has over another. Nevertheless, when we compare Samuel's approach to what became the standard approach to AI, whereby every bit of human knowledge available is loaded into a program before it's ever run, it becomes clear that his approach was unorthodox and inventive. Still, Samuel's program never really lived up to its promise, and his desire to have a computer invent its own features for assessing the positions of checkers was left unrealized. Here again, Kumar and I thought that the evolutionary approach would have something to offer. We took Samuel's failure as another challenge.
9. The Samuel-Newell Challenge
Our twofold challenge was plainly laid out. First, there was Samuel's challenge. Could we design a program that would invent its own features in a game of checkers and learn how to play, even up to the level of an expert? Could the program "see" the position and types of pieces on the board and create its own descriptors, perhaps like Samuel's thirty-nine features, but in its own vernacular? Second, there was Newell's challenge, which I described in chapter 6. Could the program learn just by playing games against itself and receiving feedback, not after each game, but only after a series of games, even to the point where the program wouldn't even know which games had been won or lost? Kumar and I called our double dare the "Samuel-Newell challenge" in honor of these two pioneers of artificial intelligence.

We believed that evolution showed the way to overcoming the challenge. With regard to Newell's part of the challenge, I've already highlighted evolution's ability to design highly adapted creatures based only on the information contained in "life and death." No species benefits from divine insight that assigns credit or blame to individual genes or specific behaviors. Metaphorically, the only feedback that a species receives in nature is captured in "win, lose, or draw" as its individuals compete for survival. Kumar and I hoped that a computer simulation of evolution would similarly be able to design a highly adapted checkers program, even without explicit credit assignment to individual moves. But what about Samuel's part of the challenge? What about inventing features? Well, when it comes to inventing features, evolution is a grand master.
Inventing Features: A Matter of Life and Death

Individuals in nature can be viewed as pattern-recognition devices (and in some cases as pattern-discovery devices). They must assess their environment and respond appropriately or face the sieve of selection. To make the assessment, individuals carry an array of sensors, evolved through random variation and selection. Their sensors gather information about their immediate environment. If an individual can recognize patterns in those sensed data, it can associate them with symbols--features--and use those symbols to trigger different behavioral responses.1 Selection can then expose the flaws in those responses, cases in which the effected behaviors failed to ensure the individual's survival. Individuals face a constant struggle to characterize their environment with meaningful symbols that elicit appropriate behaviors.

We've already seen the example of an insect, the yellow jacket, that has evolved a yellow-and-black banding that communicates its poisonous threat to potential predators. We've also seen the example of another insect, the flower fly, borrowing a similar pattern to effect a bluff. A predator that could ascertain any subtle difference between the two patterns and make the proper association of which pattern goes with the poisonous insect might have a selective advantage over a predator that couldn't tell a yellow jacket from a fly. In nature, random variation provides a constant source of new patterns and new rules to associate those patterns with specific behaviors. Selection provides an effective means for weeding out inappropriate responses. My challenge, and Kumar's, was to harness these natural processes in our checkers program to meet Samuel's challenge.
Discovering Features in Checkers: Following Evolution's Prescription

Kumar and I expected that an evolutionary algorithm, operating on a population of individual artificial neural networks, would be capable of discovering features, patterns that arise during the game. During many generations of random variation and selection, we anticipated that the neural networks that survived competition would learn to properly associate those patterns or features with good and bad outcomes. In natural evolution, individuals live or die based on their ability to identify stimuli correctly and to associate them with desired responses. In our simulation, individual neural networks would compete for "survival" based on their ability to discover features that allowed them to assess the worth of alternative positions on the checkerboard. In this way, an evolutionary algorithm would extract the information required to assess positions simply by playing the game, instead of being told what information to look for from the beginning.
Simulating Life's Struggle for Existence

Nature is replete with examples of species inventing features in this manner. Consider the hunting wasp, Sphex flavipennis. When the female wasp must lay her eggs, she builds a burrow and hunts out a cricket, which she paralyzes with three injections of venom.2 Before entering the burrow with the cricket, the wasp carefully positions her paralyzed prey with its antennae just touching the opening of the burrow. She proceeds inside and inspects her burrow, then emerges and drags the cricket into the burrow. Finally, she lays her eggs next to the cricket, seals the burrow, and flies away. When the eggs hatch, the grubs feed on the paralyzed cricket.

An individual wasp isn't really intelligent--it's more of an automaton.3 But the evolving phyletic line of wasps has learned quite a lot. It has adapted its behavior to meet the goal of survival in its range of environments. The learning organism here is the species Sphex flavipennis, not any individual wasp.

Sphex flavipennis, as a species, has learned how to dig a burrow, recognize its cricket prey, attack that prey, and carry it to its burrow. The species has learned to create an individual that can remember where that burrow is and has identified features needed to find it. The evolving species of wasps has also learned that it's important to inspect the burrow before dragging the cricket inside. Think of all the features that have been invented to facilitate this behavior. Wasps must somehow identify a location that's suitable for digging a burrow. Each wasp must decide by some means when to stop digging and go find food. It must then identify its cricket prey and determine that it's large enough to provide sustenance for its offspring. It must recognize the features of its environment that allow it to find the empty burrow, and it must also recognize when the burrow is empty and safe for dragging its prey inside. That's a lot of pattern recognition.

Each wasp's behavior is in large part, if not completely, genetically hardwired. During many generations, hunting wasps have invented their own features and behaviors that satisfy the needs for survival. Random variation of the wasp's genome coupled with selection for survival has captured the wasp's array of behaviors in a simple genetic code. Similarly, an individual neural network, once it's wired up with all its weights and connections defined, is hardwired. Effectively, its weights become the equivalent of an individual wasp's genes. Those weights, coupled with different stimuli, elicit responses. Some responses may be judged to be more appropriate than others. Kumar and I reasoned that by playing out life's struggle for existence in simulation, a computer could evolve neural networks that would invent their own features and behaviors that satisfy their own needs for survival in the game of checkers, in which success is measured by winning, and not losing.
The Waggle Dance

I mentioned at the opening of the chapter that Kumar and I wanted our computer to evolve its own language of descriptors for describing the checkers "environment." Another instance from nature illustrates how successful evolution can be in this regard. One of the greatest examples of how living organisms have invented features to describe their environment is evidenced by the waggle dance of the honeybee. When a worker honeybee returns to its hive after foraging for nectar, it communicates the location and distance of the food to its co-workers. The basics of this communication were discovered by Nobel prize winner Karl von Frisch in the 1940s. Von Frisch placed plates of scented sugar at varying distances and directions from a hive with an open vertical honeycomb. He then watched the bees go through a so-called dance when they returned from the various sources of food. When the food was nearby, within about fifty meters, the bee performed a "round dance," as illustrated in figure 30. The bee also regurgitated some of the nectar that likely provided a scent for the other bees to follow. The workers then left the hive and foraged nearby. More interesting, when the food was more distant, say between one hundred and one thousand meters from the hive, a returning bee performed a "waggle dance" (see figure 31) that communicated not only the distance to the source but also its direction relative to the sun.4

The honeybee's dance has evolved in concert with the honeybee's interpretation of the dance, one bootstrapping off the other. It doesn't do any good for a bee to buzz about telling its co-workers where the
FIGURE 30
When a honeybee finds food nearby, it returns to the hive and performs a "round dance." It also regurgitates some nectar, which very likely helps other worker bees find the local food source.
food is if they don't understand the language. Just like an American in Paris who speaks no French, a honeybee that can't understand the waggle dance might have a tough time finding food (or at least getting served).5

There are probably trillions of similar examples to be found in nature. Many involve organisms that are more advanced than the wasp or bee, like birds, dogs, or even people. I've focused on insects here because doing so removes the arguments about "nature or nurture." Clearly, it's nature--evolution--that is responsible for the sophisticated yet genetically hardwired behaviors described above.6
Meeting the Samuel-Newell Challenge

Just as honeybees evolved a language to describe the features of their environment, and wasps have invented features that facilitate their own stimulus-response behavior, Kumar and I sought to harness evolution
FIGURE 31
When a honeybee finds food that's more distant from the hive, it comes back and performs a "waggle dance." The dance consists of two half-circles joined by a straight line. The angle of the straight line to the vertical indicates the direction of the food source relative to the sun. The speed at which the circles are completed indicates the distance to the food source. Faster circles translate into closer food. The honeybee also regurgitates the food, so the workers in the hive know the distance, direction, and scent of the food they seek.
to capture features about the world of checkers. Our plan was to create an evolutionary algorithm that would in turn create neural networks that distilled the information presented in the checkerboard down to a small set of numbers. Those numbers would mean something to the neural networks, but not to us. No matter; we didn't need to understand them any more than the bees need us to understand how they tell one another where the food is. Those neural networks that captured the relevant details of their environment more effectively would in turn earn more wins and suffer fewer losses. They'd have a selective advantage. They would pass along the "ideas" that they created to their progeny. Those neural networks whose ideas did not bear fruit would be eliminated by selection. Over generations, we hoped, our evolutionary algorithm would be equal to the Samuel-Newell challenge and invent expert checkers players without having to rely on human expertise, just as nature has done so many times in so many different ways.
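The shape of such an evolutionary loop can be sketched in a few lines of Python. Everything specific below is an assumption made for illustration: the population size, the number of games, the Gaussian mutation, and above all `play_game`, which is a hypothetical stand-in that simply favors the larger sum of weights rather than playing checkers. The sketch shows only the cycle described above: parents produce varied offspring, individuals are scored solely by wins and losses, and the poorer half is eliminated.

```python
import random


def play_game(a, b):
    """Hypothetical stand-in for a checkers game between two weight vectors.

    Returns +1 if `a` wins, -1 if `a` loses, and 0 for a draw. Here the
    "stronger" player is simply the one whose weights sum higher.
    """
    sa, sb = sum(a), sum(b)
    return (sa > sb) - (sa < sb)


def evolve(pop_size=10, n_weights=5, generations=20, games=5, sigma=0.1, seed=0):
    """Evolve weight vectors using only win/lose/draw feedback."""
    rng = random.Random(seed)
    population = [[rng.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Random variation: each parent creates one mutated offspring.
        offspring = [[w + rng.gauss(0, sigma) for w in parent]
                     for parent in population]
        everyone = population + offspring
        # Competition: each individual plays a few randomly chosen opponents.
        scores = []
        for player in everyone:
            total = sum(play_game(player, rng.choice(everyone))
                        for _ in range(games))
            scores.append(total)
        # Selection: keep the better-scoring half as the next parents.
        ranked = sorted(range(len(everyone)), key=lambda i: scores[i],
                        reverse=True)
        population = [everyone[i] for i in ranked[:pop_size]]
    return population


survivors = evolve()
print(len(survivors), len(survivors[0]))
```

Note that nothing in the loop tells an individual which of its weights helped or hurt. The only information flowing back is the tally of game results.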
Evolving in the Checkers Environment
10
Now that we'd established the Samuel-Newell challenge, it was time for Kumar and me to put our theories to the test and see if we could meet that challenge. We began by sketching out the basic procedures that we would use. We'd need a checkers-simulation program, some code to emulate artificial neural networks, more code to have two neural networks face off in competition, and modules to handle random variation and selection.
We wanted to start the evolutionary process with no more information than a novice player has when he or she sits down for his or her first game of checkers. For example, when a beginner prepares to play, he or she first learns the rules, the "physics" of the game. Pieces move forward diagonally, they become kings when reaching the back rank, and so forth. Novices also know the position and type of each piece on the board. Yet they don't know sophisticated features, like the triangle of oreo, the bridge, or any concept of mobility. Our neural networks thus began with the primitive knowledge of the location and type of each piece on the board, but without the advanced concepts that grand masters possess.
Each neural network had thirty-two input neurons, one for each available square on the board. These input neurons acted like sensors, indicating whether a piece was located in their corresponding square, and if so, what type of piece it was.
What Price for a King?
We quickly faced a dilemma. How would we numerically encode the possible contents of each square on the checkerboard? It was a straightforward choice to use the value 0.0 to represent an open square, +1.0 to represent a player's own checker, and -1.0 to represent an opponent's checker, but how should we differentiate between checkers and kings? The common heuristic in checkers is to value kings at 1.5 checkers. An exchange of three checkers for two kings is considered an even swap. But we felt that programming this would be providing too much of a hint to the evolving neural networks. The neural networks should learn the appropriate value, not have it handed to them.
We therefore represented kings with a value of +K or -K (for the player's king or the opponent's king, respectively) where K was a variable number that was unique to each neural network. Whenever a neural network generated an offspring, that offspring would inherit its parent's value of K, along with the possibility of some mutation serving to vary that value. If a neural network had a very large value for K, then by consequence it would likely place a great importance on kings. A neural network with a K value that was too large wouldn't play
very well, because it would surrender too many checkers in its pursuit of kings. Similarly, a neural network with a K value that was close to 1.0 wouldn't recognize much difference between checkers and kings. It would play poorly, because it would allow its opponent to gain kings without recognizing the extra power of those pieces. In any case, however, the best value of K was to be evolved along with each neural network rather than given as divine knowledge.
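The encoding described above can be sketched in a few lines of Python. This is only an illustration, not the code Kumar and I ran: the one-character square symbols, the function name `encode_board`, and the sample king value of 2.0 are all assumptions made for the example.

```python
from typing import List

# Hypothetical one-character symbols for the contents of a square.
EMPTY, OWN_CHECKER, OWN_KING, OPP_CHECKER, OPP_KING = ".", "o", "O", "x", "X"


def encode_board(squares: str, king_value: float) -> List[float]:
    """Turn the 32 playable squares into network inputs from the mover's view."""
    if len(squares) != 32:
        raise ValueError("expected exactly 32 playable squares")
    codes = {
        EMPTY: 0.0,              # open square
        OWN_CHECKER: 1.0,        # the player's own checker
        OPP_CHECKER: -1.0,       # the opponent's checker
        OWN_KING: king_value,    # +K, where K belongs to this network
        OPP_KING: -king_value,   # -K
    }
    return [codes[symbol] for symbol in squares]


# Twelve own checkers, eight empty squares, eleven opposing checkers, one opposing king.
board = "o" * 12 + "." * 8 + "x" * 11 + "X"
inputs = encode_board(board, king_value=2.0)
```

Because K is passed in rather than fixed, two networks looking at the same board can see different inputs, which is exactly what lets the king's value evolve.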
After the Stimulus Comes the Response
With all thirty-two inputs and the representation for each possible piece defined, the next step was to determine the neural network's output. Our evolutionary process was aimed at generating neural networks that would serve as board evaluation functions. We arbitrarily defined the value of a winning board as +1.0 and a losing board as -1.0. All other boards would receive values between -1.0 and +1.0, with a neural network favoring boards with higher values. Of course, the trick was that our evolutionary program would have to learn how to adjust the weights of the neural networks so that they could evaluate alternative positions properly in light of their coevolving opponents.
Thus each neural network took thirty-two values as inputs, one for the contents of each square on the board, and returned a single value between -1.0 and +1.0, indicating how much it liked that board. In between the inputs and the output were a number of other neurons that would construct the features that each neural network would rely on to adjudicate alternative positions. At this point, Kumar and I had to use a little of our own knowledge to offer a reasonable chance that the neural networks would be successful.
The Neural Architecture
It turns out that if you construct a neural network with an immensely large number of intermediate neurons, it can compute almost any function. Here that would mean it could learn almost any feature that you could imagine. The trade-off is that the more neurons you add, the more weights evolution must adjust and the slower the process might be. A neural network with relatively few neurons, by contrast, can compute only a limited range of functions (so it can't learn more than a limited number of features about the checkerboard positions), but it may be faster in learning what it can. My prior research with tic-tac-toe showed that neural networks with ten intermediate neurons could play pretty well. We figured, however, that checkers would be significantly more complicated. Based on intuition and only a little experimentation, we settled on a neural network architecture that had forty intermediate neurons in one layer, ten intermediate neurons in the next layer, and then the output node.
We still don't know if this structure was the best choice, and in fact we'd be disappointed if it were, because we wanted to avoid making "best choices" for the neural networks. We simply wanted to give the evolutionary process a chance to have a sufficiently large neural network available to store the information it would need to learn about how to play checkers. We never tinkered with the number of nodes in the neural networks trying to find a setting that would work. Indeed, if this had been necessary, we would have considered our experiment to be a comparative failure. Our objective was to determine if the evolutionary program was capable of learning at a high level of play without any such tinkering.
Our final choice for the neural network design was to determine the networks' connectivity: Which neurons would communicate with which other neurons? To avoid any bias on our part, we used a simple "feed-forward" design wherein each neuron in one layer connects to each neuron in the next layer. We also decided to connect all the input neurons directly to the output neuron. Figure 32 shows our initial design.
Summarizing, each of the thirty-two positions on the board corresponded to an input neuron. Each input neuron connected to every one of the forty neurons in the first hidden layer and also directly to the output neuron. The hidden neurons in the first layer connected to every one of the ten neurons in the second hidden layer. Finally, these ten neurons connected to the output neuron, which computed the overall evaluation of the board position. Each connection was weighted with a value that was to be evolved in competition against other neural networks. Each neuron added up all the incoming signal strength multiplied by the associated connection weights and used the sigmoid function shown in figure 33. The greater the total weighted input activation, the closer a neuron's output would be to 1.0. Conversely, the smaller the total weighted input activation, where -100 is smaller than -10, the closer a neuron's output would be to -1.0.
FIGURE 32
The initial neural network design that Kumar and I chose for evaluating checkerboards. The neural network had a "feed-forward" architecture.
[Plot: neuron output (vertical axis) versus Input Activity (horizontal axis)]
FIGURE 33
The sigmoid function used by each neuron in the neural network. The function is called a hyperbolic tangent. The more positive the input activity, the closer the neuron's output is to 1.0. Conversely, the more negative the input activity, the closer the neuron's output is to -1.0.
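For readers who think in code, here is a minimal sketch of how a network with this initial design could turn thirty-two inputs into one evaluation. It is a sketch under stated assumptions rather than our program: the function names, the starting range of -0.2 to 0.2 for the random weights, and the absence of any per-neuron bias term are choices made for this example only.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Draw every weight at random; the range [-0.2, 0.2] is an assumption."""
    return [[rng.uniform(-0.2, 0.2) for _ in range(cols)] for _ in range(rows)]


def layer(inputs: List[float], weights: Matrix) -> List[float]:
    """Each neuron sums its weighted inputs, then applies the hyperbolic tangent."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def evaluate(inputs: List[float], w1: Matrix, w2: Matrix,
             w_out: List[float], w_direct: List[float]) -> float:
    """Score a board between -1.0 and +1.0 using the initial design."""
    hidden1 = layer(inputs, w1)      # 32 inputs -> 40 neurons
    hidden2 = layer(hidden1, w2)     # 40 neurons -> 10 neurons
    total = sum(w * h for w, h in zip(w_out, hidden2))
    total += sum(w * x for w, x in zip(w_direct, inputs))  # direct input-to-output links
    return math.tanh(total)


rng = random.Random(0)
w1 = random_matrix(40, 32, rng)
w2 = random_matrix(10, 40, rng)
w_out = random_matrix(1, 10, rng)[0]
w_direct = random_matrix(1, 32, rng)[0]

empty_board = [0.0] * 32
sample_board = [1.0] * 12 + [0.0] * 8 + [-1.0] * 12
score = evaluate(sample_board, w1, w2, w_out, w_direct)
```

With random weights the score means nothing yet; it only acquires meaning once selection starts favoring networks whose scores lead to wins.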
As we began, our neural networks were quite large, possessing more than 1,700 variable weights. 1 Compare this to the sixteen weights that Samuel searched for to properly trade off sixteen preordained features in his polynomial evaluation function. We had an evaluation function that was more than one hundred times more complex and involved numerous nonlinear functions, none of which meant anything at the start. The evolutionary algorithm was faced with finding good values for more than 1,700 parameters in order to properly evaluate alternative checkerboards. This process was akin to optimizing the picture on a Sinister television with 1,700 control knobs: quite a challenge.
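A quick tally shows where a figure of "more than 1,700" can come from. The sketch below counts only the connection weights of the initial design described earlier; it assumes no per-neuron bias terms and leaves out the king value K, so the exact total in our program may differ from this one.

```python
INPUTS, HIDDEN1, HIDDEN2, OUTPUTS = 32, 40, 10, 1

# One weight per connection in the initial feed-forward design.
connections = {
    "input -> first hidden layer": INPUTS * HIDDEN1,
    "first hidden layer -> second hidden layer": HIDDEN1 * HIDDEN2,
    "second hidden layer -> output": HIDDEN2 * OUTPUTS,
    "input -> output (direct)": INPUTS * OUTPUTS,
}

total_weights = sum(connections.values())
ratio_to_samuel = total_weights / 16  # Samuel tuned sixteen weights

for name, count in connections.items():
    print(f"{name}: {count}")
print("total:", total_weights)
```

Under these assumptions the total is 1,722 weights, a little over one hundred times Samuel's sixteen.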
Looking Closer
It might be helpful to step through some of the functioning of the neural network design on a sample board to see how it works. Suppose the board to be evaluated is the one shown in figure 34(a), and it is red's turn to move. The board is scanned from square 1 to square 32. Each empty square is represented by the value 0. Red checkers are represented by +1. White checkers are represented by -1, and we see there is also a white king, which is represented by the value -K, where K is a number between 1.0 and 3.0 that is specific to the neural network used to evaluate the board.
Figure 34(b) shows how the thirty-two input neurons of the neural network correspond to the positions of the pieces on the board. Each input neuron is in turn connected to all forty hidden neurons in the first layer. Only the first and fortieth are shown in the figure. These in turn connect to the ten hidden neurons in the second layer, where only the first and tenth are shown. These then connect to the output neuron. Furthermore, all the input neurons connect directly to the output neuron. The final output of the neural network depends on all the weights that connect all the neurons in each layer, along with the value of K.
FIGURE 34
(a) A candidate checkerboard position. (b) The corresponding input to the neural network.
Random Variation, Competition, and Selection
The evolutionary algorithm operated as follows. To begin, we initialized a population of thirty individual neural networks at random. Each neural network had its own weights, drawn randomly using a computer random number generator. Thus each neural network was a unique individual. We then placed all the neural networks in competition. Each neural network in the population played five games as the red player against opponents selected at random from the population. The neural networks evaluated the possible alternative future positions that might be encountered based on all possible moves arising from each current position. The "best" move was defined by the minimax principle, which favored whatever move would minimize the maximum damage that the opponent neural network could do. Games were played until either one side won and the other lost, or until one hundred moves were made by both sides, in which case a draw was declared.
For each game, each competing neural network earned +1 point for a win, 0 points for a draw, and -2 for a loss. 2 While playing a minimum of five games, a neural network would earn a total point score and not be able to discern which games contributed which values to that score. Furthermore, the neural networks didn't all play the same number of games because, by chance, some would be selected as opponents more often than others. Nevertheless, every game counted toward the total point score. More opportunities to play meant more opportunities to earn points, but also more opportunities to lose points.
After every neural network in the population played its five games as the red player, the fifteen neural networks with the highest point totals were saved as parents for the next generation. The remaining fifteen neural networks with the lowest point totals were killed off, victims of natural selection. Then, to begin the next generation, each surviving parent was copied to create a new offspring neural network, in which each weight of every offspring was varied at random, and the competition was started anew with the thirty members of the population.
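The generation cycle just described can be sketched as follows. To keep the example self-contained, each "player" is merely a number and the stronger number wins; that toy game, the toy mutation, the function names, and the rule that a network never plays itself are all assumptions made for illustration, not details of our program.

```python
import random
from typing import Callable, List, Tuple

POPULATION_SIZE = 30
GAMES_AS_RED = 5
SURVIVORS = 15
WIN, DRAW, LOSS = 1, 0, -2


def run_generation(population: List[float],
                   play_game: Callable[[float, float], int],
                   mutate: Callable[[float, random.Random], float],
                   rng: random.Random) -> Tuple[List[float], List[int]]:
    """Play the tournament, keep the top half, and add one offspring per parent."""
    scores = [0] * len(population)
    for red in range(len(population)):
        rivals = [i for i in range(len(population)) if i != red]  # assumed: no self-play
        for _ in range(GAMES_AS_RED):
            white = rng.choice(rivals)
            outcome = play_game(population[red], population[white])
            if outcome > 0:        # red won
                scores[red] += WIN
                scores[white] += LOSS
            elif outcome < 0:      # white won
                scores[red] += LOSS
                scores[white] += WIN
            else:                  # draw
                scores[red] += DRAW
                scores[white] += DRAW
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    parents = [population[i] for i in ranked[:SURVIVORS]]
    offspring = [mutate(parent, rng) for parent in parents]
    return parents + offspring, scores


def toy_game(red: float, white: float) -> int:
    """Stand-in for a checkers game: +1 if red wins, -1 if white wins, 0 for a draw."""
    return (red > white) - (red < white)


def toy_mutate(parent: float, rng: random.Random) -> float:
    """Stand-in for mutating every weight of a network."""
    return parent + rng.gauss(0.0, 0.1)


rng = random.Random(1)
population = [rng.random() for _ in range(POPULATION_SIZE)]
next_population, scores = run_generation(population, toy_game, toy_mutate, rng)
```

Note that a player's score depends on how often it happened to be drawn as an opponent, so, as in the text, the totals are not based on equal numbers of games.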
A Generation in the Life of a Neural Network
To gain a better understanding of how this process worked, it might be helpful to imagine yourself as a neural network in this evolutionary process and to live the life of a neural checkers player over the course of a generation.
You are created. Your internal structure is a complex mesh of connections between neurons, each with a weight that has been assigned at random. You are Neural Network 17, one of thirty such neural networks in a room. A loud voice commands each of you to play five games of checkers as the red player. Neural Network 1 comes forward for the first game, and its opponent is chosen at random. It might be you, but instead Neural Network 19 is selected. They do battle. You aren't allowed to watch the match lest you learn something from their play. Once their game is over, another opponent is called on for Neural Network 1, and so forth until five games have been completed.
Then it is Neural Network 2's chance to play as red. This time, your number comes up as the competitor. You will play as white. You compete against Neural Network 2, using your intrinsic neural computation to evaluate alternative positions. Your output neuron tells you which positions appear more valuable. After one hundred moves, the game ends. You don't know how well you did. You return to waiting your turn to play as red, but before you do, you are randomly selected to play against four other neural networks, earning some unknown number of points in each game. Finally, it's your turn to play the five-game series as red. You match up against five opponents and play out the games, then return to waiting.
After Neural Network 30 completes its fifth game as red, the loud voice assigns each of you a point value. Your value is +3. Looking around the room, you see that some neural networks have values as high as +10, others as low as -20. Suddenly, fifteen of your fellow neural networks vanish, vaporized into the electronic ether. Fortunately for you, you're still here. Then, appearing next to you is another neural network that looks a lot like you, but if you were to examine each of its more than 1,700 weights, you'd find that each was different from your own. Most weights would be very similar, while a few would be radically different. The loud voice commands Neural Network 1 to start the play again, and you wait your turn, repeating this complete process until, in one instant, you too vanish into the electronic ether, a victim of superior opponents that you helped create.
Extracting Knowledge from the Game
You might be asking yourself how this process can enable a computer to learn to play better checkers. Where does it get its knowledge? In the absence of a human designer, how can the neural networks be expected to play very well? Each neural network is created randomly, either completely by happenstance at the beginning of the evolutionary experiment or as a random perturbation from a surviving parent. How can this random process be expected to learn anything?
The answer comes with the recognition that evolution is not a blind random process but a biased random process. By this I mean that what happens in previous generations affects the probabilities of what will happen in future generations. Prior events alter the probabilities of future events. Each series of competitions provides an evaluation of which behaviors, as captured by the neural networks and their weights, are worth retaining and which should be discarded. Selection focuses attention on those individuals that have proven their worth in competition. Random variation then provides the creative part of the process, generating new solutions to the task at hand based on what has worked so far. Because checkers is a game of skill, not of luck, the evolutionary process of random variation and selection can bootstrap, starting from players who essentially move at random, and literally create expertise on its own. 3
From Parent to Offspring
The only detail about our evolutionary process that I haven't provided concerns how offspring neural networks were created from their parents. You've probably heard of a "bell curve." 4 Kumar and I implemented a variation process whereby each weight of a surviving parent neural network was mutated using a bell curve. The details of how to accomplish this procedure are presented in technical papers that we've published. 5 The essence of the idea is to use a method that's likely to generate values for an offspring's weights that are close to its parent's values but that with some lower probability could generate radically different values.
Figure 35 illustrates the mutation probabilities for one example. Suppose that the current weight on a certain connection in a neural network was 2.75. The value could be mutated using a bell curve. Here the mean of the bell curve is centered right at the current value of 2.75, so that on average there will be no change to the weight. The curve can be made more narrow or more spread out by changing what is termed the "standard deviation." Here the standard deviation is 1.5, which means that there's a 68 percent chance that the new weight for this connection will be between 1.25 and 4.25 (the average, plus or minus one standard deviation). This form of mutation was applied to every weight of every neural network when generating offspring from the surviving parents at each generation.
The bell curve has two parameters, its center point and its spread, known as the mean and the standard deviation. Here the curve was centered around the parent's weights so on average the offspring would resemble its parent. The degree of spread was adapted by the evolutionary process itself. Again, the details are more technical than are appropriate here but can be found in our scientific publications. The important point for you to remember is this: Nowhere did Kumar or I inject any expertise in how to create new offspring from surviving parents by tinkering with specific weights or otherwise introduce our opinions about which weights were good and which were bad. The evolutionary program had to learn how to perform random variation on its own, just as a checkers novice would have to consider how to change his or her strategy in future games.
[Plot: bell-shaped probability curve; horizontal axis: Connection Weight, from -2 to 7]
FIGURE 35
Illustration of the mutation probability for a weight in a neural network.
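The bell-curve mutation and the 68 percent figure can both be checked with a short sketch. The function names here are hypothetical, and the spread is held fixed at 1.5 for simplicity; the way the evolutionary process adapted the spread itself is not shown, since those details are left to the technical papers.

```python
import math
import random
from typing import List


def mutate_weights(weights: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Center a bell curve on each parent weight and draw the offspring's weight from it."""
    return [rng.gauss(w, sigma) for w in weights]


def prob_within(mean: float, sigma: float, low: float, high: float) -> float:
    """Probability that a bell curve with this mean and spread lands between low and high."""
    def cdf(x: float) -> float:
        return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))
    return cdf(high) - cdf(low)


# The example from the text: weight 2.75, standard deviation 1.5.
chance = prob_within(2.75, 1.5, 1.25, 4.25)

rng = random.Random(42)
samples = mutate_weights([2.75] * 10000, 1.5, rng)
observed = sum(1.25 <= s <= 4.25 for s in samples) / len(samples)
```

Here `chance` is the exact probability for one standard deviation either side of the mean, about 0.68, and `observed` is the fraction of ten thousand simulated offspring weights that actually fell in that range.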
The Checkers Engine and the Darwin Engine
The design for our program decomposed naturally into two separate modules. The first, the "checkers engine," handled playing the games of checkers. The second, the "Darwin engine," controlled the evolving neural network players.
The mechanics of playing the games with the checkers engine required three separate routines. Primary among these was the actual checker-playing procedure, which maintained a record of where the pieces were from the beginning to the end of any game and indicated which moves were possible at any time. In addition, the checkers engine needed a search algorithm that could generate the tree of moves looking ahead in time and use a neural network to evaluate any potential position. Finally, the checkers engine also needed to implement a minimax algorithm, whereby the best move would be chosen based on minimizing the maximum damage that the opponent could do.
The Darwin engine, in contrast, had to do all the bookkeeping on evolving the neural network players. It had to start by initializing a population of random neural networks. It then had to send pairs of those networks to the checkers engine for evaluation. Based on the eventual outcomes of all the games played within a single generation, it would assign scores to the neural networks and use "survival of the fittest" to eliminate the worst players. Finally, the Darwin engine would be responsible for taking each surviving neural network and creating an "offspring" by mutating all the connection weights and the network's king value. The evolutionary process was an iterative exchange of information between the Darwin engine and the checkers engine.
Having worked on evolving neural networks in many different applications, we already had most of the Darwin engine complete right from the start. Kumar generated the first prototype for the checkers engine in about a week. All the decisions that we faced in designing the checkers engine were made with an emphasis on getting up and running as quickly as possible. In retrospect, perhaps this wasn't the best policy, because we later had to find ways to speed up the procedures, and Kumar's initial efforts didn't always leave us with obvious options.
Bugs, Ghosts, and Monkeys
The code for playing a game of checkers was simple enough, so we thought, but the algorithm for creating the tree of possible moves was more complex and would require more effort. Fortunately, a great deal of work has been done in minimax game playing, and Kumar located enough information to quickly write a depth-first search that created all the possible moves up to a specified number of ply for any position on the board. Methodically, the search started in the upper left corner of the board and considered each piece in turn, noting if it had a potential move. If so, that move was extended by looking at all the opponent's possible replies, again starting by scanning from the upper left corner of the board, and so forth up to the maximum ply. We initially chose a depth of six ply for the search, but our resulting trial experiments were too time consuming, so we backed off to four ply. This was about the same level of search that Samuel had access to forty years ago, so, in retrospect, it provided a good baseline for comparison. Like Samuel, we extended the search whenever we encountered a board where a piece could jump over another, all the way until the board was left in a "quiescent" state (with no further jumps on the next move). 6 When we reached the terminal state, the checkers engine would call the neural network and present the positions of all the pieces on the board for evaluation. Using the minimax procedure, the algorithm was then able to determine which future checkerboard offered the least potential for harm and choose the move that headed in that direction.
Kumar finished the first version of our evolutionary program, and it was time to do a few more tests. We watched various neural networks play against one another. They were horrible players, but that's what we expected from neural networks with random weights. Even so, after some intensive examination of several games, we noticed that the neural networks would advance a checker to the seventh row and then never promote that checker to a king. They'd get the piece all the way to the next-to-last row and then stop abruptly, even when advancing the piece to become a king would provide an immediate threat to one of their opponent's pieces. This seemed strange, because the piece advantage earned for gaining a king, particularly when coupled with capturing other pieces thereafter, should have provided a compelling rationale for advancing the piece to the last row. What was going on here?
Kumar looked a little closer at the program that controlled turning a checker into a king and found the problem. There was a subtle bug in the code. As a result, when a checker advanced to the last row, instead of becoming a king, it vanished from the board altogether! We dubbed this the "ghost-king" bug. In transforming checkers into kings, we had managed to transform them right out of existence.
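The depth-limited minimax search described earlier in this section can be sketched on a toy game tree. This is a generic illustration, not Kumar's search code: the names `minimax` and `best_move`, the dictionary-based tree, and the leaf scores are invented for the example, and the extension of the search through jump sequences to a quiescent board is left out.

```python
from typing import Callable, Dict, List


def minimax(position: str, depth: int, maximizing: bool,
            moves: Callable[[str], List[str]],
            evaluate: Callable[[str], float]) -> float:
    """Look ahead `depth` ply and return the value of the position for the mover."""
    children = moves(position)
    if depth == 0 or not children:
        return evaluate(position)  # in our program, the neural network's output
    values = [minimax(c, depth - 1, not maximizing, moves, evaluate) for c in children]
    return max(values) if maximizing else min(values)


def best_move(position: str, depth: int,
              moves: Callable[[str], List[str]],
              evaluate: Callable[[str], float]) -> str:
    """Choose the move whose worst-case outcome is least damaging."""
    return max(moves(position),
               key=lambda c: minimax(c, depth - 1, False, moves, evaluate))


# A hypothetical two-ply tree: two moves for us, two replies each for the opponent.
tree: Dict[str, List[str]] = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_scores: Dict[str, float] = {"a1": 0.3, "a2": -0.1, "b1": 0.9, "b2": -0.8}

choice = best_move("root", 2, lambda p: tree.get(p, []), lambda p: leaf_scores.get(p, 0.0))
value = minimax("root", 2, True, lambda p: tree.get(p, []), lambda p: leaf_scores.get(p, 0.0))
```

Move "b" offers the single best board (0.9), but the opponent would steer toward -0.8 instead, so the search prefers "a", whose worst case is only -0.1.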
While fixing the problem with our ghost kings we also realized that we were sneaking a bit more information into the neural network than we'd intended. By connecting all the input nodes to the output node directly, we'd essentially given the neural network an easy opportunity to compute the piece differential. Consider that if all the weights on the connections between the input nodes and the output node were set to 1.0, or close to that value, the neural network would be adding up positive values for each of its own pieces and subtracting values for each of its opponent's pieces. The result would indicate the advantage one side or the other enjoyed in material. We figured that the evolutionary algorithm would quickly learn to set the weights on the input-output connections to supply the output node with this obviously important information. We faced a dilemma: Should we leave the design as it was, or should we simply go ahead and feed the piece differential to the output node explicitly? After some discussion we chose the latter approach for two basic reasons. The primary reason was that the piece differential is not "expert" knowledge. Any novice playing her first game of checkers knows that she's winning when she has more pieces than her opponent. In our case, the neural networks wouldn't know whether a positive piece differential was good or bad until they could induce that information by playing several games. Those networks that thought losing pieces was good wouldn't last long in the competition for survival. 7 But this seemed like trivial knowledge, and we didn't feel like we were violating the spirit of our experiment by including it.
The secondary reason was that to leave the direct input-output connections untouched and not include the piece differential explicitly would be close to cheating. It would make us seem like we were hiding behind a facade, using an intricate ruse whereby the evolutionary program would quickly learn an important but computationally trivial feature and start to play well. Without explicitly acknowledging the simple nature of the piece differential, we might later be accused of trying to hype our results, claiming more success for the evolutionary process than it deserved. I wouldn't hesitate to offer that opinion of anyone else who took the same approach. Kumar and I didn't want to look in the mirror and find ourselves staring at two of the hear-no-evil, see-no-evil, speak-no-evil monkeys. So, just as Samuel had done forty years earlier, we went ahead and coded in the piece advantage as a feature computed separately and fed directly to the output node. We then cut all the direct connections between the input nodes and the output node. Whatever else the neural networks would learn would be captured in their internal architecture and not given by our divine intervention.
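To see why the direct connections amounted to a piece count, consider this small sketch. It assumes the separately computed feature is simply the sum of the thirty-two input values, with kings counted at the network's own K; the function names and the sample position are made up for the example.

```python
from typing import List


def piece_differential(inputs: List[float]) -> float:
    """Own material minus the opponent's material, with kings counted at K."""
    return sum(inputs)


def direct_connection_signal(inputs: List[float], weights: List[float]) -> float:
    """What the old input-to-output connections would deliver to the output node."""
    return sum(w * x for w, x in zip(weights, inputs))


K = 2.0  # illustrative king value
# Five own checkers, one own king, four opposing checkers, one opposing king, 21 empty squares.
inputs = [1.0] * 5 + [K] + [-1.0] * 4 + [-K] + [0.0] * 21

explicit = piece_differential(inputs)
via_unit_weights = direct_connection_signal(inputs, [1.0] * 32)
```

With every direct weight set to 1.0, the two numbers are identical, so supplying the differential explicitly gives the network nothing it couldn't have found almost immediately on its own.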
Good to Go
Finally, the ghost kings had been exorcised, everything looked to be bug free, and we set the evolution in motion. We waited. The generations ticked by slowly. We watched the total point score of the best-evolved neural network at each generation. Sometimes the best network had mostly draws and scored very few points. Other times, the best network had a large positive score and had evidently beaten most of its competition. We surmised that these generations indicated an improvement in the level of play, where the best network had found a new way to defeat the opposition.
We stopped the evolution and played out a few games between the neural networks by hand. We tried to find moves that were obviously stupid, cases that might be evidence of remaining bugs in the code, but we didn't see any. We did notice that the networks seemed to be moving in patterns of play, sequences of moves that would repeat across different networks. We figured that this was a result of the reproduction process, in which the better neural networks were being copied before mutation.
We were sharing processing time with another program running on the same desktop computer at a higher priority, and at just five generations per day we realized that we were going to need some patience. After two days, our 400 MHz Pentium II computer had racked up ten generations. We decided to stop the machine and play a few games with the best-evolved neural network. We didn't expect much. After all, how much could be learned in such a small number of generations?
In for a Surprise
You learn the most from a scientific experiment when it generates results that you don't anticipate. After two games between Kumar and the best-evolved neural network from the tenth generation, the neural network had racked up two wins. Kumar looked at me and said, "Okay, you try." He thought that, between us, I might be the better player and would offer stiffer competition. Maybe that's so, but I wasn't good enough. After two more games, I was 0-2 and the neural network remained undefeated.
We had cleared the first hurdle, creating a checkers player that was unarguably better than its creators. But we were quick to admit that we knew very little about checkers strategy (in fact, virtually nothing), and we were no measure of success for our creation. We needed to find some better players. We restarted the evolutionary program at a high priority setting with the population of neural networks from the tenth generation. Then we started searching on the Internet for other checkers programs that we might be able to use as competition. There was Chinook, but going up against Chinook seemed premature. We found one website put up by a computer science student who had programmed a checkers player as part of his class project. His program came with a disclaimer that its endgame was "a little weak." "Nonexistent" would have been more accurate. Apparently, when there were only a few pieces left on the board, the program just moved at random. Our evolved neural network from generation ten won easily, which was reassuring but not much to hang our hats on.
Four days later, we went back to the evolutionary program to check its progress. It had died somewhere between the fortieth and fiftieth generations. It seems one of the other programs that had been running on the computer crashed and took our neural checkers players down in flames with it. One of the nice things about an evolutionary program is that if
anything changes along the way, even having your program crash, you can restart from a prior population and pick up where you left off. Kumar modified the code to write the most recent population of neural networks to a file every ten generations. That way, even if there were a power outage, we'd never lose more than ten generations' worth of evolution again. We restarted from the tenth generation and went back to the Internet in search of better competition. Kumar and I discussed how long we should let the evolution go before testing the best neural network again. If we let the program run at high priority, one hundred generations would take about a week. It's a bit nerve-racking to start a program and not know for a week what would emerge from it, but that's what we decided to do. We had the sense that whatever emerged from generation one hundred wouldn't be ready to compete for championships, and yet the product of generation ten was already good enough to beat us and another simple checkers program. We needed a means for assessing the quality of the eventual neural network (the result of the hundredth generation), which would likely have some intermediate level of skill. Would it be better than Samuel's program? Remember, back in 1977, Richard Fortman asserted that Samuel's program was below the Class B level. Where could we play against rated opponents so that we could determine the class level of the best-evolved neural network?
In the Zone
Back in the 1960s and 1970s, Arthur Samuel had to arrange his checkers matches personally. Fortunately, we live in the age of the Internet. Kumar and I had the luxury of being able to connect with other checkers players anywhere in the world by computer. Several companies were posting gaming Web pages, where you could play such games as backgammon, chess, and checkers. Microsoft Corporation sponsored one such website at www.zone.com. We decided to log on and check it out.
The site used the same rating system as that adopted by the U.S. Chess Federation and the American Checkers Federation. (See table 1, which I introduced in chapter 2. This is the table that zone.com used.) As a new player, you start with a rating of 1,600, just at the border between Class C and Class B. From there, your rating changes based on your performance and the level of your competition. 1 This was perfect, just what we needed, and best of all, we could play as much as we wanted for free.
We had to log onto the website with a user name. Initially, we chose names like "David1101," standing for my name, the month of November, and our first attempt, and later "Kumar1201," for similar reasons. These were pretty boring names, I admit, compared to other players' monikers, like "Checkermaster" and "DarthVader." Some people clearly had impressive imaginations.
After downloading the software that we needed to run the scripts for the website, we clicked into the room for rated games. There were about fifty players. Some were already playing games, and others were waiting on the sidelines. Figure 36 shows how the rated game room site looks. Once you enter the room, if you want to play a game, you can sit down at an empty table and wait for an opponent or find someone else waiting for a game and sit down with that player. Once two players are at a table, both must approve playing the match. Very good players don't often want to play a newcomer or someone with a low rating, so this system is a way of ensuring that both players are willing participants. If your potential opponent gives you the cold shoulder and doesn't want to play, you just have to move on to another table. But there's always another table. After your game, the results are reported to a main server that updates your records and your rating automatically.
We'd found the perfect environment for putting our neural checkers player to the test. We didn't have to tell our opponents that they were playing against a computer program. We could just make all the moves that the program told us to make and keep track of the ultimate results. 2 The Java window that depicted the checkerboard during a game (figure 37) even allowed us to chat with our opposition, which would help make us seem more real. We played a few test games using our own mental prowess just to get a feel of how the website worked. We lost the games but felt more comfortable using the system.
FIGURE 36
The typical screen that appears when you enter the "rated room" for checkers on zone.com. There are one hundred fifty tables, and you click on an empty chair to join a game. Each of the people in the room has a user name, which is listed alphabetically in the right column. You can click on the names to find out each person's rating. By permission of Microsoft Corporation.
FIGURE 37
The Java window that you see when playing checkers on zone.com. The players' names are placed in opposing corners. You move by using your mouse to click on a piece and moving it to the desired square. Below the board is a chat window where you can talk with your opponent. By permission of Microsoft Corporation.
A Few Preliminaries
While we were checking out zone.com, our evolutionary program had been quietly evolving new generations of neural networks. After a week of computing, the generation counter on our evolutionary program clicked over one hundred. It was time to extract the best-evolved neural network and begin to put it through its paces on zone.com.
Again, we faced a bit of a dilemma. Our eventual rating was going to depend on the speed of our computer. The faster the computer, the more ply we could examine in the same amount of time. Presumably, if the best neural network from generation one hundred were decent, then looking ahead more moves would be an advantage. (Recall that if the evaluation function is very poor, then looking ahead more moves may not be any advantage at all.) We had to choose the computer system we'd like to use. We thought about creating a parallel distributed architecture of computers, much like Schaeffer used in his world championship matches with Chinook, but doing this would require a donation of equipment from a corporation or foundation. That scenario didn't seem very likely, and we were anxious to evaluate the product of our evolutionary process, not canvass the scene for computer handouts. So we decided to use the same single CPU that we'd used for evolution: a single 400 MHz Pentium II PC. Using this computer no doubt handicapped the effectiveness of the evolved neural network, but it was all we had, and it seemed fair to use the same machine for testing our neural networks as we'd used to evolve them.
The rules of play on zone.com dictated that each player has four minutes to make each move. If a player took longer than that, the Web server terminated the match and awarded the opponent the victory. Four minutes per move is longer than what tournaments allow (they typically offer two minutes per move for the first thirty moves but are negotiable in head-to-head matches), and it's longer than most people's patience. Poor players want to move quickly, and they want their opponents to move quickly. Better players take more time, and they don't mind when their opponents do the same. They're thinking while you're thinking.
Our initial tests showed that we could almost always complete a six-ply search, looking ahead three moves on each side, in less than two minutes, and most often in less than sixty to ninety seconds. We decided that we'd go forward with six ply as the baseline for our initial evaluations.
We'd set up the checkers program to prompt us to enter the number of ply to use at each move. The program would wait for us to enter the ply to use (how far ahead we wanted the program to look) and then it would begin its search. Later, we'd remove this constraint, but for the time being we had to make a best guess at how long it would take to search ahead six or eight ply based on the configuration of pieces on the board.
The vast majority of the time we simply went with six ply, to be consistent, but sometimes doing this seemed to be handicapping our neural network, because the program would return with the six-ply move in just a few seconds. This didn't seem very fair to the network, particularly if our opponent was taking a couple of minutes to consider his or her next move. To compensate, if a six-ply move came back very quickly, we considered doing an eight-ply move the next time, with the hope that the computer would complete its search in two minutes, and certainly within the four-minute time limit. A few times it didn't, and we had to forfeit, again handicapping our score.
Let the Games Begin
We sat down for our first match against an unsuspecting opponent rated at 1,800, right on the border between Class A and Class B. We conversed with him briefly in the chat window before making our first move. We watched the moves go by on the screen as we recapitulated our neural network's desires. The network's evaluation hovered around 0.0 for several moves but then started creeping higher. A few moves later and it showed that the six-ply move was rated above 0.5. This was good! Three moves later, our best-evolved neural network had a one-piece advantage over our opponent. Several moves later, we were up by two pieces, and our opposition resigned.
We were exuberant! We now had an objective example of how good our evolved neural network was: good enough to beat someone with an 1,800 rating on an international website. That's not too shabby. But one sample doesn't make a trend, and a subsequent loss against an opponent rated in the mid-1,900s quickly brought us back to reality.
After completing a set of ten games, each taking about twenty to thirty minutes, we used the mathematical rating formula to compute our score. Fortunately, our result agreed with the rating that we were assigned on the website, so we knew we were computing it correctly. We were close to 1,680, having improved eighty points since the first game. With our rating still increasing, we'd have to play several more games to find out where it would eventually settle. 3
After ten more games, our rating was just over 1,700 and still climbing. We decided to play a total of one hundred games. We knew it would take the better part of a month to complete, playing each game by hand, logging onto zone.com, finding opponents, and playing out each match, move by move. But we felt the time investment was worth it; we wanted to determine conclusively just how good our best-evolved neural network really was. Was it an expert? Or was it just pretty good?
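The rating formula itself isn't spelled out in this chapter, so here is a rough sketch of the kind of calculation involved, written in Python. It follows the standard Elo-style update used by chess and checkers federations. The scale of 400, the multiplier of 32, and the function names are assumptions for illustration, not values taken from the text.

```python
def expected_score(rating, opponent_rating, scale=400.0):
    """Expected result (between 0 and 1) against a given opponent.

    The logistic form and the 400-point scale are the usual Elo convention,
    assumed here rather than quoted from the book.
    """
    return 1.0 / (1.0 + 10.0 ** ((opponent_rating - rating) / scale))


def update_rating(rating, opponent_rating, outcome, k=32.0):
    """Return the new rating after one game.

    outcome is 1.0 for a win, 0.5 for a draw, and 0.0 for a loss.
    k controls how far a single game can move the rating (assumed value).
    """
    return rating + k * (outcome - expected_score(rating, opponent_rating))


# A new player at 1,600 who beats an opponent rated 1,800:
new_rating = update_rating(1600.0, 1800.0, 1.0)
```

Under these assumed constants, that single upset win is worth about twenty-four points, while a win against an equally rated opponent is worth sixteen. Beating stronger players pays more, and losing to them costs less.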
The Journey to 100 Games
As we played matches, we kept track of our opponents, not wanting to play the same people over and over. We also kept track of the number of times we played as red or white, as well as the ratings of our opponents, trying to ensure a diverse range of playing conditions.
As we played through the one hundred games, we learned a bit about human nature. Some of what we learned was heartening, as when someone obviously much better than we were (actually, than our neural network was) would try to help by pointing out where we made mistakes and suggesting better moves. Sometimes, for example, they would remark that we were going to lose a piece four moves in the future. I'd look at the board dumbfounded, but as the play unfolded, sure enough, their predictions came true. Kumar and I learned a great deal about checkers this way. Of course, we didn't transfer any of that knowledge to the neural network, but the game was becoming more fun because we could appreciate it better, thanks to the help of our kindly opposition.
Sometimes in these matches, however, the darker, more insecure side of human nature came to the fore. Some people were very protective of their rating, so protective, in fact, that they'd rather disconnect their modem than accept a loss. These few malcontents would start to fall behind and then begin typing disparaging remarks in the chat window, such as "you suck" or worse. Then they'd start to take almost the full four minutes to make each move, obviously hoping that we'd quit rather than endure the boredom of watching the screen. Finally, once we had endured the checkers equivalent of water torture, they would pull the plug on their modem line, leaving the game incomplete. The rules of the website dictated that after your tenth incomplete game, your rating would fall by one hundred points. But even this penalty didn't seem to deter some individuals from exhibiting poor sportsmanship.
We didn't keep track of exactly how many of these sad cases we saw, but we did have to find some way of assessing the outcome of the incomplete games for our records. This presented a bit of a quandary, because if we just recorded these games as incomplete, we'd certainly be handicapping our neural network's rating yet again, which was clearly unfair. After all, the reason the opponents disconnected was because we were killing them. On the other hand, we hadn't achieved victory; was it appropriate for us to record these games as wins nevertheless? After discussing our options, we decided that if our neural network was ahead by two or more checkers when an opponent disconnected we'd record a win. If we weren't ahead by that margin, we would simply not record the game; it would just be wasted time. In all, we had about five opponents disconnect. Of those, we only recorded two or three wins based on being way ahead in the match when our opponent turned tail and ran, so the impact on our overall rating was minor.
One aspect of the play, which had more than a minor impact on our neural network's rating, was the neural network's endgame. Without an endgame database, the network would often play a poor endgame, failing to finish off opponents when it had the opportunity. The reason was that at six ply, it could only see about three moves ahead for each side, in the absence of other jump moves, and that was often insufficient for ferreting out a winning combination. (Recall that the search was extended whenever forced jump moves were encountered.)
For example, consider the case shown in figure 38(a). Here, the red player has the advantage, being up two kings to one, and can force a win over white. But for the red kings to get into position to execute the series of moves that pin the white king against the side of the board, the two kings must first each move four squares. That play corresponds to sixteen ply, because there are eight moves for red as well as for white. After getting into position, as shown in figure 38(b), it will still take five more moves on each side to complete the win, for a total of twenty-six ply.
We'd hoped that, through the evolutionary process, the neural networks would learn what to do in such situations simply by playing games. Unfortunately, our evolutionary training used a look ahead of only four ply. It was therefore very unlikely that the evolving neural networks would ever get a chance to learn how to solve these endgame situations. Instead of trying to win, they would often try to avoid losing by failing to engage the opponent and remaining at arm's length across the board. In these situations, our opponents would often take the initiative and try to attack when they thought they might snatch a win, but our evolved neural network played well in defense. And on some occasions our neural network would have earned a win, if we had hooked in Chinook's endgame database to know what the right moves would be. Not wanting to incorporate the expertise contained in the perfect knowledge of the endgame database, we simply recorded the outcome of the game as it stood, even if it meant accepting a draw when, with perfect knowledge, a win might have been possible.
As we mowed through our first fifty games, we realized that playing checkers on the Web is only in part about the actual game of checkers. The other part concerns human psychology. For instance, as we were striving to ensure a wide array of different opponents with various ratings, we tried to entice people to play with us. We would pull up a chat box and invite them to play. Often, we'd just get ignored or put off with the all-too-familiar "let's play later" (and of course "later" never came). Finally, we realized that players with catchy or even sexy names were getting more action. We figured that nobody had an inherent investment in defeating someone named David1101 or Kumar1201, but what if we changed our name to ObiWan, from the movie Star Wars? What a good idea! The idea was so good, in fact, that someone else had already registered with that name. Okay, how about ObiWanTheJedi? No luck there either: already taken. How about Obi_WanTheJedi? Available. With that, David1101 and Kumar1201 were retired, and we seemed to have many more opponents who were eager to play.
FIGURE 38
(a) Red is up two kings to one and can force a win; however, if white toggles in the 28-32 corner as long as possible, it will take red eight moves to get both kings into the position shown in figure 38(b) that portends certain victory. Eight moves correspond to sixteen ply. Beyond that, it will take another ten ply to execute the victory, for a total of twenty-six ply. Even a seemingly obvious winning situation can require a large ply to see all the way to the final position. (b) White to move. Red now has its two kings in position to execute the win in five moves for each side. Can you see how it will play out?
Obi_WanTheJedi in Action
To give you a sense of how well the best-evolved neural network played during those one hundred games, I'll show you some scenes from two successful contests we recorded along the way. In the first example, we were playing against an opponent who was rated 1,926, in the high end of Class A. It was likely the best performance that we saw from the neural network, in terms of playing competitively with higher-rated competition, but it did require a misstep from our human opponent. The human played red, the neural network played white. I've put the complete listing of moves in the notes, so you can play along if you're a checkers fanatic. 4 I'll just review the highlights here.
Figure 39(a) shows the position after the neural network's twenty-first move. Our human opponent is facing a forced jump, but he chose 10-19, which was surely a mistake. A much better move would have been to double jump 13-22-29, capturing two of the neural network's pieces and advancing to a king. In taking the neural network's piece on 15, our human opponent freed up the checker on 17, allowing it to advance for a king.
Later, on move thirty, the neural network played 10-15, and figure 39(b) shows the resulting position. This struck me as a good play. The neural network now threatened to capture red's checker on 19. But the obvious move away, 19-23, would lead to bigger trouble, as we'll see. Our human opponent indeed played 19-23. The neural network countered with 20-16, forcing the human to jump 12-19, which the network countered with the double jump 15-24-31.
Finally, after the human's forty-fourth move, 15-11, the position looked as shown in figure 39(c). Red's king on 11 pinned down the neural network's checker on 12. But with only one remaining king, red would be unable to pin white's other king in the double corner. The game continued briefly before our opponent offered a draw. I accepted on the neural network's behalf.
FIGURE 39
(a) The position just before the human opponent rated 1,926 made his twenty-second move. The opponent played red. The neural network has just moved 19-15, and red responds by jumping 10-19. A much better move would have been to double jump 13-22-29, capturing two of the neural network's pieces and earning a king. Perhaps the neural network's last move of 19-15 distracted the human opponent into making a mistake here. (b) The position just after the neural network played its thirtieth move, 10-15. The white king now threatens red's checker on 19, but the obvious reply of 19-23 leaves the human opponent open for a two-for-one exchange. The neural network follows up 20-16, forcing red to jump 12-19, whereupon the neural network double jumps 15-24-31. (c) The position after the human opponent made his forty-fourth move, 15-11. The king on 11 pinned the neural network's checker on 12 but at the same time left red unable to pin the neural network's king on 5. The game continued for ten more moves before our opponent offered a draw. We accepted.
The second game I'll illustrate here involved an opponent rated 1,771, in the upper ranks of Class B. The complete set of moves is listed in the notes. 5 Our opponent played white; the neural network played red. As we neared the end of the game, I was certain that there had to be a bug in our neural network. It seemed to be making a series of sloppy, if not stupid, moves. In the end, it had the last laugh on me.
Skipping ahead to move twenty-nine, the neural network was up by two checkers. It had a king and five checkers, while its opponent had a king and only three checkers. Figure 40(a) shows the board after the neural network's twenty-ninth move: 22-26. Our human opponent then played 2-6, moving his king to threaten our checker on 10. When I first saw our opponent's move, my initial reaction was totally negative. The logical play for the neural network seemed to be to flee 10-15. That move would stave off being captured for a turn but would be of no real help, because I expected our opponent to then move 6-10. The play would threaten both the neural network's checkers on 14 and 15, ensuring a capture of one checker.
I went from feeling negative to abysmal when I saw the neural network's next move: 4-8! What was this? If you look at figure 40(a) again, you'll see that by moving 4-8, the neural network not only freed up the human to jump 6-15 but also gave him the option to jump 11-4 and get a free king! I was sure there was a bug in our program now. This was really depressing.
Our opponent jumped 11-4 as predicted, but then the nightmare got worse. Instead of trying to save its checker on 10, the neural network moved 20-24, simply giving up its checker on 10. "Oh, my." Our opponent took the forced jump 6-15, and then the neural network moved its king 23-27. I was left shaking my head. Looking at figure 40(a) and following the moves above, you might be shaking your head too. Not only had the neural network given up two pieces, giving our opponent a free king, but now it had moved its only king away from protecting the red checker on 26. White would be forced to jump 30-23.
That's when the fun started. The human took the forced jump of 30-23, but this move now gave the neural network the double jump of 27-18-11! After surrendering pieces left and right, the neural network now delivered the mortal blows.
FIGURE 40
(a) The position before the human opponent, playing white and rated 1,771, moved 2-6. The opponent's king on 6 then threatened the neural network's checker on 10. Rather than flee, the neural network moved 4-8, sacrificing a checker and setting up an interesting sequence of events. (b) The position after the neural network double jumped 27-18-11. The neural network's king on 11 now pinned the human opponent's king on 4. Simultaneously, the neural network's checker on 14 pinned an opponent's checker on 21. Game over.
FIGURE 41
The rating of the evolved neural network (ENN), shown by the solid line, during all one hundred games played on zone.com. The rating started at 1,600 and climbed as high as 1,825.4 on game eighty-five, before settling at 1,750.8 on game one hundred. This rating placed the neural network as an above-average Class B player. The ratings of the opponents played at each game are also shown, along with the results of each contest (win, draw, or loss).
Figure 40(b) shows the situation. White's checker on 21 was trapped against the side. White's king, which had seemed like a gift, was now pinned in the corner. The rest of the play was a mere formality. I had to laugh because I had been so sure that the program was screwing up, and now the sacrifices all seemed so sentient. Two moves later, the game was over.
Time for an Assessment
At long last, we'd finished the hundredth game, and it was time to assess how well we'd done. Figure 41 shows the neural network's rating during all the games that we played. Remember that the graph doesn't show the neural network learning during the one hundred games. Everything that it learned, it had learned during its evolution. The neural network was the same for game one hundred as it was for game one, but its rating was 150 points higher than when it started, placing it at 1,750.8, a high-level Class B player. It had started with a 1,600 rating, but its performance in the one hundred games had earned it 150 rating points.
The highest rating that the neural network attained was 1,825.4, on game eighty-five. This result points up why, as a scientist, you need to pick your sample size or your stopping rule before you conduct an experiment. Kumar and I agreed that we'd perform one hundred trials, play one hundred games, before making our assessment. If we hadn't defined our stopping point ahead of time, it would have been tempting to quit while we were ahead. 6
FIGURE 42
The histogram shows the neural network's performance against opponents at different ratings. Each bar represents an interval of one hundred points. Thus "1,450" includes players rated from 1,400-1,499, "1,550" includes players rated from 1,500-1,599, and so forth. The neural network dominated players who were rated below 1,800, winning forty-two games, losing thirteen, and playing fourteen out to a draw. It didn't compete well with players rated above 1,800 (Class A or higher), winning only three games, losing twenty-two, and playing three to a draw. The results graphed here support the 1,750 rating that the neural network earned.
Figure 42 shows the neural network's performance against opponents at different rating levels. This graph provides a good way to double-check the validity of the 1,750 rating. Looking at the chart, you can see that our neural network dominated players rated below 1,700 and in turn was trounced by players rated above 1,900. It played about even with opponents in the range from 1,700 to 1,800 and lost a bit more than it won when playing against people rated from 1,800 to 1,900. The 1,750 rating seems perfectly reasonable in light of these data.
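For readers who want to build a chart like figure 42 from their own game records, here is a small Python sketch of the binning convention, in which each hundred-point interval is labeled by its midpoint (so "1,450" covers 1,400 through 1,499). The function names and the record layout are assumptions for illustration. The three sample records use the opponent ratings and results mentioned in this chapter; they are not the full data set.

```python
def bin_label(rating, width=100):
    """Return the midpoint label of the interval containing the rating.

    With the default width, ratings 1400..1499 map to 1450,
    1500..1599 map to 1550, and so on.
    """
    low = (int(rating) // width) * width
    return low + width // 2


def tally_by_bin(results, width=100):
    """Count wins, draws, and losses per rating interval.

    results is an iterable of (opponent_rating, outcome) pairs, where
    outcome is "win", "draw", or "loss" (hypothetical record layout).
    """
    table = {}
    for opponent_rating, outcome in results:
        counts = table.setdefault(bin_label(opponent_rating, width),
                                  {"win": 0, "draw": 0, "loss": 0})
        counts[outcome] += 1
    return table


# Three games described in the chapter: a draw against a 1,926 player,
# a win against a 1,771 player, and a win against an 1,800 player.
sample = [(1926, "draw"), (1771, "win"), (1800, "win")]
table = tally_by_bin(sample)
```

Note that an opponent rated exactly 1,800 falls into the "1,850" bar under this convention, which matches the caption's treatment of each bar as a half-open hundred-point interval.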
Stepping Back and Looking Ahead
We took a moment to take stock of what had been achieved. An evolutionary algorithm using the minimax principle and starting with only basic information about the positions and types of pieces on the board, and the piece differential, had taught itself to play at the Class B level without relying on human expertise. It had evolved its own value for the king (that value turned out to be 1.4) and had, in just one hundred generations, created a neural network that could play on a level that was competitive with many human opponents. It had achieved this feat without knowing which games that it played were wins, losses, or draws. Only the final point score earned in a series of at least five games was available to help it discern how well any particular neural network in the population was playing.
Simply by using random variation and selection within this framework, the evolutionary algorithm was able to create a fairly good neural network. Its performance surpassed that of Samuel's program, based on Richard Fortman's earlier assessment of that program (below Class B). Furthermore, contrary to Newell's speculation, it demonstrated that there was indeed sufficient information contained in "win, lose, or draw" for learning how to play a good game of checkers. The best-evolved neural network couldn't play like an expert, but it did put up enough of a challenge to defeat many people rated Class B and below.
We had equaled Newell's part of the Samuel-Newell challenge. Our evolved program demonstrated that it wasn't necessary to assign specific credit to individual moves that are thought to correlate with good outcomes or even to know which games ended with good outcomes rather than bad. The evolutionary program was sufficiently robust to create neural networks that could overcome those limitations and still play at a worthy level.
It wasn't clear, however, that we'd really met Samuel's part of the challenge: having the evolutionary program create its own features for assessing the worth of candidate boards. To a limited extent, we could claim some success here, because the neural network seemed to repeat the same moves in situations in which a few of the pieces on the board were in certain common formations. It was as if the neural network had recognized those particular patterns and had captured them in its internal weighted architecture. But we couldn't easily interpret those situations and had only a handful of these examples to draw on. The neural network didn't offer any consistent behavior that we could identify as having captured a feature that would normally require human expertise.
Still, we were encouraged by the progress we'd made. For kicks, we logged onto Chinook's website and played a game using our best neural network. The game was much like a typical Mike Tyson boxing match: It didn't last very long, and we got killed. Fair enough, our neural network wasn't able to compete with a world-class checkers program, just with average checkers aficionados, and there was no shame in that.
What Comes Next
Kumar and I began to ponder what would happen if we let the evolutionary program run for two hundred or more generations. Evolving for one hundred generations had created a 1,750-rated neural network. How much better could evolution do in another one hundred generations? Would it create a Class A player? An expert? Our initial results had been encouraging. Now we could restart the evolutionary program with the members of the hundredth generation, then check back in a week and play some more games with the best network. A week seemed like forever, but we had just taken a month to complete the one hundred games that we played over the Internet, so in relative terms a week wasn't so bad. Besides, maybe we could find some ways to speed up our program.
We also started thinking more about the information that we'd provided to the neural networks competing in the evolutionary milieu. We realized another limitation of our procedure: The neural networks didn't know that the game was being played on an eight-by-eight checkerboard. To them, the game was played on a one-by-thirty-two row of squares (see figure 43). Surely, this was a significant impediment to learning about checkers, a game in which the spatial nature of the board is critical to interpreting the potential of various pieces. Even the most naive novice player knows that a checkerboard is two dimensional.
We wondered how good the evolutionary program might become if we could remove this impediment and let it know that it was playing on a checkerboard without telling it anything about the spatial features of the game that might assist it in playing well. Could it become an expert player? Could it compete with Chinook? We had answered some of the main questions that we set down in the beginning of our effort, but in so doing had raised other questions that deserved their own answers.
FIGURE 43
Imagine that the checkerboard wasn't an eight-by-eight board, but instead a one-by-thirty-two vector where the connecting lines tell you which squares are joined by the rules of how the pieces can move on the board. Without the "spatial nearness" inherent to a checkerboard, it becomes very difficult to envision how far apart squares are. Looking at the checkerboard, it's easy to see that square 15 is only two moves away from 23, so easy, in fact, that it's second nature. But recognizing this relationship on a one-by-thirty-two vector of squares is much more difficult. The neural network wasn't given any knowledge about the spatial nature of checkers. All it could sense was represented in the thirty-two separate inputs. Any idea that it had about how close different squares were to one another was learned during evolution. Kumar and I thought this lack of information was a significant handicap.
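To see what the one-by-thirty-two view hides, it can help to rebuild the two-dimensional picture explicitly. The Python sketch below maps each numbered square to a row and column, derives which squares are diagonal neighbors, and counts the fewest single diagonal steps between two squares. It assumes the standard checkers numbering (squares 1 through 4 on the top row, counted left to right, and so on down the board), which is consistent with the moves quoted in this chapter but isn't defined in it. The function names are hypothetical, and the distance ignores other pieces and direction-of-play restrictions, as if a lone king were moving on an empty board.

```python
from collections import deque


def square_to_row_col(square):
    """Map a square number (1..32) to (row, col) on the 8x8 board.

    Assumes standard numbering: four playable squares per row, with
    the playable squares shifted by one column on alternating rows.
    """
    index = square - 1
    row = index // 4
    col = 2 * (index % 4) + (1 if row % 2 == 0 else 0)
    return row, col


def build_adjacency():
    """Return {square: sorted list of diagonally adjacent squares}."""
    position = {sq: square_to_row_col(sq) for sq in range(1, 33)}
    by_position = {rc: sq for sq, rc in position.items()}
    adjacency = {}
    for sq, (row, col) in position.items():
        neighbors = []
        for d_row in (-1, 1):
            for d_col in (-1, 1):
                other = by_position.get((row + d_row, col + d_col))
                if other is not None:
                    neighbors.append(other)
        adjacency[sq] = sorted(neighbors)
    return adjacency


def moves_apart(start, goal, adjacency):
    """Fewest single diagonal steps from start to goal (breadth-first search)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        square = queue.popleft()
        if square == goal:
            return distance[square]
        for neighbor in adjacency[square]:
            if neighbor not in distance:
                distance[neighbor] = distance[square] + 1
                queue.append(neighbor)
    return None


adjacency = build_adjacency()
steps = moves_apart(15, 23, adjacency)
```

Here `adjacency` plays the role of the connecting lines in figure 43, and `moves_apart(15, 23, adjacency)` confirms that square 15 is two moves from square 23. A person reads that off the board at a glance; a network fed only thirty-two separate inputs has to discover the equivalent of this table on its own.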
A Repeat Performance
We were anxious to repeat our prior experiment. We wanted to know, definitively, that our one trial with evolving neural networks for evaluating checkerboards wasn't a fluke. We also wanted to know if increasing the duration of the evolution would lead to a better neural network. A Class B player was good, but perhaps one hundred generations wasn't enough evolution and an even better neural network lay waiting to be discovered. We were also anxious, however, to speed up the process so that we wouldn't have to wait quite so long to see what would emerge from the evolutionary program. "Alpha-beta pruning" provided one obvious step we could take to improve the efficiency of the minimax search.
The ABCs of Alpha-Beta Pruning
Alpha-beta pruning is a means of reducing the computational work of minimax without changing the move selection. It's a faster way of choosing the minimax move. If you're not familiar with alpha-beta pruning, and would like to be led through an example, you'll find one in the notes.¹
Broadly speaking, alpha-beta pruning allows you to eliminate nodes and branches in the game tree in cases where no matter what values you find in those nodes, you already know that you wouldn't or couldn't make the move that would lead to those nodes. That might occur when you've evaluated one move and find that branches below another move lead to catastrophe. Once you realize that, you don't have to evaluate all the nitty-gritty of just how bad the alternative move would be. Similarly, you might find a move that would lead to victory, but you know that your opponent would never allow you to make that move, instead forcing you to play something else. The "alpha cutoff" eliminates nodes and branches where, for a particular move, you realize that the opponent could make your situation worse but you already have an alternative move that's known to be better. In contrast, the "beta cutoff" does the opposite job: It eliminates nodes on your move where the opponent would choose not to allow you the opportunity to take advantage of him or her.
It might not seem that the alpha-beta pruning would save very much time. But sometimes many alternative moves emanate from each node, a situation you'd find in an average checkers game. (In checkers, it's typical to have seven or more choices at each position.) If you can use the alpha-beta method to eliminate branches, you might save yourself from evaluating not just one node, but possibly six or more at one level. What's more, you then save yourself from having to evaluate any of the nodes that are lower in any of the branches that you've eliminated. That's where the savings can really mount up.
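The pruning idea described above can be made concrete with a short sketch. The Python below is a generic, textbook-style alpha-beta search run on a toy game tree; it is not the program described in this book. The function names, the nested-list tree, and the leaf scores are all assumptions chosen for illustration.

```python
import math


def children(node):
    """A node is either a list of child nodes or a number (a leaf score)."""
    return node if isinstance(node, list) else []


def minimax(node, maximizing, evaluate):
    """Plain minimax: visits every leaf."""
    kids = children(node)
    if not kids:
        return evaluate(node)
    values = [minimax(kid, not maximizing, evaluate) for kid in kids]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, evaluate, alpha=-math.inf, beta=math.inf):
    """Same result as minimax, but stops searching a branch once it cannot matter."""
    kids = children(node)
    if not kids:
        return evaluate(node)
    best = -math.inf if maximizing else math.inf
    for kid in kids:
        value = alphabeta(kid, not maximizing, evaluate, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff: the remaining siblings cannot change the choice
    return best


def counting_evaluator():
    """Return (log, evaluate) so we can count how many leaves get scored."""
    log = []

    def evaluate(leaf):
        log.append(leaf)
        return leaf

    return log, evaluate


# Hypothetical two-ply tree: our move (max) followed by the opponent's reply (min).
TREE = [[3, 5, 6], [2, 9, 8], [1, 4, 7]]

full_log, full_eval = counting_evaluator()
pruned_log, pruned_eval = counting_evaluator()
full_value = minimax(TREE, True, full_eval)
pruned_value = alphabeta(TREE, True, pruned_eval)
print(full_value, pruned_value, len(full_log), len(pruned_log))
```

On this toy tree both searches pick the same value, 3, but the pruned search scores only five of the nine leaves. Once the first reply in the second and third branches turns out worse than what the first branch already guarantees, the rest of each branch is skipped.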
The Proof's in the Pruning
Alpha-beta pruning sped up our procedure by a factor of a little more than three. We could now run through one hundred generations using four ply in just about two days (when not competing with another program for the computer's CPU). With our newfound faster procedure, Kumar and I decided to start the evolutionary program from scratch, beginning again with new random neural networks. We reasoned that we could get through 250 or so generations in five days, and by restarting we would have a second data point to determine if our initial moderate success was an anomaly. If we were to start from where we left off the last time, we'd bias the evolutionary search and not provide a suitable experimental control. By starting fresh, if the neural networks again learned how to play well, we'd have a second example, a repeated experiment.
We started the new trial on a Monday, with the goal of having 250 generations completed by Friday afternoon, just in time for our initial evaluations over the weekend. Each day, fifty generations clicked by, and we watched the best and average scores racked up by the neural networks at each generation. There were some long periods, a set of ten or more generations, where it didn't seem as if any improvement was being made. Then a few generations would show that at least one neural network had performed exceedingly well. We guessed that the evolutionary program was making progress, but we really had no way of knowing for sure. It might have been that the best neural network in the population was superior simply because all the other neural networks were lousy.
Was the evolutionary program really learning again? The only way to know was to wait for the product of the two hundred fiftieth generation, which showed up late Friday night.
Back to the Zone
On Saturday morning, we logged in to zone.com as Obi_WanTheJedi and loaded up our program to play the newly evolved neural network. With the faster alpha-beta search in place, we could have generated more moves at eight ply than before, but we decided to stick with a majority of the moves at six ply, to provide a better comparison to our previous trial. We wanted to know if the product of 250 generations of evolution was really better than the result of just one hundred generations. To do that, we needed to keep the other facets of the experiment as constant as possible.
This time our first opponent was rated just above 1,700, a pretty good place to start, because that was close to the rating of our previous best-evolved neural network. After a long game, our opponent offered a draw. We had reached a stalemate with two kings each, and the draw seemed inevitable. We took this as a bit of a setback, but that was only natural. We had hoped that the new neural network would kill this "feeble" opposition and demonstrate that it was at least a Class A or expert-level player. A draw against someone rated about 1,725 wasn't bad, but it wasn't the dominating result we wanted, either.
Our next opponent was a weak player rated about 1,390. Our neural network easily made mincemeat of him, as it should. Then came a player who was rated 1,943, in the upper region of Class A. We became more focused when our human opponent went down a checker after a two-for-one exchange. Attention turned to jubilation when he later resigned after getting behind by two checkers. This win lifted our spirits considerably. It was the single best result we'd had to date. The best our previous neural network had done was the draw against the player rated 1,926 detailed in chapter 11. Now we had a win over someone in the high range of Class A. Our network had two wins and one draw and was still undefeated in three games. Maybe this was going to turn out better than we had hoped.
Our fourth game ended in a draw against a player rated just below 1,800. Still undefeated, we sought someone with a significantly higher rating. We'd have to play these top players eventually anyway, and if we could earn an early win here we'd be making plans to celebrate. We found a willing opponent rated about 2,150. We watched intently as our neural network calculated the moves at six ply, wishing we could eke out eight ply occasionally. By the fifteenth move, our neural network told us that it was going to lose a checker: Its board evaluation was negative, which was usually a sign of bad things to come. Sure enough, our expert opponent captured the piece and went on to defeat us. We were left shaking our heads, hoping that the rollercoaster ride that lay ahead in evaluating our new neural network wouldn't leave us with a sour stomach.
A New Stopping Rule
After we had played ten games with our new neural network, our rating had climbed to the mid-1,600s. We thought about playing another set of one hundred games but realized that there might be a better approach to determining the rating that would also offer a degree of statistical confidence in the final result. By repeatedly randomizing the order of games played, we could generate a statistical sample of results and examine the average. That average would then be a good estimate of the neural network's rating, and we could keep playing games until it settled around some rating score.
Here's how our thinking went in more detail. When considering each of the games that we were playing, we realized the order of those games really shouldn't matter. The neural network wasn't learning as it played now, so whether we played a particular opponent on the second game, or on the twenty-second game, shouldn't make any difference. It turns out, however, that the order of the games played and their associated results does affect your final rating, according to the rating formula. This variation occurs because the formula gives you more points when you defeat higher-rated opponents and takes away more points when you lose to lower-rated opponents. By consequence, your current rating determines how many points you can earn or lose against any given opponent, and the order of the wins, losses, and draws against various opponents does indeed lead to different ratings.²
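The order dependence described above is easy to see with a small experiment. The sketch below assumes a standard Elo-style update with a fixed factor of 32; the book's actual formula is given in its notes and may differ. The opponent ratings are the approximate values quoted for the first five games in the text ("just below 1,800" is taken as 1,790), and the starting rating of 1,600 is the one reported for the neural network.

```python
def expected_score(rating, opponent):
    """Assumed Elo-style expectation of scoring against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating, opponent, outcome, k=32.0):
    """outcome is 1.0 for a win, 0.5 for a draw, 0.0 for a loss; k=32 is an assumption."""
    return rating + k * (outcome - expected_score(rating, opponent))


def final_rating(games, start=1600.0):
    """Apply the update to each (opponent_rating, outcome) pair in the given order."""
    rating = start
    for opponent, outcome in games:
        rating = update(rating, opponent, outcome)
    return rating


# Approximate first five results from the text: draw, win, win, draw, loss.
GAMES = [(1725, 0.5), (1390, 1.0), (1943, 1.0), (1790, 0.5), (2150, 0.0)]

as_played = final_rating(GAMES)
reversed_order = final_rating(list(reversed(GAMES)))
print(round(as_played, 2), round(reversed_order, 2))
```

The same five results produce two different final ratings depending only on the order in which they are fed to the formula, because each update depends on the rating held at that moment.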
Since the order of the games is important, the natural question is: How many ways are there to order the games? The answer is much like that from the traveling salesman problem I detailed in chapter 4. If you have two things to order, there are two ways to do it. If you have three things to order, there are six ways to do it. With four items, the number of permutations increases to twenty-four. The number of permutations increases as a factorial function of the number of items to be ordered. Recall that with a factorial function you start with a number and multiply it by one less than that number in succession until you reach one. That is, four factorial is four by three by two by one, which is twenty-four. For ten games of checkers and therefore ten results, we could order them in ten by nine by eight by seven by six by five by four by three by two by one ways, which is 3,628,800.
We sampled several possible orderings of the first ten games. Each ordering yielded a similar but not identical rating. Each was as legitimate as any other. The correct procedure, therefore, was to take a random sampling of the different possible orderings, compute the rating that would result from each one, and then take the average of those ratings as the best estimate of the true rating obtained to that point.
After ten games, our rating based on the games played was about 1,650. When we computed the average rating based on 2,000 different orderings of those ten games, we found the curve shown in figure 44. Our average rating was just below 1,650, but more significant, the rating curve certainly hadn't settled down. It was climbing rapidly.
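The sampling procedure described above can be sketched in a few lines. This is an illustration only: it assumes a standard Elo-style update with a factor of 32 (the book's formula is in its notes and may differ), and the ten-game record is hypothetical apart from the first five opponents, which use the approximate ratings quoted in the text.

```python
import math
import random
from statistics import mean


def final_rating(games, start=1600.0, k=32.0):
    """Apply an assumed Elo-style update to each (opponent_rating, outcome) in order."""
    rating = start
    for opponent, outcome in games:
        expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
        rating += k * (outcome - expected)
    return rating


def mean_rating_over_orderings(games, samples=2000, seed=0):
    """Average the final rating over randomly sampled orderings of the games."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(samples):
        order = list(games)
        rng.shuffle(order)
        ratings.append(final_rating(order))
    return mean(ratings)


# Ten results already give 10! possible orderings.
orderings_of_ten = math.factorial(10)

# Hypothetical ten-game record: (opponent rating, 1 = win, 0.5 = draw, 0 = loss).
GAMES = [(1725, 0.5), (1390, 1.0), (1943, 1.0), (1790, 0.5), (2150, 0.0),
         (1650, 1.0), (1580, 1.0), (1870, 0.0), (1700, 0.5), (1610, 1.0)]

estimate = mean_rating_over_orderings(GAMES)
print(orderings_of_ten, round(estimate, 1))
```

Sampling two thousand orderings, rather than evaluating every ordering, keeps the cost fixed as more games are played; the number of orderings grows factorially, but the sample size does not have to.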
FIGURE 44
[Plot: average rating (axis from 1,600 to 1,650) versus number of games played.]
The average rating based on two thousand different random permutations of the first ten games that we played using the best-evolved neural network from 250 generations of evolution. The average rating was just below 1,650 but was climbing rapidly, indicating that we had many more games left to play before determining the neural network's true rating.
Kumar and I decided that we'd play games until the average rating curve that resulted from two thousand random orderings of the games played settled at a particular value. In that way, there'd be no doubt that our rating was legitimate and not the result of ordering the games in some way that might result in our getting a higher rating than we otherwise deserved.
Obi_WanTheJedi Defeats an Expert
While playing additional games until our average rating curve settled, we achieved two important milestones. The first came on game sixty-two, when we faced and defeated an opponent rated 2,134, a high-level expert. At the time, he was ranked forty-eight on the website out of over forty thousand registered players. He played red, and our neural network played white. You'll find all the moves for the game in the notes.³ Figure 45(a) shows the position after the neural network's twentieth move, 14-9. The neural network had set a trap. Bad things were about to happen to our human opponent, and all of his moves were forced. There was nothing he could do about it. The sequence went:
Red (Human Opponent)    White (Neural Network)
21: 5-14                21: 29-25
22: 21-30               22: 3-7
After the neural network's move 3-7, red had to use his king to jump white's checker on 26, playing 30-23, as shown in figure 45(b). This move led to a triple jump, 27-18-9-2, from the neural network in a reply that earned a king on top of that. The neural network now led our expert opponent with two kings and three checkers to red's four checkers. By move thirty-three, when the neural network played 6-2 (figure 45[c]), it had four kings and one checker, compared with our opponent's two kings and two checkers. What's more, our opponent's two checkers were pinned by the neural network's kings on squares 2 and 3.
FIGURE 45
A The best-evolved neural network (white) sets a trap for our opponent (red), rated 2,134, a high-level expert. The neural network has just moved 14-9, forcing the opponent to reply by jumping 5-14. The neural network then proceeds to sacrifice another checker by moving 29-25. The human opponent (red) plays the forced jump of 21-30, then the neural network (white) moves 3-7, leaving its piece on 26 open for a forced capture. The resolution of the sequence can be seen in figure 45(b).
B After our opponent plays 30-23, as shown, the neural network has a triple jump, 27-18-9-2, earning a king in the process.
C The position after move thirty-three. The neural network (white) has just played 6-2 and has four kings. In addition, red's two checkers on squares 1 and 4 are trapped. The neural network has a clear upper hand in the match.
D Victory is in sight for the neural network (white). The position after move sixty-two, where the neural network played 10-15. It's red's turn, and all options lead to defeat. Red plays 11-18. The neural network counters with 3-7, setting up another triple jump. Red is forced to jump 1-10, then the neural network jumps 7-14-23-16. The human opponent resigned after the next move, giving the neural network a win over someone rated in the top fifty of more than forty thousand registered players on zone.com.
Some opponents might have resigned in this situation. Ours, however, pressed on. It took another thirty moves, but the neural network maneuvered its way into certain victory. Figure 45(d) shows the position after the neural network's sixty-second move, 10-15. Now it was red's move. If red were to play 19-10, then white would double jump 6-15-8, and red would counter with a jump from 4 to 11. But by then, the match would be over, since red would have only two checkers, while white would have three kings. Instead, the human opponent jumped 11-18, but the outcome was just the same and executed with style.
Red (Human Opponent)    White (Neural Network)    Comments
63: 11-18               63: 3-7                   The neural network sacrifices the king on 6.
64: 1-10                64: 7-14-23-16            Triple jump! Wow!
65: 4-8                 65: 21-25
At this point, our human opponent resigned, since the match was clearly in hand in no more than two moves. We had just defeated someone in the top fifty on zone.com! Well, I say "we," but "we" had little to do with it. The evolutionary algorithm that discovered this neural network deserves the credit, not us. (Recall from chapter 11 that it was very unlikely that the best neural network from one hundred generations that we first evaluated would have had any chance of defeating an expert-rated player. Our success here shored up our expectation that the neural network from generation 250 was far superior to the prior one.)
Obi_WanTheJedi Matches Up with a Master
Our victory over a high-level expert rated 2,134 was one of the high points of this series of games. An even more impressive result came in game seventy-three, which pitted our neural network against a master, rated 2,207 and ranked number eighteen on the website. This time the neural network played red, the opponent white. You'll find the moves for the game listed in the notes.⁴
Early on, the neural network took advantage of an opening. Figure 46(a) shows the position just after the master's fifth move, 21-17. This set up the neural network to respond with 9-14, forcing the opponent to jump 18-9 and giving the neural network a two-for-one exchange by double jumping 5-14-21. Down one checker after only the seventh move, our opponent would have a tough time coming back.
Figure 46(b) shows the situation after the human opponent's nineteenth move, 8-3, which earned him a king. At first glance, the neural network looks stymied. The opponent has a seemingly solid defense of his back rank and seems secure in his defense, but not quite. The neural network moved 23-26, forcing the opponent to jump 30-23, thereby leaving him open to the counterjump 21-30, and the neural network earned a king. Adding insult to injury, the piece with which the neural network earned a king is the same piece that executed the double jump on move seven.
Figure 46(c) shows the position thirteen moves later, after the opponent's thirty-second move, 19-23. He was threatening the neural network's king on 18, and its checker on 26. The neural network was left to move 26-31, giving up the king, but getting one back and then jumping over white's checker on 27.
A twenty-move defensive struggle ensued. The neural network and the human opponent often toggled positions, but our opponent occasionally explored an alternative line of play. On move fifty-two, the neural network played 27-23, leaving the board as shown in figure 46(d). The human player offered a draw. At this point, I performed searches at six, eight, and ten ply, and each showed that the toggling would continue, so I accepted the draw. We had a material advantage, but the checker on square 5 was pinned. Would it have been possible to force this situation for a win? I didn't know. But I did know that our neural network had just played someone in the top twenty of more than forty thousand registered players to a draw.
FIGURE 46
A Early in a game against an opponent (white) who was rated 2,207, in the master category, and ranked eighteenth on zone.com. The figure shows the position after the human opponent's fifth move, 21-17. The neural network (red) now has an opening to go up a piece by moving 9-14 and replying to 18-9 with 5-14-21.
B The position after the human opponent (white) moved 8-3, earning a king. The neural network (red) seems stymied in its attempts to get a king but plays 23-26, giving up the checker on 23 but earning a king in the resulting swap. The opponent plays the forced jump 30-23, and the neural network responds with the forced jump 21-30, gaining the king.
C The position after the human opponent's (white) thirty-second move, 19-23. The neural network (red) is still up by one checker, but the opponent now threatens both pieces on squares 18 and 26. The neural network gives up its existing king by playing 26-31, thereby gaining an exchange. The human player takes the neural network's king on 18, jumping 23-14. The neural network then takes the opponent's checker on 27, jumping 31-24 with its new king. The neural network still led by a checker.
D The position after the neural network (red) made its fifty-second move, 27-23. The neural network was still leading by one checker, but it was now unable to find a sequence of moves that would allow it to free its trapped checker on square 5. The human player (white) offered a draw, which I accepted for the neural network. Thus the neural network earned a draw against someone in the top twenty of all rated players on zone.com.
Certainly, I had no expectation that our previous neural network, the result of one hundred generations, would have been capable of playing a master-level opponent to a draw. The new best-evolved neural network that emerged from 250 generations of variation and selection was a real improvement. Still, we were left to find its final rating.
FIGURE 47
[Plot: mean rating (axis from 1,600 to 1,950) versus game number (0 to 90).]
The mean rating of the best-evolved neural network based on two thousand different random orderings of the games as we played more and more games on zone.com. By the ninetieth game, the neural network's rating had appeared to settle at 1,901.98, placing it in the Class A category.
And the Answer Is . . .
After ninety games and two weeks of playing checkers, our rating curve had mostly settled, as shown in figure 47. We decided to stop. Along the way we'd played forty-seven of the ninety games as red and forty-three as white, which was close enough to fifty-fifty in our opinion that we wouldn't bias the results based on how the neural network played one side or the other. The average final rating, based on two thousand different random orderings of the opponents that we played, was 1,901.98.
By comparison, figure 48 shows the results of each game played, the rating of our opponent, and the rating of the neural network computed along the way. After ninety games, our rating was 1,914.3, pretty close to the average value that we obtained by randomizing the order of the games. The best rating that we obtained in the ninety-game series was 1,975.8, on game seventy-four. In fact, the last sixteen games generated just about as many wins as losses against opponents rated between 1,750 and 2,000, which served to lower the rating by sixty points. (See figure 49.)⁵
FIGURE 48
The results of each game played. The neural network's rating started at 1,600 and climbed to 1,914.3 after ninety games. The figure shows the rating of the opponents that we played, the neural network's rating along the way, and the result of each contest. The final rating of 1,914.3 corresponds closely to the average rating of 1,901.98 that comes from two thousand different random orderings of the ninety contests.
FIGURE 49
A histogram showing the number of wins, draws, and losses that the neural network earned based on the opponents' ratings. Each bar represents a range of one hundred points. For example, "1,450" includes all opponents with ratings between 1,400 and 1,499. The neural network clearly dominated players rated below 1,700, going undefeated in thirty-three games, with only five draws. The neural network was also superior to its opposition rated between 1,700 and 1,900, earning fifteen wins, eight draws, and nine losses. When facing tougher opponents, rated 1,900 or higher, the neural network's performance declined. These data are consistent with the neural network's rating of about 1,900.
We had accomplished something significant by repeating the experiment. With this second success, we showed that our initial result wasn't just a fluke. In two runs of the evolutionary algorithm, starting from completely random neural networks in each case, and with no human expertise in assessing alternative checkerboards beyond the inclusion of the piece differential, the result was one Class B player and one Class A player. The additional evolutionary generations that we executed in the second trial seemed to have a positive effect on the resulting neural network's performance.
Our best neural network from generation 250 had defeated someone rated 2,134, a player in the top fifty among more than forty thousand registered players on zone.com. It had even played to a draw with someone ranked eighteenth on the site, a player who was rated over 2,200, in the master category. There was no doubt that the best-evolved neural network was an improvement over the previous result from just one hundred generations, and there was no doubt that we had overcome Newell's part of the Samuel-Newell challenge. An evolutionary algorithm can most certainly learn to play an advanced game like checkers based on even less information than win, lose, or draw.
A New Dimension
Despite the improved performance of the best-evolved neural network, Kumar and I still weren't content with our experimental design. As I mentioned at the end of chapter 11, we still hadn't told the neural networks that they were playing on a two-dimensional checkerboard. As far as they knew, they were playing on a one-dimensional vector, a tremendous handicap.
Our problem was tricky. How could we tell the evolving neural networks that they were playing on a checkerboard without explicitly giving away features about the game that represent human expertise? We wanted our neural networks to realize, on their own, that there might be important information in the spatial characteristics of the game, without supplying that information directly. Frankly, we didn't know what information to supply anyway, so we really were stuck. We couldn't have cheated even if we'd wanted to. The only approach we could have fallen back on was that offered in Samuel's program or in Chinook, whereby explicit features were injected directly into the evaluation function. This is exactly what we wanted to avoid. We couldn't meet Samuel's challenge of having a program create its own features if we gave the program those same features.
An Inspiration
One day Kumar and I went to Rubio's Baja Grill for lunch. Rubio's, an upscale Mexican fast-food restaurant, was one of our main hangouts when taking a break from playing checkers games on the weekends. More important, at least for us on this day, the tables at the restaurant are tiled with alternating colors, much like checkerboards. We sat looking at our table and discussing possibilities for communicating the spatial nature of checkers to our evolving neural networks.
After dismissing a few ideas, we finally settled on something that seemed promising: Instead of relying only on the thirty-two inputs that correspond to each square of the checkerboard where we might find a piece, we could increase the number of inputs to represent subsections of the checkerboard. There are thirty-six possible three-by-three subsections of a checkerboard. One such subsection is shown in figure 50(a), which includes squares 1, 5, 6, and 9. Suppose we offered one neuron that would represent this subsection, with four weighted connections corresponding to the four squares contained within this subsection. By allowing the neural network to vary the weights associated with those squares, it might be able to learn something about different patterns that would appear within that subsection. At least it would have a chance to know that the four squares, 1, 5, 6, and 9, are physically close to one another.
Similarly, we could make thirty-five more neurons, one for each of the other thirty-five three-by-three subsections and include more weights in the neural networks to represent each of the possible squares in each subsection. Some three-by-three subsections have five active squares, like the one that includes squares 1, 2, 6, 9, and 10. Others have only four active squares, like the one outlined in figure 50(a). All totaled, this tessellation added 162 more weights to the neural networks.
Having considered all possible three-by-three subsections, what about all possible four-by-four subsections? We didn't want to handicap the neural networks by giving them a representation that would ultimately be incapable of describing the nearness of different squares that we humans so easily take for granted. By the same token, we didn't want to bias the neural networks to look for patterns only in three-by-three subsections of the checkerboard.
We had no idea if three-by-three subsections were really going to be important, but if we were going to give the neural networks a fighting chance to recognize the spatial nature of the checkerboard, to learn which squares were close to others, then it seemed like we had to go all the way. We had to create input nodes, not just for all three-by-three subsections but also for all four-by-four, five-by-five, six-by-six, seven-by-seven, and eight-by-eight subsections. Of course, the eight-by-eight subsection is the entire board, and there's just one of those.
FIGURE 50
A In describing the spatial nature of checkers, we assigned neurons to cover each possible three-by-three, four-by-four, five-by-five, six-by-six, seven-by-seven, and eight-by-eight subsection of the checkerboard. (Of course, there is only one eight-by-eight subsection, which corresponds to the entire board.) This allowed the competing neural networks to invent features based on the spatial characteristics of the checkers on the board.
B The resulting "spatial neural network" design that incorporated the preprocessing layer of neurons, whereby each corresponded to a subsection of the checkerboard.
[Diagram labels: Checkerboard Encoded as a 32x1 Vector; Spatial Preprocessing Layer; All 36 3x3 Overlapping Subsquares; All 25 4x4 Overlapping Subsquares; All 4 7x7 Overlapping Subsquares; Full Board (1 8x8 Subsquare); Hidden Layer #1 (91 Nodes); Output.]
We reconsidered how many more weights we'd need to cover all the possible subsections. Our previous design incorporated 1,741 weights. Now, by including all the subsections of the checkerboard, our new design would use 5,046 weights. This design meant that it would take about three times longer than before to evaluate each board. Furthermore, there was additional overhead required to move through each subsection and recall the contents of each square on the board at each move. This extra burden would be carried to each possible move that would be evaluated in every game. It also meant that there were, essentially, three thousand more ways to really screw up what the neural networks were doing: one for each new weight.
As if designing a Sinister television with 1,741 knobs weren't bad enough, now we faced one with more than five thousand knobs. But how else could we offer the neural networks the chance to learn about the spatial characteristics of the game without spoon-feeding them the information? We swallowed hard and went forward with our new design.
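The subsection counts quoted in this section can be checked by enumerating every square window on the board. The sketch below is an illustration, not the code Kumar wrote: the function names are made up, and the board orientation (square 1 in the top row, second column) is an assumption chosen so that the corner three-by-three window contains squares 1, 5, 6, and 9, as in the text.

```python
def square_number(row, col):
    """Return the 1-32 number of a playable square, or None for an unused square.

    Assumes rows and columns run 0-7 and that playable squares are those
    where row + col is odd, numbered left to right, top to bottom.
    """
    if (row + col) % 2 == 0:
        return None
    return row * 4 + col // 2 + 1


def subsections(size):
    """List every size-by-size window as the playable squares it covers."""
    windows = []
    for top in range(8 - size + 1):
        for left in range(8 - size + 1):
            squares = [square_number(r, c)
                       for r in range(top, top + size)
                       for c in range(left, left + size)
                       if square_number(r, c) is not None]
            windows.append(squares)
    return windows


# One neuron per window, for every window size from 3x3 up to the full board.
counts = {size: len(subsections(size)) for size in range(3, 9)}
total_nodes = sum(counts.values())

# One weight per playable square inside each 3x3 window.
weights_3x3 = sum(len(window) for window in subsections(3))

print(counts, total_nodes, weights_3x3)
```

Under these assumptions the enumeration gives 36, 25, 16, 9, 4, and 1 windows for sizes three through eight, which is 91 neurons in all, and the three-by-three windows alone account for 162 weights, matching the figures in the text.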
Murphy Strikes

There's a principle in software engineering, sort of a corollary to Murphy's Law, which dictates that thinking up an idea takes about an order of magnitude less time than it does to write the computer program for that idea, which in turn takes about an order of magnitude less time than it does to figure out whether or not the idea was any good. If it takes you one minute to think of an idea, then it will take you ten minutes to program it and one hundred minutes to test it. Unfortunately, we were about to find out that Murphy wouldn't be that kind to us.

It took about thirty minutes over lunch to come up with the idea to cover the checkerboard with all possible subsections and to assign input neurons to each subsection, with variable weights associated with each active square in each subsection. It took Kumar about three hundred minutes (five hours) to program the code and debug it. Then came the bad news. With the now larger neural networks, our benchmark tests indicated that it would take seven days to evolve thirty generations. Our evolutionary program was now eight times slower than before, and 250 generations would take two months to complete. Two months corresponds to about 85,000 minutes, if you're counting. That's not ten times longer than Kumar needed for programming; it's more than one hundred times longer.

The prospect of waiting two months for a result was frustrating. We thought about trying a shorter trial, perhaps stopping our evolution at one hundred generations. That process would take a little more than three weeks. But rather than focus on the time required, we returned to the scientific question we were trying to answer, namely, Would incorporating spatial information by including all subsections as inputs to the neural networks lead to improved performance? To answer that question, we really did need to run through the 250 generations, because we had to make a comparison with the neural network rated 1,901.78, which had emerged from 250 generations. Comparing one hundred generations of evolution with the now much larger neural networks to the result of 250 generations with the smaller but potentially less-informed neural networks would be difficult, much like comparing apples to oranges.
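For readers who like to check the arithmetic, here is a minimal sketch in Python that starts from the benchmark quoted above (thirty generations in seven days) and Kumar's three hundred minutes of programming. The variable names are only illustrative, and the results are rough estimates rather than measurements.

```python
# Figures quoted in the text; the variable names are illustrative only.
BENCHMARK_GENERATIONS = 30    # generations completed in the benchmark test
BENCHMARK_DAYS = 7            # days that benchmark took
TARGET_GENERATIONS = 250      # generations needed for a fair comparison
PROGRAMMING_MINUTES = 300     # Kumar's five hours of coding and debugging

generations_per_day = BENCHMARK_GENERATIONS / BENCHMARK_DAYS
days_for_target = TARGET_GENERATIONS / generations_per_day
minutes_for_target = days_for_target * 24 * 60
days_for_hundred = 100 / generations_per_day
ratio_to_programming = minutes_for_target / PROGRAMMING_MINUTES

print(f"{generations_per_day:.1f} generations per day")
print(f"{days_for_target:.0f} days for {TARGET_GENERATIONS} generations")
print(f"{minutes_for_target:,.0f} minutes of evolution")
print(f"{days_for_hundred:.1f} days for one hundred generations")
print(f"{ratio_to_programming:.0f} times the programming effort")
```

Under these assumptions the run comes to a little over fifty-eight days, or roughly 84,000 minutes, which is in line with the "about 85,000 minutes" figure and well over one hundred times the programming time.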
Here Goes Nothing

We took a collective deep breath, set up the computer, hit the return key, and saw the evolutionary program wander off into the ether. At just more than four generations per day, there was no point in watching for progress. We had to forget that we'd even started the experiment in the first place and move on to something else. During the two months that we waited, we finished writing a paper that described our results with the previous best-evolved neural networks.¹

We checked the status of the computer dutifully as each week went by. Everything seemed in order, with the usual periods during which many generations would transpire with no evident progress, followed by short transitions where the best neural network at some generation seemed to dominate its opposition. We guessed that these transitions marked breakthroughs toward better, higher-quality play.

When generation 230 clicked over and we'd marked off two months on the calendar, we couldn't hold out any longer. Our curiosity got the better of us. Rather than wait another five days for the evolutionary algorithm to crank generation 250, we downloaded the best product from generation 230 and set to work evaluating its performance against players on zone.com.
Zoning Out

Before finding our first opponent, Kumar and I played one practice game with the new neural network. It defeated us easily, as we expected, but we were more concerned with the time that it required for each move. Since the network was about three times larger than before, it took more time to evaluate each board. We found that we could still reliably get a search of six ply, and occasionally eight ply within a two-minute time frame, but most eight-ply moves were taking more than two minutes. This was longer than tournament regulation play allows and, more important, longer than our patience would tolerate. We decided that we'd continue to evaluate the neural network using six ply as a baseline while we tried to figure out new ways to improve the speed of the search algorithm.

Our first opponent on zone.com was a formidable expert, rated just below 2,100. We watched as the game unfolded. We analyzed each move as it went, looking for differences in how the new "spatial" neural network played as compared to the previous neural network. It did seem to play a different opening sequence. Was it a better sequence? We didn't know. At this point, all we knew was that it was different. We watched the play continue as our opponent methodically backed our neural network into a corner. At once, we were down one piece. We played out the game, ending up in a losing position, with one king facing two. Our opponent pinned us against a side of the board, as he should, and we were 0-0-1 right out of the gate (using win-draw-loss notation).

This wasn't the start we were hoping for, and things only got worse. In our first five games, we played the expert mentioned above, then a player rated just below 1,600, followed by three games against players in the low range of Class A, just above 1,800. We only managed to beat the weak 1,600 player. All the other games ended in defeat. Now we were 1-0-4, and our rating still hovered at 1,600. This result was downright discouraging. But just as when our initial results had been more positive, we realized that we had many more games left to play before we could assess the performance of our new neural network definitively.

With a more concerted effort, Kumar and I played one hundred games on zone.com in just seven days. It was a nonstop checkers marathon, move after move, game after game. Our fates turned around quickly. After our disastrous first five games, our neural network won the next nine contests, including wins against two opponents rated above 2,000. Our rating climbed rapidly toward 1,800.
First Signs of Mobility

During our one hundred game series, two games were particularly noteworthy. The first was game eleven, which pitted us against an expert, rated 2,024 and ranked 174 out of forty thousand registered players on zone.com. The game was memorable because it was the first time that I saw signs that the neural network had picked up on the concept of mobility, indicating that the spatial information we'd given the program was making a difference.

Following are a few highlights of game eleven. The complete set of moves is listed in the notes.² Our neural network played white, while our opponent played red and went first. Figure 51(a) shows the position early in the game, just before the neural network's tenth move, 23-18. The move forced our opponent to jump 14-23, and then the neural network double jumped 26-19-10, going up one checker. But that didn't last long: Our opponent set up a double jump two moves later, and we were back to being even.

Figure 51(b) shows the position just after the neural network's eighteenth move, 32-27. Here I saw the neural network was going to go up a piece again. Our opponent was running out of options. His three pieces in his own back rank were pinned by the neural network's two checkers on 19 and 20. If he moved 14-18, then the neural network could move 29-25 and block his advance. If he moved 13-17, then a 29-25 reply would leave him losing a piece on the next move. Either way, the neural network had constricted its opponent's mobility and was taking the lead.

FIGURE 51
A. The position just before the neural network (white) moved 23-18 in a game against an expert rated at 2,024. The human player (red) was forced to reply by jumping 14-23, leaving open a double jump for the neural network, 26-19-10, giving the neural network a temporary material advantage. Our opponent was able to even the material count again two moves later.
B. The position after the neural network (white) made its eighteenth move, 32-27. The human opponent (red) didn't have any good replies and chose 14-18. Of his six possible moves, four give up one or more checkers immediately. The other two, 14-18 and 13-17, only delay this eventuality.
C. The position after the neural network (white) made its twenty-fifth move, 25-22, pinning the red checker on square 15. The human opponent was left without any means to engineer a swap. He resigned one move later.

Seven moves later, our neural network moved 25-22, leaving the board as shown in figure 51(c). Our opponent had just earned a king by moving 27-31, but the neural network's move of 25-22 pinned red's checker on 15. Red's piece couldn't be defended, and there was no way for our opponent to find a swap. If he moved 31-26, threatening the neural network's checker on 22, the neural network would likely let him take the piece, only to jump back 21-14. It was almost over. Our opponent moved 31-27. The neural network countered with 7-10, and our opponent resigned. He was about to go down two checkers and had a very weak position. Game over.

Beating someone in the top two hundred on zone.com was nice, but I was struck more by the way the game had unfolded than by the final outcome. This game was a little different from many others I'd seen our earlier neural networks play. Before, it seemed that the neural network was aggressive in the middle game, moving pieces forward, swapping out, and trying to advance toward a king whenever possible. This time, it seemed to play more conservatively, opting even to forestall getting a king on one occasion. The neural network appeared more "interested" in pinning its opponent than in sprinting for a king. It was the first hint that I had that the evolutionary algorithm had picked up on the concept of mobility.
Whew!

With our recovery in games six through fourteen, we breathed a collective sigh of relief, but we knew that although we'd easily made a comeback, we could just as easily watch the neural network's rating slip away in a series of defeats. Fortunately, that series never came. After a draw on game fifteen against a player rated 1,780, our spatial neural network went on another winning streak. Four more victories, and we had fourteen wins, one draw, and four losses. The neural network's rating continued upward through game forty, as it broke through the 1,900 mark.

FIGURE 52
[Plot: rating on the vertical axis, 1,600 to 1,900; "Games" on the horizontal axis, 0 to 40.]
The neural network's average rating computed from more than two thousand random permutations of the first forty games played on zone.com. The neural network's rating was climbing over 1,900 and hadn't stabilized yet. We played an additional sixty games for a total of one hundred before the rating was sufficiently stable.

We took a random sampling of the different possible orderings of results that we'd obtained and plotted the trajectory of the neural network's rating. As shown in figure 52, it was climbing rapidly. We went through the remaining sixty games without any similar streaks, winning or losing. We did, however, have one outstanding result along the way.
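The idea of averaging a rating over random orderings of the same games can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the update rule is a generic Elo-style formula with a 400-point scale and a step size of 32, the starting rating is 1,600, and the list of games is invented rather than taken from our actual results.

```python
import random
from statistics import mean


def expected_score(rating, opponent_rating):
    """Elo-style expected score (assumed formula, 400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400.0))


def final_rating(games, start=1600.0, k=32.0):
    """Apply one rating update per game, in the order the games are given."""
    rating = start
    for opponent_rating, outcome in games:  # outcome: 1 win, 0.5 draw, 0 loss
        rating += k * (outcome - expected_score(rating, opponent_rating))
    return rating


def average_over_orderings(games, samples=2000, seed=0):
    """Shuffle the same games many times and summarize the final ratings."""
    rng = random.Random(seed)
    ratings = []
    for _ in range(samples):
        shuffled = games[:]
        rng.shuffle(shuffled)
        ratings.append(final_rating(shuffled))
    return mean(ratings), min(ratings), max(ratings)


# Hypothetical results, NOT the data from our games: (opponent rating, outcome).
GAMES = [(2090, 0), (1590, 1), (1810, 0), (1820, 0), (1805, 0),
         (1750, 1), (1900, 1), (2024, 1), (1780, 0.5), (1700, 1)]

average, lowest, highest = average_over_orderings(GAMES)
print(f"average {average:.1f}, lowest {lowest:.1f}, highest {highest:.1f}")
```

Because each update depends on the rating at the moment the game is played, the same set of results can produce somewhat different final ratings in different orders, which is why an average over many shuffles is a steadier estimate than any single ordering.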
Staring Down a Master

In game sixty, we faced a master who was rated 2,210 and ranked number nineteen on the website. The neural network played red, the master white. Again, I'll show you some highlights from the contest, and you can look in the notes for the full listing of the moves.³ Figure 53(a) shows the position after the seventh move. Already the neural network was playing more cautiously, moving pieces from its back rank in support of its checkers that were farther ahead.

Figure 53(b) shows the position after five more moves. The neural network could have moved its checker on 22 ahead to 25, one step closer to an inevitable king. Instead, it moved 7-11 and swapped out white's checker on 26. Already up one piece, the neural network preferred swapping out another checker rather than going for instant material gratification. The neural network seemed to make another such decision on move fifteen, when it could have advanced a checker ahead for a king but instead chose to play a positional move that seemed more intent on shoring up control of the center (or so it seemed to me, a novice player).

Figure 53(c) shows the position after the neural network's twentieth move, 11-15. Our opponent was in serious trouble. He was going to go down another checker, because he was unable to protect his piece on 19. His only move to make the best of a bad situation was 20-16.

Figure 53(d) shows the position ten moves later, after the neural network played its thirtieth move, 22-25, and our human opponent countered with 6-1, earning a king. Both sides had two kings, but the neural network was up by two checkers and about to get a third king. The neural network played 25-30, and our opponent countered with 1-6. The neural network's thirty-second move was 14-18. Our opponent's thirty-second move never came. He resigned. For the first time, we'd defeated someone with a master rating.

At the point in the game when our opponent resigned, no doubt he figured that we'd be marching toward a fourth king, 18-23, and perhaps he saw us moving one king 29-25 and then sacrificing our checker on 13, 13-17, forcing him to jump 21-14 but setting up another sacrifice of our king on 30, 30-26. That would have in turn forced him to jump 31-22, but then we'd double jump 25-18-9. No doubt he would move his king on 6 in time to avoid 25-18-9-2! Still, he would have two kings versus an eventual three kings, and for a master that usually means lights out.
FIGURE 53
A. The position after the human player (white), playing at the master level with a rating of 2,210, moved 24-20. The neural network (red) appeared to be playing more cautiously than did the best-evolved neural network from the previous evolutionary experiment. Here the neural network was moving its pieces from behind the front ranks in support. Within the next four moves, the neural network was going to gain the upper hand in the match.
B. The position after the master-level human (white) moved 30-26. The neural network (red) could have replied with 22-25, ensuring a king on the next move, but instead played 7-11. Already holding a one-piece advantage over its opponent, the neural network favored swapping out an additional piece over moving directly for a king.
C. The position after the neural network (red) moved 11-15, threatening the human player's white checker on square 19. The opponent, unable to defend the threatened checker, replied with 20-16. The neural network was now up by two pieces.
D. The position after the human opponent (white) played his thirtieth move, 6-1, earning his second king. The neural network (red) replied with 25-30, earning its third king. White's position was very poor. His checkers on squares 21 and 31 were pinned, and the neural network's checker on square 14 was in no danger of being captured. The human resigned two moves later, giving the neural network its first win over a player with master rating, ranked in the top twenty of all registered players on zone.com.

How Do We Rate?

After our hundredth game, we took five thousand random samples of different orderings of our opponents and the associated results and tabulated what the best-evolved neural network's rating would be in each instance. Figure 54 shows the trajectory of our rating, which had leveled off by the hundredth game to a value of 1,929.0. To get a sense of how variable the computed rating could be, take a look at the histogram in figure 55. Remember, the final rating for our neural network depends on the order of the opponents that we played. In most cases, that rating hovered right around 1,929.0, providing a degree of confidence that the rating was appropriate.

FIGURE 54
[Plot: average rating on the vertical axis, 1,650 to 1,950; "Game Number" on the horizontal axis, up to 100.]
After one hundred games, Kumar and I examined the average rating found during five thousand random orderings of the games. The curve appeared to stabilize around the value 1,929.0, placing the best-evolved spatial neural network from generation 230 in Class A, about thirty points higher than the best-evolved neural network from the previous evolutionary experiment that did not have access to neurons that were associated with subsets of the checkerboard.

FIGURE 55
Each of the five thousand random orderings generates a different rating. Whereas figure 54 shows the average rating as the number of games increases toward one hundred, here I present a histogram that shows the distribution of the final rating score taken across the five thousand random permutations of the games. The most frequent rating category is very close to 1,929.0, and the standard deviation is 32.75. The standard deviation is a measure that statisticians use to describe the degree of variability of data. Here, about 70 percent of the five thousand different orderings yield ratings that are between 1,900 and 1,960. The data support the rating of 1,929.0 shown in figure 54.

Figure 56 shows the results of each game played in order and the rating of the opponent in each game. Figure 57 shows how the best-evolved neural network fared against opponents of different ratings. It dominated players rated below 1,800, earning thirty-five wins while suffering four draws and four setbacks. It didn't compete well with players rated above 2,100, earning only the single victory against the master-level player detailed above while eking out two draws in the face of seven defeats.

FIGURE 56
This figure shows the outcome of every game played by the best-evolved spatial neural network, each opponent's rating, and the neural network's rating along the way. Recall that the best-evolved neural network didn't learn while we tested it against human opponents over the Internet. Thus the best estimate of the neural network's true rating comes from averaging its performance over thousands of different possible orderings of the games played.

Against opponents rated between 1,800 and 2,000, results weren't quite so definitive. In the 1,800-1,900 category, the best-evolved neural network had more wins than losses by a score of eight to six, with three draws. In the 1,900-2,000 category, the neural network had five wins and two draws but lost eleven times. As a crude measure, if we take the number of wins, five, and divide that by the sixteen games that resulted in a decision (that is, not a draw), we get 0.3125. It's not surprising then that the neural network's rating would settle out at 1,929, because that is 29 percent of the way between 1,900 and 2,000, and the neural network's results indicate that a good estimate of its chances of defeating players in that range is 31.25 percent. Again, by examining a different means for measuring the neural network's final rating, we get a similar score, providing further support for our assessment.

A thirty-point increase over the previous best neural network's performance was nice, but it wasn't decisive. We had hoped that the spatial neural network would break into the expert class; instead we simply had moved a bit higher in Class A. Yet it seemed that the spatial inputs were indeed providing some benefit to the evolving neural networks. We could see it in the quality of the endgame. The neural network was often advancing its checkers toward its opponent's pieces, even if those pieces were across the board. Remember, when searching ahead up to six or even eight ply, the neural network wouldn't be able to see any direct material advantage in those situations. Any potential capture of an opponent's piece would be "over the horizon." Nevertheless, the spatial neural network had apparently captured some of the information required to assess the position without relying directly on searching ahead, because it was playing more aggressively in the endgame now. It seemed to know what to do even when it couldn't see the outcome directly.
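The crude cross-check described earlier in this section, comparing the win fraction against opponents rated 1,900-2,000 with where the rating of 1,929 falls inside that range, amounts to two small divisions. The sketch below simply repeats that arithmetic in Python; the variable names are illustrative, and the comparison is a rule of thumb, not a rating formula.

```python
# Results against opponents rated 1,900-2,000, as quoted in the text.
wins, draws, losses = 5, 2, 11

decisive_games = wins + losses            # draws are left out of the crude measure
win_fraction = wins / decisive_games      # share of decisive games that were won

rating = 1929.0
band_low, band_high = 1900.0, 2000.0
position_in_band = (rating - band_low) / (band_high - band_low)

print(f"win fraction among decisive games: {win_fraction:.4f}")
print(f"position of the rating within the band: {position_in_band:.2f}")
```

The two numbers come out at 0.3125 and 0.29, which is the rough agreement the text points to.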
FIGURE 57
A histogram showing the number of wins, draws, and losses earned by the best-evolved neural network based on the opponents' ratings. As before, each column represents a range of one hundred rating points. For example, "1,450" includes opponents who were rated between 1,400 and 1,499. The neural network had forty-three wins, seven draws, and ten losses to opponents who were rated below 1,900. It dominated this caliber of play. When pitted against opposition in the 1,900-2,000 range, the neural network won five games, played two games to a draw, and lost eleven. Taking the result of five wins in sixteen decisions, which is a winning percentage of 31.25 percent, we get a figure that correlates very well with the estimated rating of 1,929, which is 29 percent of the way between 1,900 and 2,000. The neural network actually had better statistics against players rated between 2,000 and 2,100 than it did against players rated between 1,900 and 2,000; however, it had only one win against players rated higher than 2,100.

Hurry Up and Wait

Perhaps 230 generations just wasn't enough evolution for these larger networks. They had three times as many weights; maybe they needed three times as much evolution for us to fairly assess whether the spatial neurons associated with each subsection of the checkerboard were being used to advantage. But if 230 generations had taken two months, then three times as many generations would take six months. Could we bring ourselves to let an evolutionary program run for six months before we'd be able to assess the quality of the best-evolved neural network?

We didn't have many alternatives. We thought briefly about tying in the six-piece endgame database that Jonathan Schaeffer offers on Chinook's website. It was clear that, although the spatial neural network played an improved endgame, it was still nowhere near perfect. There were several times when we'd taken the advantage in the middle of a game, only to fail at putting our opponent away in the endgame. We figured we could change several draws into wins by relying on the perfect information offered in Chinook's databases.

We logged on to Chinook's site and played a game against it on the novice setting. Our neural network was summarily dismissed by Chinook in fewer than thirty moves. The game wasn't even close. It was as if Chinook had said: "Denied. Not Worthy."

Kumar downloaded Chinook's endgame database but, on further reflection, we decided not to pursue that approach. Integrating the database into our neural network would have increased its performance, but at the cost of losing sight of the scientific question we had set out to answer. Incorporating the perfect information in the database wouldn't help us determine whether an evolutionary program could invent its own features or learn how to play a decent endgame. It wouldn't help us determine whether an evolutionary program could create a neural network that played like an expert without relying on human expertise. All it would do is derail us from that investigation. It seemed much like a drug, a tempting stimulant that would, for the moment, make us feel better but would in the long run do far more damage. We decided not to play that game.

So the game we were left with was the waiting game: six months of the waiting game. We started a new evolutionary trial, again from scratch, and tried to keep busy for half a year. That was easy, for there was a lot to talk about.
Letting the Genie Out of the Bottle
With six months of waiting ahead of us, Kumar and I had lots of time to write up what we'd accomplished so far. We took about two months to draft a paper covering the new results with the best-evolved spatial neural network.¹ I then had the opportunity to give four invited lectures during the short span of the next two months.

In April 1999, I spoke at a major conference in Orlando, Florida, sponsored by the Society of Photo-Optical Instrumentation Engineers, also known as SPIE. I took the opportunity to talk about the very first results Kumar and I had with our neural checkers player.² The audience seemed very receptive to our results.

Then, in late May, I gave the keynote lecture at the 1999 European Conference on Genetic Programming in Göteborg, Sweden. In that lecture, I discussed how evolutionary algorithms can generate solutions to complex problems that are competitive with solutions we humans generate.³ I concluded my talk by again describing the first results that Kumar and I had generated, along with some anecdotes from games we played with the neural network rated 1,750.

A week later, I gave two invited lectures at EUROGEN99, another conference on evolutionary algorithms, held in Jyväskylä, Finland. There, too, I had a chance to share the early work that we'd done in evolving neural networks to play checkers. I couldn't hold back telling everyone that our best neural networks were now rated above 1,900 and that we'd even defeated a master once. Uniformly, the reaction to my talks was quite encouraging.
Looking Over Our Shoulders

When I returned home from Finland, Kumar and I rechecked the progress that the evolutionary algorithm was making. It was now early June, and I was thankful to see that the program was cranking away. It had been nearly six months since we'd hit the return key and walked away from playing checkers. In the meantime, I'd had the chance to tell about two hundred people about our work.

I couldn't help but feel a bit insecure about the situation, like I'd opened the genie's bottle. On the one hand, it was gratifying to have had the opportunity to tell everyone about what we were doing. On the other hand, many people had access to much faster equipment than we were using. By taking the obvious step of using a parallel distributed architecture of computers, they might be able to use what I'd reported, implement their own evolutionary programs, and beat us in creating an expert player.

It was easy to rationalize that if that were to happen, then we really would have succeeded, because we'd have the answers to our original questions. But frankly, Kumar and I both wanted to answer our own questions, and we both wanted to be the first to answer them.
Time Is of the Essence

June 15, 1999, came and the evolutionary program had completed 840 generations. It had taken a little less time to get there than we'd projected, but the pressure was on us to find out just how good the newly evolved neural network really was. Time wasn't on our side. July was fast approaching. That was significant, because the first Congress on Evolutionary Computation, a joint meeting of three long-standing conferences on evolutionary algorithms, was about to be held in Washington, D.C.⁴ I had a tutorial to present at the meeting, during which I would provide a fundamental introduction to the field of evolutionary computation for newcomers. Part of that tutorial included more descriptions of our checkers research. We'd waited the six months, and now I really wanted to find out how good or bad the best-evolved neural network was, and I wanted to know before I had to fly back to the East Coast and tell more people about what we were doing.

Kumar and I decided that we'd concentrate on playing games as the red player to more rapidly determine the quality of play when moving first. It had been a long time since we'd played games on zone.com. But we were ready to get back to it. In the interim, we'd been thinking about various ways we could speed up the search algorithm to allow the neural network to see a little farther ahead in the same amount of time. We had a number of good ideas.
Can't This Thing Go Any Faster?

The first idea was solely Kumar's. He reasoned that the program was losing time by using variables to look up the weights for the neural network. What if we compiled the program with the weights written explicitly into the code that the program used to compute the neural network's evaluation? This approach would have to be faster than loading those same weights into the computer's memory and then looking them up repeatedly throughout the searches being made on every move. Making the change to the program didn't take very long, and we were delighted to find that the search now went twice as fast. It's rare that you get such big increases in speed at such a small cost in sweat and tears.⁵

Another area in which we could help the neural network get a fairer shake was our choice of the ply to use for each move. Choosing the right ply had always been a source of trouble. We had to guess at how long a search would take before we started. Only later would we find out that we'd been either too conservative or too aggressive. (Being too conservative meant that we could have allowed the neural network to process more alternative moves in the allowed time. Being too aggressive could force us to forfeit the match if we ended up using too much time without having the program return to us with a move.)

The obvious solution, and what we should have done much earlier, was to implement a procedure called "iterative deepening." Here's how the process works: The computer starts with a two-ply search and returns the best move. It then immediately starts on a four-ply search, and when that's completed, it again returns the best move. The computer then immediately begins on a six-ply search, and so forth, until we interrupt it. In this way, the search for the best move proceeds deeper and deeper into the game tree until the user says that there's no more time. We set up the iterative deepening routine so that we could interrupt it at our discretion. We could then enter the best move suggested at the highest ply level that had been completed up to that time. This allowed us to set a time limit (such as two minutes), and if we could complete an eight-ply search in that time, we wouldn't have to guess. We'd know, because the computer would have already returned with the eight-ply move.
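To make the procedure concrete, here is a minimal sketch of iterative deepening in Python. It is not our checkers program: a toy take-away game (remove one, two, or three stones from a pile; whoever takes the last stone wins) stands in for checkers, a constant 0.0 stands in for the neural network at the search horizon, and all function names are hypothetical. One simplification to note is that the clock is checked only between searches, whereas the routine described above could be interrupted at our discretion.

```python
import time


def legal_moves(pile):
    """Toy game: a move removes 1, 2, or 3 stones from the pile."""
    return [take for take in (1, 2, 3) if take <= pile]


def negamax(pile, depth):
    """Score from the point of view of the player about to move."""
    if pile == 0:
        return -1.0   # the previous player took the last stone and won
    if depth == 0:
        return 0.0    # search horizon: stand-in for the board evaluator
    return max(-negamax(pile - take, depth - 1) for take in legal_moves(pile))


def search_to_depth(pile, depth):
    """Fixed-depth search that returns the best move and its score."""
    best_move, best_score = None, float("-inf")
    for take in legal_moves(pile):
        score = -negamax(pile - take, depth - 1)
        if score > best_score:
            best_move, best_score = take, score
    return best_move, best_score


def iterative_deepening(pile, time_limit_s, max_depth=20, step=2):
    """Search 2 ply, then 4, then 6, ... and keep every completed result."""
    deadline = time.monotonic() + time_limit_s
    completed = []                      # (depth, move, score) per finished search
    depth = step
    while depth <= max_depth:
        move, score = search_to_depth(pile, depth)
        completed.append((depth, move, score))
        if time.monotonic() >= deadline:
            break                       # out of time: the last entry is the answer
        depth += step
    return completed


for depth, move, score in iterative_deepening(10, time_limit_s=1.0, max_depth=8):
    print(f"{depth}-ply search suggests taking {move} (score {score:+.1f})")
```

The move to play is always the one from the last completed entry, so there is never a need to guess in advance which depth will fit in the time allowed.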
Getting Things in the Right Order

There's another feature of iterative deepening that can help speed things up when combined with alpha-beta pruning. Remember from chapter 12 that the alpha-beta pruning starts down the leftmost branch of the tree. If the best move is found in that leftmost branch, then the alpha-beta values will be set to prune the maximum amount of the remainder of the tree. Alpha-beta works best when the minimax move is found as early as possible.

Quite often, the best move to make at, say, four ply, is also the best move at six ply. But it might not be the move the minimax procedure would start with. You can often save considerable time by pruning unnecessary branches of the tree if you make the extra effort to reorder the moves so that the best known move from earlier lower-ply searches is used as the first move to search when proceeding to a deeper ply.

Iterative deepening and move reordering are well-known tricks of conventional artificial intelligence, and Kumar had no problem integrating them into our program. (Note that neither of these tricks alters the move that the neural network would suggest. They simply allow us to discover that move faster.)
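A small, hand-checkable sketch in Python shows why the order matters. The tree below is invented for illustration (three moves for us, three replies each, with made-up evaluation scores at the leaves), the function names are hypothetical, and the "best move from an earlier, shallower search" is simply assumed to be the middle move. The counter records how many leaf positions had to be evaluated.

```python
def alphabeta(node, maximizing, alpha, beta, counter):
    """Alpha-beta search over nested lists; numbers are leaf evaluations."""
    if not isinstance(node, list):
        counter[0] += 1          # one leaf visit = one call to the evaluator
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, best)
            if alpha >= beta:
                break            # the opponent would never allow this line
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, best)
        if alpha >= beta:
            break                # we already have a better move elsewhere
    return best


def search_root(children, order):
    """Search our moves in the given order; return (move, score, leaves)."""
    counter = [0]
    alpha, best_index = float("-inf"), None
    for index in order:
        score = alphabeta(children[index], False, alpha, float("inf"), counter)
        if score > alpha:
            alpha, best_index = score, index
    return best_index, alpha, counter[0]


def with_best_first(move_count, previous_best):
    """Move order that tries the previously best move before the others."""
    return [previous_best] + [i for i in range(move_count) if i != previous_best]


# Invented scores: each inner list holds the opponent's replies to one move.
TREE = [[3, 5, 4], [8, 9, 7], [2, 6, 1]]

print(search_root(TREE, [0, 1, 2]))              # natural order
print(search_root(TREE, with_best_first(3, 1)))  # earlier best move first
```

Both orderings choose the same move with the same score, but in this toy tree the natural order evaluates seven leaves while the reordered search evaluates five. That is the sense in which reordering never changes the answer and only changes how quickly it is found.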
Storing Previous Evaluations

I had another idea for speeding things up that was, at first, a bit controversial, meaning that I thought it might be worth the effort, but Kumar wasn't so sure. It seemed like our neural network might be spending a lot of its time reevaluating positions that it had already evaluated on earlier moves, or even on the same move at a lower ply. Look at figure 58. Suppose our neural network were playing red. There are ten possible moves to consider, each having its own branch in the game tree. But suppose we were to move 32-28, and our opponent were to move 4-8; then we moved 28-32, and he moved 8-4. In that case, we'd be right back where we started. All the evaluations that we'd made along the way, across the four ply, would be just as applicable now as they had been four ply ago. Why should we force the neural network to reevaluate those possibilities when it had just done so mere seconds earlier? Why not just make a look-up table that stores all the evaluations for positions that the neural network has already seen? We could search that table first before making the neural network go through the effort, and more
FIGURE:
58
Suppose that the neural network plays red and the opponent plays white. The neural network has ten different possible moves. Suppose that it played 32-28, and the opponent replied 4-8. Further suppose that the neural network responded with 28-32 and the opponent moved 8-4. The resulting position would be the same as shown here. Without any look-up table by which the neural network could store the board positions that it had already visited, it would have to recompute the best move in this position. Many future positions in the game tree can be reached in multiple branches. Having a look-up table can facilitate a faster search.
i m p o r t a n t , the time required to assess the board. T h a t way, the n e u ral n e t w o r k w o u l d only evaluate boards o n e time and never have to repeat effort that had already b e e n e x p e n d e d . H e r e ' s w h e r e we had to t h i n k a b o u t trade-offs. It sounds appealing to create a l o o k - u p table and store boards in that table as y o u go. It's easy to do too. T h e p r o b l e m is that the n u m b e r o f boards exLETTING
T H E G E N I E O U T OF T H E B O T T L E
263
amined in even an eight-ply search can reach more than one hundred thousand. Each board has thirty-two elements, each of which can take on five different conditions: empty, your checker, your king, your opponent's checker, your opponent's king. It takes three bits to store five items, so that means that each checkerboard requires ninety-six bits. Therefore, storing one hundred thousand checkerboards in m e m o r y requires 9,6oo,ooo bits, which is a little more than I. I megabytes. That amount of m e m o r y may not seem so large, but keep in mind that it is what you might use to store all the possibilities emanating from a single move. O n the next move, farther down the tree, is another set of possibilities, including some of the same positions that you evaluated on the previous move. Others would be brand new. To determine which ones are old and which are new, you'd have to search the list of one hundred thousand boards to find out if the board in question was already stored in memory. That searching takes t i m e - - t i m e that you might instead spend on having the neural network just evaluate the board. To be useful, we'd have to search fast enough so that the effort made in searching would offset the effort made in having the neural network reevaluate positions it had seen before. Kumar and I decided to use a "hash table" to facilitate storing the checkerboards. A hash table is a computer science term for a look-up table that's constructed according to a specific code (a hash code) that facilitates storage and efficient recall. The m e t h o d we finally settled on used a relatively small amount of memory, so it couldn't hold too many boards, but the encoding method allowed us to search the mere264
THE MAKING
OF B L O N D I E
ory quickly. Whenever the m e m o r y became full, any new board would simply overwrite another one stored in m e m o r y at random. O u r initial tests showed that we could store a m a x i m u m of about 270,000 boards using this hash-table method. The real test was to see if the neural network was revisiting enough positions to make the look-up table worth our effort. We were pleasantly surprised. In many cases, the ratio of the number of table look-ups on positions that were being revisited to the number of evaluations of new checkerboards was greater than 2:I. This finding was super, because it meant that we could save significant time. (If there had been only a few table look-ups, the hash table wouldn't have been valuable. Also, remember, nowhere in the table did we tell the neural network how to evaluate any particular board. We simply gave the neural network a m e m o r y so that it could recall the values that it had assigned to boards it had seen before.) The hash table sped up the search by another 33 percent. We were now able to get eight-ply searches to complete routinely in less than a minute.
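The arithmetic and the table described above can be sketched in a few lines. This is only an illustration under stated assumptions: the book does not spell out the hash code we used, so the sketch packs each board into a 96-bit integer and leans on Python's built-in dictionary as the hash table. The names pack_board, EvaluationCache, and memory_megabytes are hypothetical, and the square codes (0 for empty through 4 for the opponent's king) are an arbitrary choice.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

SQUARES = 32
BITS_PER_SQUARE = 3   # five possible contents per square need three bits
# Assumed codes: 0 empty, 1 own checker, 2 own king,
#                3 opponent's checker, 4 opponent's king


def memory_megabytes(n_boards: int) -> float:
    """Memory needed to store n_boards at 96 bits each, in megabytes (2**20 bytes)."""
    bits = n_boards * SQUARES * BITS_PER_SQUARE
    return bits / 8 / 2 ** 20


def pack_board(board: Sequence[int]) -> int:
    """Pack 32 square codes into one integer of at most 96 bits."""
    key = 0
    for code in board:
        key = (key << BITS_PER_SQUARE) | code
    return key


class EvaluationCache:
    """Fixed-capacity look-up table; when full, a random entry is overwritten."""

    def __init__(self, capacity: int, rng: Optional[random.Random] = None):
        self.capacity = capacity
        self.values: Dict[int, float] = {}   # packed board -> stored evaluation
        self.keys: List[int] = []            # packed boards currently stored
        self.rng = rng or random.Random(0)
        self.hits = 0      # look-ups answered from the table
        self.misses = 0    # boards that had to be evaluated

    def lookup(self, board: Sequence[int],
               evaluate: Callable[[Sequence[int]], float]) -> float:
        key = pack_board(board)
        if key in self.values:
            self.hits += 1
            return self.values[key]
        self.misses += 1
        value = evaluate(board)
        if len(self.keys) < self.capacity:
            self.keys.append(key)
        else:
            slot = self.rng.randrange(self.capacity)
            del self.values[self.keys[slot]]   # evict a randomly chosen board
            self.keys[slot] = key
        self.values[key] = value
        return value
```

With this sketch, memory_megabytes(100_000) reproduces the figure of a little more than 1.1 megabytes, and the ratio of hits to misses after a search corresponds to the look-up ratio discussed above. The evaluate argument is whatever function scores a board; the table never changes that score, it only remembers it.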
Climbing Mount Expert
Armed with the improved search engine, we took the next two weeks to crank out fifty-five games on zone.com. We watched our rating follow the usual increasing trajectory, but this time it was a bit steeper. By the thirtieth game, we thought there might be a chance that the new neural network would break 2,000, the expert barrier, but we weren't sure. We were playing some tough opponents, mostly rated above 1,900, and there were no easy wins.
FIGURE 59
(a) The position after our opponent's seventeenth move, 32-28. Our opponent, rated 2,192, played white. The neural network, playing red, proceeded with 7-11. I found this play surprising, because it allowed the human player to advance 12-8 toward a king. In examining the alternatives, however, this was the only move that wouldn't have lost us a checker.
(b) The position after our opponent (white) played 28-24 on move twenty-three. At first glance, the neural network (red) looks trapped, but it sees a way out: 18-23.
(c) The position after our human opponent (white) moved 25-22. Both sides have equal material, but the neural network's position (red) appears weaker. Its checkers cannot advance freely to the back row, and its king is pinned in the lower-right corner of the board. Once again, the neural network found an opening: 11-16.
(d) The position after the neural network (red) played 9-14 on move thirty-five. The human player (white) chose to swap pieces, moving 19-15, rather than advance the checker on 11. The neural network followed up with 14-17, giving white the choice of swapping kings or swapping checkers. Our opponent chose to swap checkers and then followed up with an offer to draw. I accepted the offer.
Here are the highlights from one game against an opponent rated 2,192. It was a tight game against someone almost at the master level. The neural network played red, moving first, and the human opponent played white. You can find all the moves in the notes. 6 We'll skip ahead to the position after the seventeenth move, which is shown in figure 59(a). The neural network surprised me by moving 7-11 next. This move plainly let its human opponent advance his checker that was pinned on 12 toward a king. Not being an expert checkers player, I can't know with certainty whether this move was a mistake, but it didn't seem like the right play to me. Our opponent moved 12-8 and advanced the checker to a king three moves later.
Figure 59(b) shows the position after the twenty-third move, in which our opponent played 28-24. I thought the neural network was trapped, which proves that I'm not a very good player. The neural network moved 18-23, sacrificing its checker on 18 but swapping it out for white's checker on 24. White jumped 27-18, and the neural network played 20-27, bringing it back to even on material.
After the twenty-eighth move, the situation we faced is shown in figure 59(c). The human player had just played 25-22. Even though both sides were even on material, the neural network's position appeared precarious. There seemed no way for it to advance its checkers to become kings, and its king was trapped. Ever the elusive prey, the neural network again played a swap that extricated itself from the predicament, moving 11-16. Our human opponent was forced to jump 19-12, but then the neural network moved 27-23 and our opponent gave up on his checker at 18. We were still even on pieces.
By the thirty-fifth move, the board was thinning out. The neural network played 9-14, leaving the position shown in figure 59(d). It was a good move. Our opponent had the option of proceeding toward his second king and letting us advance toward our second king or playing a swap. He chose the latter, moving 19-15. The neural network countered with 14-17. Our opponent now had a choice: jump 15-22 and swap out kings, or jump 21-14 and swap out checkers. He chose to swap out checkers, moving 21-14. The neural network's reply was forced, 18-9. Our opponent followed up with 15-18 and an offer to draw. The neural network rated the position at below -0.5. It "thought" it was losing. Accepting the draw was easy.
I've since played out the game from this position using Blitz98, a very good shareware checkers program that's available over the Internet. With its maximum search capabilities fully enabled, Blitz98 played out both sides to a draw. Whew! We had another draw against someone who was almost at the master level. We were edging closer to a 2,000 rating.
To Err Is Human
The neural network was having an easier time with the opponents rated around 2,000. One win came over an opponent rated 2,054. It was a tight game until the very end. Again we played red. The complete listing of moves is printed in the notes. 7 Figure 60(a) shows the position after the twelfth move. The neural network then played 9-13, forcing its opponent to jump 18-9. This move didn't make much sense to me, because the reply of 18-9 left an obvious path for the human opponent to earn a king, but the neural network's forced counter of 13-22 wasn't a guarantee for earning itself a king. Curiously, the human then played 30-26, freeing the neural network's checker on 22 to move for a king. I have to think that our opponent made a mistake there.
Things were looking bleak after the twenty-first move, when white (our opponent) played 13-9, as seen in figure 60(b). At least, they looked bleak to me. White was headed for a second king. Furthermore, the neural network didn't seem to have the position required to advance its checker on 14 toward white's back row. White could swap him out if he moved 14-17 by moving 26-22, forcing an exchange.
FIGURE 60
(a) The position just before the neural network (red) moved 9-13 when playing against a human expert (white) rated 2,054. It was in the early stages of the match, but the move seemed to leave the expert with an opening, starting with a forced jump, 18-9.
(b) The position after the human player (white) moved 13-9. Things don't look very promising for the neural network (red). White appears to be moving toward his second king, and the neural network's pieces are not in a position to advance to a king.
(c) Our opponent (white) moved 22-18, sealing his fate. He must have overlooked the neural network's forced reply: 14-23-16. Our opponent resigned immediately thereafter.
Just when things looked bad for our neural network, it played 11-15. I didn't see that move coming, but then I don't usually look for moves that give up two pieces. With this move, the neural network loses two checkers but gains back a king and a checker and earns itself a king in the process. The human player double jumped 2-11-18, and the neural network countered with 14-23-30.
The game concluded after the thirty-first move, when our human opponent played 22-18, leaving the board as shown in figure 60(c). I know what he was thinking. This move, 22-18, would be advantageous, because it would swap out the neural network's king on 14. The neural network would be forced to jump 14-23, and the reply of 27-18 would capture the king and also open a path for him to earn a king. There was just one small problem: The neural network moved 14-23-16. Oops! Not so fast. The jump isn't 14-23; it's 14-23-16! It was all over. With that, white resigned.
Arriving at the Summit of Mount Expert
By game fifty-five, on June 29, 1999, our rating had climbed above 2,000, all the way to 2,006.01. We checked the standard error of the rating over five thousand random orderings of the games. It was 0.24. Even if we subtracted 0.24 from 2,006.01, we were still above 2,000. It was the first time that we'd demonstrated a neural network capable of playing at the level of a human expert, at least when moving first. Obi_WanTheJedi was now in the top six hundred of all players on zone.com, and I had something more definitive to say in my upcoming tutorial at the Congress on Evolutionary Computation! We still had to figure out how good or bad the neural network was when moving second, and we still had to play enough games to see where our rating would settle. But now we had a basis for believing that the evolutionary algorithm really had developed an expert checkers player, even without using human expertise.
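The random-ordering check mentioned above can be sketched as follows. Treat every constant here as an assumption: the sketch uses a standard Elo-style update with a starting rating of 1,600 and a fixed step size of 32, which may not match the exact rating formula given earlier in the book, and the function names are hypothetical. The point is only the procedure: shuffle the same set of results many times, recompute the final rating for each ordering, and summarize the spread.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Game = Tuple[float, float]  # (opponent's rating, outcome: 1 win, 0.5 draw, 0 loss)


def update(rating: float, opponent: float, outcome: float, k: float = 32.0) -> float:
    """One Elo-style update (assumed form; k = 32 is an assumption)."""
    expected = 1.0 / (1.0 + 10.0 ** ((opponent - rating) / 400.0))
    return rating + k * (outcome - expected)


def final_rating(games: Sequence[Game], start: float = 1600.0) -> float:
    """Rating after playing the games in the given order (start value assumed)."""
    rating = start
    for opponent, outcome in games:
        rating = update(rating, opponent, outcome)
    return rating


def rating_over_orderings(games: Sequence[Game], orderings: int = 5000,
                          seed: int = 0) -> Tuple[float, float, float]:
    """Mean, standard deviation, and standard error of the mean of the final
    rating across random orderings of the same games."""
    rng = random.Random(seed)
    shuffled: List[Game] = list(games)
    finals = []
    for _ in range(orderings):
        rng.shuffle(shuffled)
        finals.append(final_rating(shuffled))
    mean = statistics.mean(finals)
    sd = statistics.stdev(finals)
    return mean, sd, sd / orderings ** 0.5
```

Given a list of (opponent rating, outcome) pairs, rating_over_orderings returns the average final rating together with its standard deviation and the standard error of that average. Because each update depends on the current rating, the order of the games matters a little, which is exactly why averaging over many orderings gives a steadier estimate than any single sequence.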
Blondie24
15
The Congress on Evolutionary Computation came and went, and we were back to playing checkers on the Internet, trying to complete our assessment of the best-evolved neural network from generation 840. Our rating was higher than 2,000 now and was posted with our name for everyone to see. You'd think that a rating above 2,000 would earn us some respect, and it did, but only from other players who were also rated above 2,000. Figure 61 shows a screen in which "RedBob_2" (an expert player) and I had a conversation after my modem disconnected during our match. RedBob_2 found a new opponent, and I kibitzed on their game to apologize for getting disconnected. RedBob_2 was very complimentary of my level of play, all due to the evolved neural network, of course. Whereas opponents with expert or higher ratings were gracious in both victory and defeat, for our victims with ratings between 1,800 and 2,000 it was the same old story: We'd beat them and end up on the receiving end of a series of expletives sent flaming through the chat box. After a while, we grew tired of this behavior and decided to conduct a little experiment.
FIGURE 61
A screen showing a game between RedBob_2 and viper_734, during which I kibitzed as Obi_WanTheJedi. RedBob_2 complimented our level of play. By permission of Microsoft Corporation.
Introducing Blondie24
Kumar and I figured that most of these sore losers were guys. Most women don't swear at you when they lose. What if we changed our name from Obi_WanTheJedi to, say, Blondie24? What sort of response would we get then?
We'd need a cover story for Blondie (from now on, I'll use "Blondie" and "Blondie24" interchangeably). How did she get to be so good at checkers? We brainstormed and came up with her persona: Blondie is a twenty-four-year-old mathematics major, currently enrolled at the University of California at San Diego. Her parents named her after the comic strip character, and yes, she's a natural blonde. Blondie's very athletic and enjoys surfing and skiing, but she broke her leg last winter in a skiing accident. While recuperating, she had ample time to get really good at checkers. Oh, and yes, she's single, extremely attractive, and looking for a boyfriend.
There's no provision for switching your screen name on zone.com, so Kumar and I logged on simultaneously, he as Obi_WanTheJedi and I as Blondie24. He proceeded to lose games to me on purpose. At first, he made every mistake he could think of, giving up checkers as quickly as he could. If only Blondie's real opposition would be this cooperative and lose this quickly, we joked. Then we realized that we were doing things the hard way. Instead of playing games out by hand, with Kumar committing the checkers' version of suicide time after time, all he had to do was hit the "resign" button. It wasn't long before Blondie24's rating rose to 2,030, where Obi_WanTheJedi's had been before we started this little adventure. With that, we retired old Obi_WanTheJedi.
What followed was an amazing lesson in human psychology, and a great deal of fun. Now instead of being flamed while we were trouncing our weaker opposition, we were being asked out on dates. One guy from across the country even bet Blondie a free dinner that he'd win. (He didn't, and I have no desire to collect.) A typical conversation went something like this:
"Where are you?"
"San Diego. And you?"
"Los Angeles. Can we meet?"
"Uh, I don't think so."
"Are you really 24?"
"Yes. Your move."
"I bet you're hot."
"You wouldn't be disappointed =) but it's your move."
I figured that using the "=)" smiley face would be more feminine than the typical ":-)" or the more aggressive "%^)." It seemed to work. Kumar and I were completely convincing as Blondie24, 100 percent successful in our effort to play out the first half of Turing's test: two guys pretending to be a woman who was really good at checkers.
It's not an overstatement to say that we were on the receiving end of some rude or crude remark just about once in every three games. Not being a woman, I hadn't been exposed to this side of male behavior. Ladies, I empathize. Do we men always behave like this? I had fun commiserating with the women who played on the website. In the guise of Blondie, I'd end up chatting about the other guys on the site, comparing notes about how they behaved and what jerks some of them were. Men would often come to kibitz on our games, and if we couldn't take any more of their vulgarity, one of us would turn off the kibitzing feature and summarily dismiss them from the room. I'd usually remark, "Girl power! =)" whenever I did that, which brought back sympathetic smiley faces or the acronym LOL for "laughing out loud."
While many of the checkers room regulars were apparently interested in more than just a game of checkers with Blondie, the better opposition was as complimentary as ever. Figure 62 shows a typical screen that captures the comments of another player, again in the expert category. (I've redacted the screen to remove the opponent's name, because it may completely identify him.) Blondie was often told how well she was playing for her age and that she should practice and enter tournaments. It was all very flattering, and it was nice to see such a spirit of camaraderie. The really good checkers players truly love the game, and they'll go to great lengths to help others excel and enjoy the game too.
FIGURE 62
Some nice comments from an expert-level player on zone.com. By permission of Microsoft Corporation.
Playing the Games
We'd already established good evidence that Blondie24 was an expert-level player when moving first, but we needed to find out just how good "she" was overall. For six weeks, from mid-July through the end of August, we played 110 more games as Blondie24 on zone.com, for a total of 165. In all, we played eighty-four games as red and eighty-one as white, an almost even distribution. I'll provide one quick anecdotal game here that we played against a human opponent rated 2,173, in the high-expert category. It was a surprisingly quick contest.
Since playing our games with the previous spatial neural network, the number of registered players on zone.com had doubled to more than eighty thousand. Our opponent was ranked ninety-eight and played red. You can find the complete listing of moves in the notes. 1 Figure 63(a) shows the position after the ninth move, when Blondie played 24-20. This move set up a two-move combination in which Blondie gained a checker. Her opponent played the 15-24 forced jump, and Blondie responded by choosing to jump 20-11. Her opponent's reply of 7-16 was again forced, and Blondie proceeded to double jump 27-20-11.
After Blondie's nineteenth move, 8-3, she was in the position shown in figure 63(b). Her opponent had left himself open. He'd made the plays to threaten Blondie's checker on 9, but in the meantime, he had left himself vulnerable to a double jump. That's exactly how it played out, as he jumped 5-14 and Blondie captured two of his checkers by playing 3-10-19. This put the board in the position shown in figure 63(c). Down a checker and a king, our opponent resigned, and Blondie had taken just twenty moves to defeat someone in the top one hundred of more than eighty thousand registered players, someone only twenty-seven points away from the master level. (I reflected on our earliest efforts and the 1,750-rated neural network, thinking, "You've come a long way, baby.")
FIGURE 63
(a) The position after Blondie24 (white) moved 24-20. This play forced an exchange whereby Blondie24 gained a one-piece advantage over her opponent (red), a human expert rated 2,173, ninety-eighth out of more than eighty thousand registered players.
(b) The position after Blondie24 (white) moved 8-3, gaining a king. The king threatens red's checkers on squares 7 and 15. Red is forced to capture 5-14, by consequence going down two checkers.
(c) The final position after Blondie24 (white) double-jumped 3-10-19. With this play, the human opponent (red) resigned, and Blondie24 had defeated someone in the top one hundred of eighty thousand registered players.
Eight Minutes to Victory
While we were between games on zone.com, I did a little searching on the Internet for checkers tournaments. I thought it might be fun to enter Blondie24 in a contest. I didn't expect her to win, but I thought she might make a good showing and, in any case, the experience would probably be enjoyable. Surprisingly, using one of the typical search engines (metacrawler.com), I stumbled across a checkers tournament online at Carl Buddig's website. Carl Buddig is a national deli meat distribution company, so I had no expectation that its site would have anything to do with checkers, but the company had arranged a special promotion of holding weekly checkers contests to bring attention to its site.
Kumar linked up to the website to enter in the last open tournament that was held on the site. I had just run to get some potato chips to snack on and came back to find Kumar saying that we just barely beat the deadline for registrations to close. Fifteen people had entered the tournament. There was no time limit for the games. You could take as long as you liked, but if it appeared that you were stalling, the tournament director would come to your game and tell you to make a move or forfeit.
We paired up in the first round against ectasy21, who played red. It was an easy win for us, using mainly six-ply searches. Next came rista, and we played red. Again, we won. In the third round, we matched up with winnerlst. We played red and had a real challenge on our hands. Winnerlst put up a good fight, and in the end we were on the losing side, down one checker. We watched as winnerlst went up against Israel22 for the championship. Israel22 took the match pretty easily. Certainly, we'd given winnerlst a much better match than he'd given Israel22. Our final result: third place.
We sent an electronic instant message to Israel22 congratulating him on the victory and asking if he was rated. He responded that he was rated over 2,200 (master level) on a checkers website that was run by Excite, and that he was the defending champion from the last tournament held on Carl Buddig's Web page. Third place to a master didn't seem too shabby.
With our success in the tournament, I looked for another venue and found it at playsite.com. Playsite had open checkers tournaments as well as tournaments for those rated under 1,300, held multiple times each day. We entered the next open tournament, figuring that we had nothing to prove by beating weaker players. The difference between the format for Carl Buddig and that used by Playsite was that Playsite's games were all timed: You had eight minutes. If you ran out of time, you'd lose. So the trick was to make your moves quickly, while avoiding mistakes that put you in a losing position. Once you're down a checker, it's very difficult to win, because if your opponent gets a king, he or she can toggle back and forth between squares very quickly and simply run out your clock as you try to figure a way to get back in the game.
It was mid-afternoon on a Sunday, August 29, 1999, and twenty-two entrants had signed up for the open tournament. We set up the program to run the neural network and waited for our first match. After only a short wait, we were "invited" to join a checkers room, where the initial contest began. Ready, set, go! With only eight minutes to make all our moves, we had to move as soon as we could. This experience was intense. (It reminded me of the way world-class ping-pong players compete, with the ball flying back and forth at blinding speed.) Kumar had the neural network running on a separate computer. I would call out the move that our opponent made, and Kumar would quickly type it in, then yell back the response from our neural network at six ply. There never was time for an eight-ply move. We squeaked through game after game, all the way to the finals. It was a nerve-racking hour, but we ended up winning the entire tournament. It had been a good day. The next day was going to be even better.
August 30, 1999
After finishing our work for the day, Kumar and I were back to playing games at zone.com. We'd completed just more than 155 games, and our rating was still above the 2,000 level, but it hadn't stabilized completely. Almost, but not quite. We recomputed Blondie's rating after every set of five games. When we reached game 165 late that night, we graphed the rating based on five thousand random orderings of our opponents and saw the curve shown in figure 64. We were finished. Blondie's average rating was 2,045.85 with a standard deviation of 33.94. After 165 games, the trajectory had stabilized, and Blondie was clearly an expert. 2 In fact, Blondie's rating of 2,045.85 placed her in the top five hundred of all registered players on zone.com, which had now increased to more than 120,000 people. That is, Blondie was better than 99.61 percent of all the rated players at the website. Not too bad for a computer that had taught itself how to play.
FIGURE 64
[Plot of rating (1,600 to 2,100) against game number (0 to 160).]
Blondie24's average rating as a function of the number of games played when considering five thousand different possible orderings of the 165 games played. Blondie24's rating stabilized at 2,045.85, placing her in the top five hundred of all registered players on zone.com, which had increased to more than 120,000 people.
Figure 65 shows a histogram that displays all the results against opponents of different calibers over the 165-game series. When playing opponents who were rated below 2,000, Blondie racked up an impressive eighty-four wins, twenty draws, and eleven losses. When placed in competition against other experts, rated 2,000 to 2,200, Blondie earned ten wins, twelve draws, and twenty-two losses. With ten wins in thirty-two decisions, Blondie won 31.25 percent of her games against other experts. Using this figure to double-check the rating we obtained above, if we go 31.25 percent of the distance between 2,000 and 2,200, we get 2,062.50, which is in fairly close agreement with Blondie's 2,045.85 rating.
FIGURE 65
A histogram showing Blondie24's performance against players of different caliber. Recall, the width of each bar is one hundred rating points, so "1,550" includes opponents rated between 1,500 and 1,599. Blondie24 defeated players rated below 2,000 consistently, earning eighty-four wins, twenty draws, and only eleven losses. She managed ten wins, twelve draws, and twenty-two losses against opponents rated in the expert category (2,000-2,200). This score provides additional support for Blondie24's 2,045.85 expert rating, as indicated in the text.
While we were playing out the 165 games, we noted that Blondie continued to win, not by taking advantage of an opponent's blunder, but more often by systematically limiting the number of possible moves that the opponent could make, until he or she was out of options and had to surrender a piece. Blondie would often move to pin an opponent's king even when the required sequence of moves extended well beyond the ply we were using. Time and time again, it seemed like Blondie's neural network had captured the idea of mobility and, furthermore, that it was a good idea to constrict her opponent's mobility.
We'd had some fun naming our program Blondie24, but let's face it, Blondie24 isn't a very imposing name! So we decided to change her name to Anaconda, after the giant constrictor snake from South America, given that both the snake and our neural network seemed to have similar strategies (constricting the opposition). With that, Blondie24 graduated from UCSD, found her boyfriend, and retired from playing checkers. Kumar and I stuck with Anaconda for a while, particularly when writing scientific papers, but like many of today's sports figures, Blondie's retirement was short-lived. Okay, I admit it, I just have more fun playing checkers as Blondie than as Anaconda. =)
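The double-check on the expert-level results earlier in this section is simple enough to verify in a few lines. The sketch below just restates that arithmetic; the function name is hypothetical, and counting only wins and losses as "decisions" (setting the twelve draws aside) follows the wording of the text.

```python
from typing import Tuple


def interpolated_rating(wins: int, losses: int,
                        low: float = 2000.0, high: float = 2200.0) -> Tuple[float, float]:
    """Fraction of decisions won, and the rating that lies that same fraction
    of the way from low to high."""
    fraction = wins / (wins + losses)
    return fraction, low + fraction * (high - low)


# Ten wins and twenty-two losses against opponents rated 2,000 to 2,200.
fraction, estimate = interpolated_rating(wins=10, losses=22)
gap = estimate - 2045.85   # difference from the rating obtained over 165 games
```

Here fraction comes out to 0.3125 and estimate to 2,062.5, about seventeen points above the 2,045.85 obtained from the random orderings. It is a rough sanity check rather than a second rating method, since it ignores the draws and treats all experts as if they were spread evenly across the category.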
Blondie, Meet Chinook
Five months passed after the final game that ended our experiment on August 30, 1999. I hadn't thought too much about checkers and Blondie after Kumar and I finished preparing a paper detailing the results of the 165-game series. But on the last day of January 2000, I remembered that I'd never played Blondie against Chinook. I logged on to Chinook's website and started up a game with Chinook, again playing at the novice setting. I took the red side for Blondie, with Chinook playing white. All the moves are listed here, with my comments along the way (which you can read in lieu of the detailed moves and still get a feeling for how the game proceeded).
Red (Blondie)   White (Chinook)   Comments
1: 9-14         1: 22-18          I later found out that the 9-14 opening is called the "double corner" and is regarded as the second strongest opening behind 11-15. 3
2: 6-9          2: 25-22
3: 1-6          3: 24-20
I was three moves into the game and, as shown in figure 66, Chinook already thought it had a small advantage.
4: 11-16        4: 20-11          Blondie swaps.
5: 7-16         5: 28-24          Chinook still says it has a small advantage.
6: 3-7          6: 24-19
FIGURE 66
The position after Chinook's (white) third move, 24-20. I took the figure straight from the screen while playing Chinook. Note that red (black) starts at the bottom of the checkerboard. Chinook's remarks appear at the top. Here Chinook says it has a small advantage.
Just six moves into the game, Chinook said it had a big advantage (see figure 67). This didn't look very promising.
FIGURE 67
The position after Chinook's (white) sixth move, 24-19. Chinook now says that it has a big advantage. By permission of Professor Jonathan Schaeffer.
7: 16-20        7: 30-25
8: 8-11         8: 22-17          Chinook still says it has a big advantage.
9: 9-13         9: 18-9           Three-piece swap coming.
10: 13-22       10: 25-18
11: 5-14        11: 18-9
12: 6-13        12: 26-22         Chinook still says it holds a big advantage.
13: 11-15       13: 32-28         Chinook swaps.
14: 15-24       14: 28-19
15: 4-8         15: 31-26         Whew! Now Chinook says it's back to a small advantage.
16: 2-6         16: 23-18         Bummer. Back to a big advantage.
17: 6-9         17: 18-15         Gulp! Now Chinook says, "You are in serious trouble."
18: 10-14       18: 15-10         Chinook moves for a king.
19: 7-11        19: 10-7
20: 11-16       20: 19-15
21: 16-19       21: 15-10
Figure 68 shows the position after Chinook's move, 15-10. Its assessment was that it was back to having a big advantage, which seemed like an improvement. Chinook had two pieces that could move unimpeded to Blondie's back rank. Blondie had only one piece that could similarly advance to become a king at this point.
22: 19-24       22: 27-23         Oooh. Now only a small advantage.
FIGURE
68
The position after Chinook's (white) twenty-first move, 15--I0. Two of Chinook's checkers can advance to become kings. Only one of Blondie's checkers (red) has a clear path to a king. Chinook had said, "You are in serious trouble," but now was back to saying that it held only a "big advantage." By permission of Professor Jonathan Schaeffer.
23: 8-11  7-2
Chinook gets a king, back to a big advantage.
24: 11-15  2-7
Again to a small advantage. Maybe Blondie is hanging in there.
25: 24-27  10-6
26: 27-32  6-1
Blondie gets a king. So does Chinook.
27: 32-27  29-25
Blondie threatens Chinook's checker on square 23. Chinook can't protect it. Still Chinook says it has a big advantage.
28: 27-18  1-6
Blondie is up one checker.
29: 20-24  7-10
Amazing: After Chinook's move 7-10, Chinook said, "You have a big advantage." Chinook seemed to know what to do, but would Blondie?
30: 15-19  10-17
Chinook gets it back.

We were back to being essentially even on pieces, as shown in figure 69. Chinook had two kings, while Blondie only had one, but Blondie had opened up the side of the board and seemed to be able to advance two or three checkers toward eventual kings. Chinook said, "You are winning... don't blow it!" Obviously, Chinook understood the situation far better than I did. It looked pretty even to me.

FIGURE 69
After Chinook's thirtieth move, 10-17, Chinook says, "You are winning... don't blow it!" I didn't do a "screen capture" during the game here, so I've recreated the situation. The material looks fairly even, but the positional advantage seems to lean toward Blondie (red). Blondie has three pieces that appear capable of advancing to a king.
31: 24-27  22-15
32: 13-22-29  6-13
33: 29-25  13-9
34: 27-31  26-23
Interesting double swap here.
Blondie gets a king. Chinook can't protect its checker on square 26.

Figure 70 shows why Chinook was worried. Blondie had manipulated the exchanges and forced the capture of Chinook's checker on square 23. Blondie was winning, and I really hoped she wouldn't blow it! The next fifteen moves were all preordained. Blondie and Chinook converted their remaining checkers into kings.

35: 19-26  9-14
36: 31-27  21-17
Blondie up one piece.

FIGURE 70
After the dust settles on move thirty-four, the position looked as shown. Chinook (white) has just moved 26-23, unable to protect its checker. Blondie makes the forced jump, 19-26, going up one piece.
37: 26-31  15-10
38: 12-16  17-13
39: 25-22  10-6
40: 16-19  6-2
41: 19-24  2-7
42: 27-23  14-10
43: 24-27  10-14
44: 27-32  13-9
45: 22-18  14-10
46: 31-26  7-11
47: 32-27  11-16
Blondie gets a king.
Chinook gets a king.
Blondie up four kings to two plus one checker.
48: 26-22  9-6
49: 22-17  6-1
Chinook gets a king.

FIGURE 71
After move forty-nine, when Chinook (white) played 6-1, Blondie (red) held the advantage with four kings to three. Blondie would have to play out the game without using an endgame database, while Chinook used perfect information about any position with six or fewer checkers remaining.
Figure 71 shows the story. Blondie held the lead, four kings to three. Would she find the winning combination even without relying on an endgame database? In the next fifteen moves the players toggled for position. I really can't comment on whether Blondie played well. There were two situations in which I chose the best move at six ply when Blondie and Chinook toggled back and forth at eight ply.

50: 17-21  10-6
51: 21-17  6-10
52: 17-21  10-6
53: 21-17  6-10
54: 18-14  10-15
55: 14-9  15-11
56: 23-18  1-5
57: 18-14  5-1
58: 14-10  1-5
59: 17-14  16-19
60: 9-13  5-1
61: 14-9  1-5
62: 9-14  5-1
63: 13-17  1-5
64: 17-22  5-1
65: 22-18  1-5
66: 27-31  19-24
67: 10-15  11-16
68: 18-23  16-20
69: 23-27  5-1
70: 14-18  1-6
71: 18-23  6-9
72: 23-19  24-28
Went with six-ply move.
Went with six-ply move.
Blondie gives Chinook an opening to the double corner.
Blondie at work, constricting mobility.
Chinook gets pinned in the corner.

FIGURE 72
Blondie (red) can force a swap of kings to go up three kings to two. Blondie systematically pinned Chinook's kings against the side and corner and forced the swap. Chinook (white) resigns on move seventy-two. By permission of Professor Jonathan Schaeffer.
With Chinook's seventy-second move, I saw the board shown in figure 72. Chinook had resigned. It knew Blondie could swap out kings now and go up three kings to two: "You win. Congratulations!" I slumped back in my chair with quiet contentment, then got up and told my wife the good news. It was late at night, but I phoned Kumar anyway and filled him in. We had finally made a dent in Chinook, and it was time to write a book. 4
Epilogue: The Future of Artificial Intelligence
To date, merely recapitulating in a computer program what we already know has passed as "artificial intelligence." That's not what the term meant to a pioneer of artificial intelligence like Arthur Samuel, who dreamed of building a computer that would create its own solutions to problems. We would just "tell the computer" what we wanted it to do, and it would figure out a way to do it. Samuel and many other pioneers were optimistic about the prospects for building such an intelligent machine, but designing one has proved to be a difficult challenge. Perhaps because of this difficulty, over the years, AI has shifted its focus and evolved into an effort to program what we already know into computers. The result has been the creation of computers like Deep Blue, which can outperform us in solving specific problems by virtue of their sheer speed and accuracy. But the real promise of artificial intelligence lies in the ability to solve new problems in new ways, without relying on existing expertise. 1 So far, that promise has gone largely unrealized. In essence, we cheated on the test. We gave the computer a "crib sheet" of answers right before the final exam.
Samuel's Challenge

Samuel framed the challenge well within his domain of checkers. His goal wasn't just to have his checkers machine learn to play a good game, but also to have it invent its own parameters, its own features for understanding which positions to favor. Unfortunately, that goal remained elusive. In 1967 Samuel identified the primary "glaring defect" of his 1959 checkers program as "the absence of an effective machine procedure for generating new parameters for the evaluation procedure." He continued:

    While the goal ... of getting the program to generate its own parameters remains as far in the future as it seemed to be in 1959, we can conclude that techniques are now in hand for dealing with many of the tree pruning and parameter interaction problems which were certainly much less well understood at the time of the earlier paper. Perhaps with these newer tools we may be able to apply machine learning techniques to many problems of economic importance without waiting for the long-sought ultimate solution. 2

Samuel's prediction has since been fulfilled. Many of the tools that he discussed, involving tree pruning and other techniques, proved invaluable in creating faster, more effective search methods. These improved the performance of AI programs for many decades. But Samuel was also correct in distinguishing these tools from the "ultimate solution," a program that would invent its own features for interpreting the positions of the pieces on the board and would learn on its own how to interpret those features.
Kumar and I have demonstrated that an evolutionary algorithm, coupled with artificial neural networks and a minimax procedure, can accomplish this feat, realizing Samuel's challenge.
The Evolutionary Approach to Artificial Intelligence

The field of artificial intelligence has seen many important accomplishments. Building programs and machines to defeat the world's best human chess and checkers players are significant engineering achievements. Many related techniques have been invented in efficient theorem proving, pattern recognition, and tree searching, all of which are symptoms of intelligent behavior. Certainly, AI programs have been applied successfully to specific problems, but they have not generally advanced our understanding of intelligence. They've solved problems, but they haven't solved the problem of how to solve problems. Having failed to realize the dreams of the artificial intelligence pioneers, artificial intelligence is sadly better known for its "AI winter," when the technology couldn't live up to the hype it received. Perhaps the root cause of the failure has been our hubris. We've focused repeatedly on emulating our own behaviors, trying to recapitulate our own intelligence. 3 Instead, a more promising direction for AI may lie in emulating the learning process that generated human beings and all other intelligent forms of life: evolution. Thus far, the field of artificial intelligence has not demonstrated an ability to produce anything like the marvels of evolution. Even today, if we were to ask the most talented people in robotics to design a robot that would roam about in the world, perhaps even fly, find food, communicate its location to other similar robots, swarm to defend its "colony," and reproduce itself, they would be left shaking their heads. We don't know how to do this. Moreover, if we raise the bar and demand a colony of robots that can evolve in light of new environmental demands, then the prospects shift from implausible to seemingly impossible. 4 The products of evolution achieve such feats quite naturally. It's equally natural to look to nature for inspiration as we endeavor to create intelligent machines, such as HAL. The results I've offered in this book provide an impetus for further research into the potential for evolving intelligent machines.
Building HAL: The Art of the Possible

Kumar and I started our research with a scientific point to prove. We wanted to demonstrate that a computer could learn how to play checkers based only on the knowledge that a novice player would have and without any explicit feedback on which games were won, lost, or played to a draw: our answer to the challenges posed by Samuel and Newell. When we consider the possibilities for designing HAL, however, no such scientific point needs to be proved. HAL won't have to relearn everything that we humans already know just to prove the scientific point that such machine learning is indeed possible. Evolutionary computation will offer machines like HAL the ability to surpass our own knowledge, but we don't have to start HAL off with completely random ideas (such as the randomly weighted neural networks on which Kumar and I relied in the first generation of our evolutionary experiments). Whatever hints we can give HAL, or similar future machines, will save valuable time, provided that knowledge is accurate. Perhaps this will be the new role for traditional artificial intelligence techniques: to provide a leg up for machines that will go beyond the limits of our own knowledge. 5 Combining our knowledge with an evolutionary means for enabling a machine to learn on its own is a formidable challenge in its own right. 6 Arthur C. Clarke erred in his prediction that by 2001 we'd have built computers like HAL, fully conversant in our languages, able to play chess flawlessly, completely indistinguishable from our own cognitive capabilities. I noted in chapter 1 that designing HAL lies wholly in the realm of science fiction, that we have no idea how to build HAL. That was true before Kumar and I started our experiments, and it is still true today. Our experiments offer only a small step toward such a creative computer. I hope that this step will encourage others to join in the pursuit of evolution's ability to discover new solutions in new ways, perhaps to the point of someday evolving an intelligent companion such as HAL. I also hope that along the way we resist the temptation to hype what is possible beyond what is reasonable. 7 Allen Newell's quote on the opening pages of this book is most apt: "Everything must wait until its time; science is the art of the possible." Evolutionary computation is still in its infancy. Its time is coming.
Appendix: Your Honor, I Object!
It's useful to consider a number of possible objections to the claims I've made in this book, in particular about what evolutionary computation has been able to achieve. (Some are actual objections that have been leveled in anonymous peer reviews of papers that I've submitted to technical journals for consideration.)

1. The neural network was useless. The only thing that was working for you was the piece-count feature. Any checkers program that relies on finding positions that offer a material advantage will perform well. You haven't demonstrated that the neural network did anything more than a simple piece-count program would do.
If you have thought of this objection, congratulations; you're suitably skeptical. This is a valid objection, based solely on the experiments I've detailed in the book. Further experimental evidence, however, has shown that the neural network did indeed learn more than just the piece count. In the summer of 2000, I conducted two sets of control experiments to assess the effectiveness of the best-evolved neural network, Blondie24, as compared to a player that uses only the piece differential and assigns kings the standard value of 1.5 checkers.

In the first experiment, I played Blondie against a piece-count player in a series of fourteen games in which both programs were restricted to looking ahead eight ply. There are seven possible moves to open a checkers game, and I played each program as red and as white in each of these games. A program that relies only on the piece differential will play a weak endgame, because it will almost always be unable to see far enough ahead to capture pieces, and that is the only information that it has available. Thus to be fair, I played out the games until either (1) the game was completed with one side earning a win, or (2) the neural network and the piece-count program toggled between two alternating positions. In the latter case, I then assessed the outcome by examining the material advantage that one side had over the other, and also by playing out the game using another strong computer program called Blitz98, available over the Internet as shareware. Blitz98 played out the remainder of the game from the toggled position and declared a winner.

If the best-evolved neural network truly offered no advantage over the piece-count player, we'd expect an equal number of wins and losses in the games, with some variable number of draws. The results proved otherwise. Of the fourteen games, two were played out to completion, with Blondie winning both of these. Of the remaining twelve games, Blondie held an advantage in material in ten games and was behind in the other two. Using Blitz98 to play out the twelve incomplete games, Blondie was awarded the victory in eight of them, the piece-count player earned the win in one game, and the remaining three games were played out as draws.

By all measures, the best-evolved neural network was significantly better than the piece-count player. The results were statistically significant using probabilistic mathematical tests, and they demonstrate the superiority of the neural network. Using the standard rating formula, the results suggest that Blondie at eight ply is about 311 to 400 points better than the piece-count player at the same eight ply (based on material advantage or final outcome). A program that relies on only the piece count and an eight-ply search will defeat a lot of people, but it's not an expert. Blondie is.

2. The best-evolved neural network is significantly better than a piece-count player at eight ply, but it's not worth the extra computation. You have more than five thousand weights in your neural network. All the computation that you perform in every evaluation might be put to better use simply by employing a piece-count player that can search to a greater ply. By looking ahead farther, it will see material advantages before the neural network does.
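Before turning to this objection, the 311-to-400-point estimate in the reply to objection 1 can be checked with a short calculation. This is a minimal sketch that assumes the usual logistic (Elo-style) expected-score model and leaves draws out; the text does not spell out the calculation, so the function name `rating_gap` and the choice of inputs are illustrative assumptions. Counting games by material advantage (12 ahead, 2 behind) and by final outcome (10 wins, 1 loss, draws excluded) gives values that round to 311 and 400.

```python
import math


def rating_gap(wins: int, losses: int) -> float:
    """Rating difference implied by a win/loss record under a logistic model.

    Expected score E = 1 / (1 + 10 ** (-gap / 400)), so
    gap = 400 * log10(E / (1 - E)) = 400 * log10(wins / losses).
    Leaving draws out is an assumption of this sketch.
    """
    if wins <= 0 or losses <= 0:
        raise ValueError("need at least one win and one loss for a finite gap")
    return 400.0 * math.log10(wins / losses)


# Material advantage: 2 outright wins + 10 games ahead, versus 2 games behind.
print(round(rating_gap(12, 2)))

# Final outcome: 2 outright wins + 8 Blitz98 wins, versus 1 loss (3 draws ignored).
print(round(rating_gap(10, 1)))
```

Under these assumptions the two ways of counting bracket the quoted range, which suggests (but does not prove) that this is how the figures were obtained.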
After completing the set of fourteen games referred to in the previous objection, I conducted a separate test that demonstrated the extra computation was indeed worth the effort. I played out fourteen games during which both Blondie and the piece-count player were allowed two minutes of computing time to make each move. Both programs relied on a 466 MHz Celeron processor to eliminate any variation that might have occurred if different CPUs were used.
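The piece-count opponent in these control games evaluates a position by material alone, with a king worth 1.5 checkers. A minimal sketch of such an evaluator follows; the board encoding and the name `piece_count_score` are hypothetical, chosen only for illustration, and the search that would sit on top of it is omitted.

```python
from typing import Sequence

KING_VALUE = 1.5  # from the text: a king counts as 1.5 checkers


def piece_count_score(board: Sequence[str], player: str = "r") -> float:
    """Material differential from `player`'s point of view.

    `board` is a hypothetical encoding of the 32 playable squares:
    'r'/'w' for red/white checkers, 'R'/'W' for kings, '.' for empty.
    """
    values = {"r": 1.0, "R": KING_VALUE, "w": -1.0, "W": -KING_VALUE}
    total = sum(values.get(square, 0.0) for square in board)
    return total if player == "r" else -total


# Two red checkers and a red king against one white king.
example = "rrRW" + "." * 28
print(piece_count_score(example))
```

An evaluator like this carries no positional information at all, which is what makes it a useful control: any edge an opponent shows at equal or shallower search depth has to come from something other than material.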
With the same termination criteria as before, Blondie won two games outright and lost none. Of the remaining twelve games, Blondie was ahead on material in six games, behind in one, and even in five. Based on using Blitz98 to complete the twelve games, the final tally was eight wins, two losses, and four draws for Blondie. These results are all the more compelling when you note that the piece-count player was typically searching two ply more than Blondie, and occasionally as many as six ply more. Blondie could typically complete an eight-ply search in two minutes, but in some cases could only complete six ply. In contrast, the piece-count player could normally complete a ten-ply search in the allowed two minutes, but sometimes could crank out a fourteen-ply search. Despite having a horizon of two or more ply beyond the neural network, the piece-count player was beaten decisively.

Not only does this demonstrate that Blondie's extra computation is indeed valuable, it confirms that Kumar and I have met the Samuel-Newell challenge. The only compelling rationale for explaining how the piece-count player can lose to Blondie when the piece-count player looks farther ahead is that the evolutionary program has captured useful positional information that can overcome the deficit. Although I cannot identify what features might be contained in the neural network, beyond mobility, there's no doubt that they are indeed real and that they are more valuable than what a farther look into the future provides.

3. The expert neural network was structured perfectly. You gave the evolutionary program the perfect neural network design. How could it fail?
In fact, Kumar and I tested only two different neural designs, and our primary choice was almost arbitrary. We selected forty hidden nodes in the first layer and ten in the second simply because my prior efforts in evolving neural networks to play tic-tac-toe had required ten. We figured that checkers was a more complicated game, but we made no effort to determine just how much more computation would be required or that the feed-forward architecture and sigmoidal processing elements were somehow optimal for the task. Furthermore, our design for covering the spatial subsections of the board only offered the evolutionary program the chance to learn about the two-dimensional aspects of the board. Certainly, any novice who plays the game, even for the first time, recognizes that the board is two-dimensional. Our initial neural networks had to treat the game as if it were played on a one-by-thirty-two string of squares with seemingly strange rules for how the pieces could move between those squares. We believe this was quite a handicap. We never hard coded any weights or patterns into the neural architecture that represented features or specific combinations of patterns. Our most recent architecture includes a variation on the spatial subsets wherein the diagonal files are presented as inputs. This structure is obvious to the beginner as well. If any interesting results emerge from using this architecture, I'll be pleased to include them in a second edition of this book.

4. Of course your neural networks did very well. You cheated by giving them the piece advantage, which is the most important feature of the game.
The piece advantage is certainly an important feature, perhaps the most important, but it is also information that any novice would use. And by itself, it does not explain Blondie's performance. We know that Blondie learned features beyond the piece advantage because she can defeat a piece-count player at the same ply, and can win even when allowing the piece-count player to search at a greater ply.

5. Your choice of points for win, lose, or draw wasn't symmetric. You penalized losing more than you rewarded winning. This was an unfair use of your own knowledge.

Our choice of points was indeed a use of our knowledge, but it's not at all clear how that choice would favor the evolution of expert-level players. Our rationale was simply to discourage neural networks from losing, which is always easier to do than winning. To be fair, we haven't conducted an experiment in which the penalty for losing is the same as the reward for winning, so I can't confirm that the evolutionary program is able to generate an expert-level neural network under that condition. There's no reason I can see, however, why it should fail under that condition.

6. Nowhere did you treat the amount of time that the neural network should take on each move, as would be needed in tournament play. You handled this yourself, which was also an unfair use of your knowledge.

Of all the criticisms, perhaps this one is most valid, although our handling of the time per move almost assuredly lowered Blondie's rating rather than boosted it. Tournament conditions provide an average of two minutes per move or, more accurately, sixty minutes for thirty moves. With our goal of using no more than sixty to ninety seconds per move, and sometimes as few as thirty seconds, we likely handicapped the neural network, inducing poorer moves than it might have discovered if we had taken more time. When evaluating our evolved neural networks, we simply used the search depth that would complete in what we thought was a reasonable amount of time. If we had instead made many of our moves in, say, fifteen seconds, and spent, say, six minutes on moves that we thought were more demanding, then the criticism would be valid. But we didn't do that, and we couldn't have even if we had wanted to, because the four-minute time limit on zone.com on each move restricted us in this regard. The criticism that we managed the neural network's time is valid, but it's only relevant to the case of playing in tournaments in which time management is an issue. With respect to the neural network's expert rating, our control of time was so rudimentary and unbiased that we likely hindered the neural network substantially. We certainly didn't help it. I expect that Blondie could be rated above 2,200 with proper time management, but I admit that's purely speculation.
7. The Gedanken experiment you presented at the end of chapter 6 isn't appropriate because the neural network still requires a minimax search, which goes beyond what a person could compute in his or her head.
If the evolving neural networks had used a deep search, this would be a stronger objection. Recall, however, that the neural networks worked only with a four-ply search during the evolutionary trial. At four ply, there really isn't any "deep" search beyond what a novice could do with paper and pencil if he or she wanted to. If you're reading this book, then you have the mental wherewithal to execute the Gedanken experiment as I've laid it out. Unfortunately, you can't start with a clean slate, the same way the neural networks did, because you already know too much about checkers, even if you've never played. You know about two-dimensional space, spatial nearness, jumping, the concept of a diagonal, and so forth. Even with this additional knowledge, I find the Gedanken experiment instructive. If I posed a new game that was as complex as checkers yet completely different from checkers, how many games would you have to play to rise to the expert level? Would you need more or fewer games than the evolutionary program needed to create an expert?

8. You didn't use any sophisticated form of evolutionary computation to achieve your result.
Exactly: none was necessary. Any more sophisticated form of evolutionary computation would have shown the need for a specialized approach to the problem, which is just what Kumar and I wanted to avoid. One of the main points of our research has been to indicate the general applicability of the basic evolutionary algorithm.

9. Lots of people have used evolutionary approaches to playing games. You aren't the first to do it.
Kumar and I don't claim to be the first to use an evolutionary algorithm for computer games. People have been using evolutionary algorithms for playing games since the early 1960s. In 1963 Nils Barricelli used John von Neumann's computer at Princeton to learn how to play a game that's much like Nim. 1 Later, in 1967, he published another evolutionary approach to learning how to play a simplified game of poker. 2 My father used an evolutionary approach in 1969 to learn to play such games as the prisoner's dilemma, a game made famous in The Evolution of Cooperation by Robert Axelrod. 3 There have been many other noteworthy efforts since then, some even applied to complex games such as Othello, backgammon, and checkers. Many have relied on features that people believe are important and have used an evolutionary algorithm to find weights for those features (thus relying on human expertise). Others have used neural networks to invent features, but the resulting performance has not been demonstrated to be of expert caliber. 4

10. Other methods, such as reinforcement learning, could be used to achieve the same performance. There's nothing very special about the evolutionary approach that you've implemented.
Recall that I stated in this book that other methods might indeed serve to facilitate machine intelligence (computers that adapt behavior to meet goals in a range of environments). The evolutionary approach, however, offers the possibility of adapting even when the final outcome of any particular game is unknown, when only an overall point score is available for a series of games. This may be a distinct advantage.
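A minimal sketch of selection on an aggregate score may make this concrete. It works under stated assumptions: `score` is a hypothetical stand-in for the total points a candidate earns over a series of games, the mutation is plain Gaussian noise with a fixed step size, and the population size and generation count are arbitrary. It is not the Blondie24 program; it only illustrates that the loop never needs to know which individual games were won or lost.

```python
import random
from typing import Callable, List


def evolve(score: Callable[[List[float]], float],
           n_weights: int = 5, pop_size: int = 10,
           generations: int = 50, sigma: float = 0.1,
           seed: int = 0) -> List[float]:
    """Keep the higher-scoring half of parents plus offspring each generation."""
    rng = random.Random(seed)
    # Start from small random weights (an assumed initial range).
    pop = [[rng.uniform(-0.2, 0.2) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Each parent produces one offspring by random variation.
        offspring = [[w + rng.gauss(0.0, sigma) for w in parent]
                     for parent in pop]
        # Only the overall score is consulted; no per-game record exists.
        everyone = sorted(pop + offspring, key=score, reverse=True)
        pop = everyone[:pop_size]
    return max(pop, key=score)


def toy_score(weights: List[float]) -> float:
    """Toy stand-in for 'points over a series of games': higher is better."""
    return -sum((w - 0.5) ** 2 for w in weights)


best = evolve(toy_score)
print(toy_score(best))
```

Because parents compete with their offspring for survival, the best score in the population can never decrease from one generation to the next, whatever the scoring function is.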
With respect to reinforcement-learning methods, specifically, they seek to reward an algorithm when it is doing well and punish it when it is doing poorly. Often, the behavior of an algorithm will be defined by various parameters. When the algorithm is doing the right thing, the reinforcement-learning procedure generally leaves the parameters alone. When the algorithm fails, however, reinforcement learning changes the parameters. More sophisticated methods are available for adjusting the parameters appropriately, both in direction (larger or smaller) and magnitude (big changes or little changes). Perhaps a reinforcement-learning approach might be able to generate an expert checkers player without using explicit features of the game. But it's difficult to imagine how this would work when the final outcome of any single game is unknown to the algorithm. Remember, our evolutionary program had to work with a single point score that described the relative success or failure of each neural network in the population over every game that it played. There was never an explicit record of which games were won, lost, or drawn. How, then, would we know which moves to reinforce positively or negatively? The only remaining reinforcement that can be made is to simply reinforce the winners and eliminate the losers, followed by making random variations to the winners, and that's evolution.

11. Your experiments are not repeatable because you had to use judgment about when to accept a draw and also about how long to allow the search to extend.
True, but this is most likely a moot point. We applied the same decision-making process in four evolutionary trials and do not believe that our decisions made any material difference to the final rating. To the extent that our judgment did enter the process, it likely served to lower the neural networks' ratings. For example, with regard to our decisions to cut off the search or wait for completing a higher ply, we evaluated each neural network with a consistent time frame in mind. If we didn't expect the alpha-beta search to come back with another two ply in the desired time, we bailed out. It's very likely that we bailed out too early in some cases, but again, this would only have hindered our final result, giving us more chance of making inferior moves. The fact that each experiment is not exactly repeatable does not alter the final assessment of Blondie's performance. Moreover, the experiments are repeatable in the statistical sense. Flipping a coin to generate a sequence of heads and tails does not yield a repeatable experiment, but the possible outcomes follow a probabilistic distribution that can be described precisely. Likewise, each evolutionary trial and each evaluation during many games cannot be replicated precisely, but four separate trials and evaluations have shown increasing degrees of success.

12. The ratings on Internet gaming sites aren't really valid. You earned a rating of about 2,050 on zone.com, but that doesn't make your neural network an "expert." After all, the top-rated players on that same site have scores that are even higher than Chinook, which implies that they are better than the world's best player. Your rating isn't any more valid than theirs.
When we first started evaluating our evolved neural networks on zone.com, the best players were rated just above 2,400. Since then, the ratings of the best players have climbed much higher, even temporarily above 3,000, which is far better than Chinook, the world champion. (Most recently, the top players were rated in the 2,700s.) The reason that these players can earn such high ratings is that they can play any number of much poorer players and keep racking up points with wins. They might earn only a single point for a victory, but they hardly ever lose. (Perhaps some players take "extraordinary measures" to prevent losing, but let's assume they are all legitimate.) In contrast, tournament players and the people who enter their computer programs in tournaments would never take on such opposition. They wouldn't have the chance. It would be like tennis champion Pete Sampras earning rating points by showing up at the local recreation center and trouncing a weekend club player 6-0, 6-0. There's no way that the average amateur can ever get to play Sampras for rating points. Only sanctioned tournaments count toward a player's rating. On zone.com, however, and other Internet sites like it, every contest counts, and really good players can boost their ratings artificially by beating up on lesser competition. Having admitted the disparity in the ratings at the very highest end of the scale, there is compelling evidence that Blondie's rating of 2,045 is justifiable. Consider, for example, that even though the top ratings on zone.com have climbed without bound, the ranking for a player with a 2,045 rating, like Blondie, has hovered between 480 and 520 for most of the past year. That means that even though the highest ratings have skyrocketed, the percentiles haven't changed very much. 5
This consistency shows that the ratings of the highest players really don't make much difference. The much larger population of players around the 2,000 level makes the ratings around that same level more reliable. As independent corroborative evidence, recall that Jonathan Schaeffer described Chinook as being a strong player when played at the novice level but not quite a master-level player. The results of head-to-head competition between Blondie and Chinook show Blondie about 120 points behind Chinook. If we accept that Chinook at the novice level is rated about 2,150-2,175, then by subtracting 120 points we find that Blondie is rated about 2,030-2,055, which matches up very well with the rating she received on zone.com. If you are still not satisfied that Blondie is an expert, then perhaps I'll just have to leave it with Blondie being better than 99.61 percent of all players on the zone and let you assign the moniker.

13. Blondie cannot defeat Chinook. Your program isn't a world champion.
True, and she likely never will be. But demanding that Blondie be a champion reflects a common error in assessing the results of artificial intelligence programs. In essence, it's a version of the Turing Test taken to the extreme. It's not enough that we must compare the program's performance to that of a human being; we also have to compare it to the absolute best human player (or the best program) who ever lived. Such objections have been around since the early days of artificial intelligence. But, as Paul Armer of RAND Corporation argued back in 1963,

Who says that we must play piano like Chopin, write operas like Mozart, or compose symphonies like Beethoven to be judged a success? 6

How many people can play piano at Chopin's level? Beyond that, the quality of performance that can be achieved by a computer program is only part of the equation. How the performance level was achieved is also important. Whether the program can adapt its behavior to meet goals in a range of environments is paramount. Blondie can't defeat Chinook, but what she can do is defeat many people who are very good at checkers, even occasionally beating masters. Blondie rose to the expert level not by being told how to do it but rather by discovering how to do it. If artificial intelligence is ever to achieve its promise of generating machines that "think," we need to tackle the issue of what the AI program has learned, not simply the level of its achievement.

14. Blondie isn't really any different from Deep Blue. Each program contains some knowledge of the game, and each relied on some machine learning. The fact that Blondie used less initial knowledge and learned more doesn't make it better.
Blondie isn't better than Deep Blue, at the very least because Blondie (the neural network) and Deep Blue (the computer program) play different games. But the approaches used to design each player are notably different. The approach used to create Deep Blue relies on human beings already able to play chess at the highest levels. The approach that evolved Blondie does not rely on anyone knowing how to play checkers, at any level beyond a simple knowledge of the rules (and in our first attempts, she did not even know that the game was played on a two-dimensional board). The Deep Blue approach is limited to addressing situations that human beings already understand. The Blondie approach, the evolutionary approach, uses nature's design principle to address situations that we might not understand. This versatility greatly increases the applicability of a computational approach to problem solving and points the way toward the future of artificial intelligence.
Notes
Chapter 1: Intelligent Machines
1. Alan M. Turing, "Computing Machinery and Intelligence," Mind 59 (1950): 433-60.
2. Whether the machine should then be judged capable of thinking was left unanswered. In fact, Turing actually dismissed the original question, "Can machines think?" as being "too meaningless to deserve discussion." In essence, Turing argued that if a machine were as good as a man at pretending to be a woman, then it wouldn't matter if we said that the machine was "thinking." Indeed, assigning a label like thinking would be a moot point.
3. In fairness, Turing's own writings on the methods involved in the test were contradictory. He first wrote, as I noted in the text, "We now ask the question, 'What will happen when a machine takes the part of [the man] in this game?' Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?" This question sets up a scientific experiment in the classic sense. The hypothesis is that the computer will perform as well as the man at fooling the interrogator. The experiment seeks to gain evidence against this hypothesis. The "control setting" involves the frequency with which the man can fool the interrogator into believing that he is the woman. The "alternative setting" then replaces the man with the machine, leaving the other parameter, that is, the woman, as a constant. The experiment has us measure the frequency with which the machine can fool the interrogator into believing that it is the woman. In doing so, we make an "apples-to-apples" comparison of the data gathered in each setting.

Turing proceeded to offer a series of comparisons between the machine and the man: "The game may perhaps be criticized on the grounds that the odds are weighted too heavily against the machine. If the man were to try and pretend to be the machine he would clearly make a poor showing." By juxtaposing the man and machine in this way, perhaps Turing led himself astray. To keep the experiment honest, we have to note the subtle point that the machine isn't just pretending to be a man, it's pretending to be a man who is pretending to be a woman. This subtlety eluded Turing's final description of the test: "Let us fix our attention on one particular digital computer C. Is it true that by modifying this computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate program, C can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man?" (Recall that in Turing's original description, "A" is the man and "B" is the woman.) There is nothing equivocal about this description; it's simply poor science, or more likely a simple slip. By directly substituting the man for the woman, we face an uncontrolled experiment.

What does it mean now for the program to "play satisfactorily"? Before, when there was a legitimate control setting, the definition of satisfactorily was clear. If the machine could take the part of the man in the imitation game and perform as well or better at fooling the interrogator, the machine would be declared to have played satisfactorily. In the new version of the game, when the man replaces the woman, we face an "apples-to-oranges" comparison. Suppose the man could fool the interrogator into believing that he was the woman 30 percent of the time. In addition, suppose that the machine could fool the interrogator into believing that it was the man 31 percent of the time. Would the machine pass the test? There's no right answer anymore, because the control in the experiment has been removed. The control and the alternative are now unrelated, because each has two parameters and both parameters have been changed.

The logical graveyard spiral here can be made more evident by substituting different familiar comparisons. Suppose the original comparison involves assessing how well a square imitates an equilateral triangle. A square has four vertices and four equal-length sides. An equilateral triangle has three of each of these. Thus we might say the ratio of 4:3 describes how well a square imitates an equilateral triangle. Ratios that are closer to 1:1 indicate greater similarity. Now suppose that we substitute a pentagon for the square and replace the triangle with the square. Here we have to ask how well a pentagon imitates a square. By the same yardstick, a pentagon imitates a square with the ratio 5:4. The ratio 5:4 is closer to 1:1 than 4:3, so what do we conclude? Surely we can't conclude that the pentagon is more like an equilateral triangle than a square is. In fact, any conclusion that we make about the pentagon must be completely unrelated to the results of our comparison between the square and the triangle. That comparison becomes irrelevant and any experiment that incorporates an irrelevant comparison is designed poorly.

The ground comes up on our logical graveyard spiral even faster if we think about comparisons using a simple game of poker. Suppose the original comparison involves assessing how well a flush, five cards of one suit, imitates a straight flush, five cards of one suit in an ascending series. The rough or "fuzzy" answer is that a flush halfheartedly imitates a straight flush; after all, it's essentially half of a straight flush.
Now suppose that we substitute a straight for the flush and then replace the straight flush with the flush. We are left with asking how well a straight imitates a flush. The answer is not at all. So what do we conclude? Does the five-card straight measure up "satisfactorily"? It seems not. Ah! But if we go back and ask the question we should have asked, namely, how well does a straight imitate a straight flush, the answer is halfheartedly just as it needs to be. Only when we focus on Turing's primary description of his test, in which a machine is substituted for the man directly, is the test itself logically sound.
4. Furthermore, since the initial efforts in getting machines to play chess and other games of skill were not particularly successful, this endeavor seemed all the more appropriate and worthy. The issue of "thinking" was difficult. So was the challenge of getting machines to play a good game of chess. It seemed that we surely had to be on the right track.
5. What difference does the decision make if its impact cannot be measured? We might as well flip coins to decide how to allocate our resources if we don't care enough about the outcome even to measure it. In contrast to assigning equal likelihood to every possible option, the intelligent decision maker must instead ensure that the probabilities of choosing alternative options are not equal and in fact are suitably unequal. The probabilities of choosing alternative options should be set to favor those options that promote achieving the decision maker's goal.
6. Nature's processes are always impossible to summarize in short sentences, because there are always necessary exceptions and caveats. For example, reproduction is also a goal that is reinforced by natural selection. Those individuals that do not reproduce may live long lives, but from an evolutionary perspective they are genetic "dead ends." Here I subsume the concept of reproduction under the broader concept of survival, because reproduction is required for the survival of the species, which comprises its individuals.
In addition, individuals that demonstrate behaviors associated with goals other than survival may be "successful." The man who seeks to commit suicide by jumping into a lion's cage may be very successful. Again, natural selection will view him as a genetic dead end unless he reproduces before he achieves his own version of nirvana. The outcome of millions or even billions of generations of variation and selection is a collection of individuals who are dedicated to their own survival and/or the survival of their close relatives.
7. This definition was first offered in J. Wirt Atmar, "Speculation on the Evolution of Intelligence and Its Possible Realization in Machine Form" (Sc.D. Diss., New Mexico State University, 1976).
8. A related definition was offered in Lawrence J. Fogel, Alvin J. Owens, and Michael J. Walsh, Artificial Intelligence Through Simulated Evolution (New York: Wiley, 1966), p. 2.
9. This insight was offered by Norbert Wiener, Cybernetics, part 2 (Cambridge, MA: MIT Press, 1961), and restated in Atmar's dissertation (see note 7).
Chapter 2: Deep Blue
1. Alan M. Turing, "Computing Machinery and Intelligence," Mind 59 (1950): 433-60.
2. Herbert Simon and Allen Newell, "Heuristic Problem Solving: The Next Advance in Operations Research," Operations Research 6 (1958): 10.
3. ABCNews.com, "Supercomputer or Superstar?" www.abcnews.com, August 28, 2000.
4. Guinness World Records 2000: Millennium Edition (New York: Bantam Books, 2000), p. 289.
5. Claude E. Shannon, "Automatic Chess Player," Scientific American 182, no. 48 (1950). Shannon passed away on February 24, 2001, at the age of eighty-four.
6. I detail the minimax principle in chapter 6.
7. Cited in David N. L. Levy and Monty Newborn, How Computers Play Chess (New York: Computer Science Press, 1991), pp. 35-38.
8. Pravda reported an unconfirmed account of a running program in the Soviet Union earlier. A technical reviewer of Blondie24 remarked that one of his colleagues suggested a connection between this unconfirmed report and the work of Mikhail Botvinnik (1911-1995), the three-time former Soviet chess champion. In my attempt to document any possible connection in light of this hearsay, I found no evidence of any effort by Botvinnik to work with computer chess programs before 1963. Botvinnik did meet with Claude Shannon in 1965 in Russia and discussed chess programs via their English-Russian translators. (See www.research.att.com/~njas/doc/shannonbio.html.) Botvinnik's efforts in computer chess programs received mixed reviews. Alper Ere, the fortieth-ranked junior player in Europe in 1995, remarks that Botvinnik's "contributions have been meagre" (www.geocities.com/TimesSquare/Ring/4860/botwinnik.html), and a remark found at www.ishipress.com/botvinni.html, hosted by expert-rated Sam Sloan, reads, "Hans Berliner recently characterized Botvinnik as a 'fraud' in the [field] of computer chess." Yet David Levy and Monty Newborn wrote that Botvinnik's "ideas were highly original and led to many stimulating publications on computer chess." Sloan's website also offers that the first world computer chess championship, held in 1974 in Stockholm, Sweden, was won by a program developed by a team led by Botvinnik. Botvinnik's contributions to computer chess programs are no doubt controversial, but I've been unable to connect any of them to efforts in the Soviet Union to program a computer to play chess before 1956.
9. See R. Greenblatt, D. Eastlake, and S. Crocker, "The Greenblatt Chess Program," FJCC 31 (1967): 801-10.
10. The rating system uses the following formula:

$$R_{\text{new}} = R_{\text{old}} + C\,(\text{Outcome} - W), \qquad W = \left[1 + 10^{(R_{\text{opp}} - R_{\text{old}})/400}\right]^{-1},$$

where Outcome = 1 if win, 0.5 if draw, 0 if loss, and C = 32 for ratings less than 2,100. $R_{\text{new}}$ is the computed new rating based on an old rating of $R_{\text{old}}$. Your rating increases when you win and decreases when you lose, but the amount of the increase or decrease depends on the difference in rating between you and your opponent. The constant factor C is lowered when your rating goes above 2,100, and then again above 2,400, making it harder to gain or lose points.
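The formula in note 10 can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names are hypothetical, R_opp is taken to be the opponent's rating, and C is fixed at 32 because the note does not give the lower values used above 2,100 and 2,400.

```python
def expected_score(r_old, r_opp):
    """W in note 10: the expected outcome against an opponent rated r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r_old) / 400.0))


def update_rating(r_old, r_opp, outcome, c=32.0):
    """Return the new rating; outcome is 1 for a win, 0.5 for a draw, 0 for a loss.

    c = 32 is the value the note gives for ratings below 2,100. The reduced
    values for higher ratings are not stated there, so they are not modeled.
    """
    return r_old + c * (outcome - expected_score(r_old, r_opp))


# Evenly matched players: W = 0.5, so a win is worth half of C.
even_win = update_rating(2000, 2000, 1.0)   # 2016.0

# A 2,050-rated player facing a much weaker 1,250-rated opponent.
mismatch_gain = update_rating(2050, 1250, 1.0) - 2050   # gain from a win
mismatch_loss = update_rating(2050, 1250, 0.0) - 2050   # change after an upset loss
```

Under these assumptions the win over the weaker opponent adds well under one point, while the upset loss costs nearly all 32, which illustrates how the size of the change depends on the rating difference.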
11. Chess games are typically played with a time limit for completing a preset number of moves. The players must manage their time accordingly, spending more time on difficult situations.
12. The data for various chess programs and their ratings can be found in a paper by Hans Moravec at www.transhumanist.com/volume1/moravec.htm. This source has Deep Thought with a rating of 2,510 in 1989. Other sources have listed Deep Thought with a rating of 2,745 at the time. Despite the discrepancy, the general upward trend in the rating of computer chess programs is accepted universally. Determining the rating of a computer program is problematic if it doesn't compete in sanctioned tournaments or match play. Defeating a single opponent with a high rating is impressive, but it doesn't give sufficient substance to validate a rating. It's best to view the ratings for programs as estimates and to use caution when treating specific instances.
13. Recent challengers for this description include Vladimir Kramnik, one of Kasparov's former students, who defeated Kasparov two wins to none with thirteen draws in a head-to-head match completed November 3, 2000, in London, England. Kramnik became the unofficial world chess champion by winning the match. The title is informal owing to a prior split between Kasparov and the FIDE, the official world chess federation. The official FIDE 2000 championship was held from November 27 to December 16, 2000, in New Delhi, India, and in Tehran, Iran. The winner was Viswanathan Anand of India, who had long been regarded as the world's second-best player behind Kasparov.
14. See www.research.ibm.com/deepblue/meet/html/d.1.4.html.
15. See www.research.ibm.com/deepblue/meet/html/d.4.3.a.html. Murray Campbell and Joel Benjamin provided chess expertise to the Deep Blue team in adjusting Deep Blue's evaluation function. As I mentioned in the text, perhaps Deep Blue's blazing hardware contributed more to its success than did the human expertise in its evaluation function.
16. See www.sciam.com/explorations/042197chess/042197hsu.html.
17. See news.cnet.com/news/0-1003-200-318867.html?st.nefd.mdh.
18. See www.usatoday.com/sports/other/chess01.htm.
19. See www.research.ibm.com/deepblue/meet/html/d.1.6.html.
20. See www.research.ibm.com/deepblue/meet/html/d.3.2.html.
21. See www.sciam.com/explorations/042197chess/042197hsu.html.
22. See www.research.ibm.com/deepblue/meet/html/d.3.html.
23. The match itself, too, wasn't all that convincing. IBM won by only two games to one, with three draws. It wasn't as if Deep Blue had used Kasparov as a dishrag to clean the chessboard. Kasparov used an unorthodox style, trying to gain an advantage over
Deep Blue by playing "bizarre openings" (see Bruce Pandolfini, Kasparov and Deep Blue: The Historic Chess Match Between Man and Machine [New York: Fireside, 1997], p. 161). Perhaps this deviation from his normal approach worked to his disadvantage in the end, since he had to cogitate over lines of play that were ultimately unfamiliar. The sixth and final game was an aberration, poorly played. Pandolfini, a national master in U.S. chess competition, described Kasparov's chances as "bleak" just before he resigned the fateful sixth game, writing that this match was the "worst beating world-champion Kasparov has ever suffered" (pp. 144, 159). Deep Blue did clear the hurdle, but not by much, and Kasparov seemed to give the program a bit of a leg up. What's more, Deep Blue didn't clear the hurdle with elegance, but rather by brute force. In fairness, IBM's Web page www.research.ibm.com/know/blue.html offers that "Deep Blue used more than brute computing force. It combined the power of its processors and a highly refined evaluation function that captured human grandmaster chess knowledge--including Kasparov's." Certainly, that's true, but Deep Blue evaluated almost one hundred million times more positions than Kasparov each second. IBM also says that Deep Blue applied "brute force aplenty" at www.research.ibm.com/deepblue/meet/html/d.3.3a.html#ai. IBM's effort assumed the proportions of a Manhattan Project, so much so, in fact, that it was easy to view the contest in terms of David and Goliath. In case there's any doubt in your mind, IBM was Goliath. I was struck by this view of Garry Kasparov, a lone man against state-of-the-art machines, superfast programs, and a team of computer hardware experts coupled with a grand master chess player. It takes a lot to make Garry Kasparov seem like a sympathetic figure. Winning is winning, but if ever the term winning ugly applied, to me this was it.
I had the same feeling when, in 1995, the scientific community finally accepted that Fermat's Last Theorem had been proved by Andrew Wiles in 150 pages of mathematics (see www.pbs.org/wgbh/nova/proof/wiles.html). OK, fair enough, as far as a mathematical achievement goes, but what a letdown, given that Fermat had written, "I have discovered a truly remarkable proof which this margin is too small to contain." Yes, 150 pages don't quite fit in a margin, but if Fermat did in fact have a proof, it wouldn't have required a tome. It makes you wonder if maybe there isn't some easier way of doing it. (Many mathematicians who've worked on Fermat's Last Theorem have suggested that Fermat's "proof" must have been incorrect, since Fermat never mentioned it in subsequent work. Also see www.pbs.org/wgbh/nova/proof/wiles.html.)
24. The rules stipulated that each side would be allotted forty moves in the first two hours, followed by an additional twenty moves in the next hour, with all remaining moves to be completed in thirty minutes. Time from one segment could be carried forward to the next.
25. See www.research.ibm.com/deepblue/meet/html/d.3.html.
26. You can imagine the discussion between the software engineer and the grand master: "OK, but what rules do you use to search for the best move?" "What do you mean search for the best move?! I just know it!"
27. To their credit, at least IBM is up front about Deep Blue's limited "intellectual" capabilities. Their own Web pages (www.research.ibm.com/deepblue/meet/html/d.3.2.html) state:

"Artificial Intelligence" is more successful in science fiction than it is here on earth, and you don't have to be Isaac Asimov to know why it's hard to design a machine to mimic a process we don't understand very well to begin with. How we think is a question without an answer. Deep Blue could never be a HAL-2000 if it tried. Nor would it occur to Deep Blue to "try." . . . Deep Blue's strengths are the strengths of a machine. It has more chess information to work with than most computers and all but a few chess masters. It never forgets or gets distracted.

IBM goes on to boast:

Solutions to problems far beyond the chessboard are closer than ever before as a result of the research that has gone into the Deep Blue system. And who knows? As more possibilities open before us, some of those science fiction predictions may come true. But it won't be because of any artificial intelligence. It will be because systems like Deep Blue helped us make better use of the real thing.

The self-congratulatory pat on the back is forgivable, even well deserved, but it's difficult to forgive IBM for actually dismissing the potential of an artificially intelligent system that could discover for itself how to solve real challenging problems! See www.research.ibm.com/deepblue/meet/html/d.3.3a.html#ai. Also, note that in Arthur C. Clarke's 2001: A Space Odyssey, HAL was a HAL-9000, not a HAL-2000, as indicated on the Web page. IBM contradicts this on a different Web page: "Deep Blue had, for a moment, used its incredible processing power and the accumulated knowledge of computer scientists and chess champions to engage in something resembling 'thought.'" See www.research.ibm.com/know/blue.html. IBM's comment that Deep Blue is not a learning system likely refers to the final version of the program. As noted here, the IBM team did use a form of machine learning to adjust the weights of the terms in Deep Blue's evaluation function, but this form of learning requires examples taken from games played by grand masters.
28. Not that recapitulating what we already know is easy. Murray Campbell of the Deep Blue team said: "There are sometimes things that a grandmaster knows that are difficult to put into a computer program. We are working hard to get to know as much about chess as possible." See www.research.ibm.com/deepblue/meet/html/d.3.html.
29. It's funny, but these two responses are in direct contradiction, for if every key decision were already made there would be no point in looking further to optimize those decisions. Furthermore, the thought that every conceivable nuclear crisis could be tested out by computer within our lifetimes is completely implausible.
Chapter 3: Building an Artificial Brain
1. Warren S. McCulloch and Walter Pitts, "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics 5 (1943): 115-33.
2. Frank Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain," Psychological Review 65 (1958): 386-408.
3. Here's a detailed description of how the multilayered neural network works. First we provide the input data, which enter at the input neurons of the artificial neural network (shown to the left in figure 6). These input neurons act as the neural network's sensors. Each input neuron connects to neurons in an intermediate layer. Those neurons in turn connect to the output neurons, where we see the network's response to the input. We call the neurons in the intermediate layer "hidden neurons," because the network's behavior at this level remains hidden from our view. We only get to observe what emerges from the output neurons. The figure shows a general case in which there are many inputs, represented by the variable n, and many hidden neurons, represented by the variable k. The symbol w represents the weight that connects neurons in the input layer to neurons in the hidden layer. For example, the symbol w11 represents the weight on the connection between the first hidden neuron and the first input neuron. Likewise, the symbol w12 represents the weight on the connection between the first hidden neuron and the second input neuron. There's a weight for every connection that links any two neurons (even if no symbol is shown explicitly in the figure). The weights between the input and hidden neurons amplify or reduce the incoming signals by variable amounts. (Each weight can assume a different value.) Each hidden neuron adds up its incoming signal strengths, weighted by each signal's associated connection strength. Each hidden neuron also has its own threshold for firing. A hidden neuron fires if the weighted sum of its inputs exceeds that threshold. You might think about the hidden neurons as weighting the incoming information in terms of its relative importance. This analogy is intuitive, but be careful not to go too far with it. There are potentially many hidden neurons, and each one can instigate its own "interpretation" of the relative importance of the available information. Just as the input neurons are connected to the hidden neurons, the hidden neurons are connected to the output neurons in the second layer. (Some researchers describe this type of neural network as a three-layer network, in which the input neurons make up the first layer, even though they don't really perform any computation.) Here, the variable O represents the number of outputs. The output neurons weight the incoming data (from the hidden neurons) by their associated connection strengths and in turn fire or fail to fire depending on whether their internal thresholds are exceeded. You can think of a neural network as something of a telephone exchange.
Messages in the form of input data are passed along from neuron to neuron, and each neuron decides whether to continue forwarding the message based on the strength of the message. The outputs of the neural network can be used to control a robot's arms or legs, identify parts of a face, predict future values of the stock market, or numerous other possibilities, depending on the inputs. One such possibility, which we'll see later, involves assessing the value of alternative positions in a game of checkers, where the number, type, and position of pieces on the board constitute inputs, and the neural network outputs the associated worth of that position.
4. Under this definition of behavior, no attention is paid to the mechanism or medium that translates inputs into outputs. In the context of adaptive behavior, a system would seek to adjust its outputs for various inputs to meet goals in a range of environments.
5. Note that as you move in the direction of the steepest ascent, you might vary your position on two axes simultaneously. For example, if the axes were north-south and east-west, you might walk in a northeasterly direction. That's the equivalent of adjusting two knobs on a Sinister television, or multiple weights of a neural network, at the same time.
6. If you wear glasses, you'll note that your optometrist performs a similar procedure when looking for the best prescription for you.
7. There are different formulas for computing functions that have a general sigmoid shape. The specifics of those formulas really aren't important here. The pertinent point is that whereas before we generated steep cliffs on our landscape because neurons were either "on" or "off," now with sigmoid neurons each neuron is never completely "on" or "off." Instead, it's always somewhere in between.
8. Evolution can be viewed as progressing on an "adaptive landscape" (a landscape that measures how well adapted each individual is). The population geneticist Sewall Wright offered this term in 1932. As the genetic makeup of individuals in a species changes, each individual's fitness changes too. Selection then eliminates those individuals (and eventually entire species) that cannot compete beyond some basic level of quality. The concept of an adaptive landscape is very similar to the concept of a function that measures the quality of a complex device, like a Sinister Electronics television or an artificial neural network. In nature, the adaptive landscape measures the fitness of alternative individuals in a population, based in part on their genetic composition. Likewise, we can measure the fitness of the picture on a television screen or the fitness of the output of a neural network.

A note of caution should be offered here, because the word fitness has many possible definitions within biology. Most often it refers to the probability of survival over a period of time, the expected number of offspring, or the expected number of offspring that survive to reproduce. The main reason that different definitions of fitness exist is that there are always exceptions in nature that violate common sense under one definition or another. Fitness can be applied to individuals as well as to species and to individual genes. It should not be surprising that with so many possible definitions of fitness, the definitions are often quite controversial. Nevertheless, for the argument here, the specific definition of fitness is not relevant. Any definition that can be applied consistently is sufficient.

It's also important when making analogies to be up front about the limitations of those analogies. This is particularly true with scientific analogies, because it's too easy to accept that a particular analogy is not an analogy but rather the truth. With a bit of good arguing, even poor analogies can seem apt. With regard to adaptive landscapes then, what are the limitations? First, it's difficult to measure the behavioral quality of an individual or of an entire species.
We can often only infer that quality or assess it post hoc. In the most basic case, the individual survives or dies. The individual reproduces or fails to reproduce. The species continues or goes extinct. This dynamic need not, however, be such a dichotomy of death and life. An individual can be varying degrees of sick or healthy and can give rise to zero, one, or more offspring. A species can increase or shrink in number, cover a wider or smaller area, and so forth. Based on these types of behavioral traits, there are many ways to ascribe a value to the quality of individuals and species in nature.

Second, there is no single static adaptive landscape. The landscape is always changing. It changes because the measure of how well suited a behavioral response is depends on the particular environment, which includes all the other organisms that reside in that environment. What passes as a sufficient solution to the current problem of survival may not meet with success tomorrow. A well-known and critically important example of this phenomenon is found in our assault on bacterial infections. As we develop new antibiotics to combat strains of bacteria, the bacteria evolve resistance to those antibiotics through random variation and natural selection. Those bacteria with sufficient resistance to survive a treatment of antibiotics pass along genes that are associated with that resistance to their offspring. The bacteria rather quickly invent countermeasures to defeat our efforts. Evolution doesn't require millions of years, just several generations, which for bacteria are often measured in minutes, not millennia.

Third, evolution isn't searching for a peak on an adaptive landscape, whether or not that landscape is changing in time. Evolution doesn't care about finding the "best" species, if such an entity could even be defined. Evolution is simply a consequence of physical laws acting on and within populations and species. It's the result of reproduction, random variation with inheritance, competition, and selection in a finite environment.
Physical laws don't "care" who wins or loses in the everyday struggle
for life. Nature was no more pleased to create the dinosaurs than it was to wipe them out. Nevertheless, we have the intuition that individuals competing for survival based on their fitness measured by an adaptive landscape would tend to climb the peaks of that landscape, because the lesser-valued individuals would die off and the remaining survivors would pass along their genes to their offspring. It's not that evolution is searching actively for these peaks; it's just the result of evolution culling out the losers in life's lottery and making random variations of the winners. Even in the case of an individual who fails to reproduce, that individual may serve to benefit other individuals of the same species and enhance their survival. For example, female worker ants do not have offspring. Only the queen has offspring with male drones. Nevertheless, the female workers are critical to the survival of the colony. Other arguments can be made from the perspective of individuals assisting their kin, with the hypothesis that individuals will tend to aid those who are more likely to share their genes. These nuances are not necessary for the argument offered here, however. 9. Recall that a gene is a region of DNA that encodes for a protein, which often comprises three hundred to four hundred amino acids (see Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson, Molecular Biology of the Cell, 2d ed. [New York: Garland Publishing, 1989], p. 486). The function of a protein depends largely on its eventual three-dimensional conformation and the other molecules to which it might bind. In this way, the analogy to the individual cards in a hand of poker is apt. 10. The complexity of natural landscapes also stems from the one-to-many and many-to-one mappings between genes and traits. A single gene can affect many traits, and a single trait can be affected by many genes. The situation is depicted in figure I ~.
(This scenario is similar to the effects in a neural network, in which a single neuron can influence many neurons and can itself be influenced by many other neurons.) These effects are very common in nature. For instance, in mice, at least five primary genes interact to determine fur color. See library.thinkquest.org/18258/noframes/geneint2.htm. In cats, those with all-white coats and yellow, blue, or "odd-eye" combinations (yellow and blue) are often deaf. The deafness is more common in blue-eyed cats and tends to occur on the side of the blue eye in odd-eyed cats. See www.southwestern.edu/~waittd/cat-pages/pleio.html. In humans, sickle-cell anemia is primarily a genetic disorder of malformed blood cells, but it's also accompanied by weakness, pain, and skin ulcers. See www.socialscience.eku.edu/Ant/sA rAGE/PhysicalANT/201-PLEIOTROPISM.HTML. 11. If you're wearing glasses or contact lenses, or if you've undergone an appendectomy or perhaps endured a life-threatening condition such as breast or lung cancer, then you have firsthand experience with your own "suboptimality." Individuals in nature compete only with one another, not with any presumed "right answer" to the problems of survival that they face.
Chapter 4: Evolutionary Computation
1. The number of distinctly different tours of ten cities is actually only 181,440 for two reasons: (1) each complete tour is a circuit and it doesn't matter in which direction you traverse the circuit, dividing the total in half, and (2) any of the ten cities could serve as the starting city without affecting the tour, reducing the total by a factor of ten. 2. A few attempts at using supercomputers have resulted in perfect solutions to specific problems comprising over ten thousand cities, but surely there are better uses for supercomputers. 3. David B. Fogel, "What Is Evolutionary Computation?" IEEE
Spectrum (February 2000): 26-32. The article was the cover story of this issue. 4. Recent work in evolutionary computation has shown that for an optimization algorithm to be successful, whether or not it is an evolutionary algorithm, there must be, in some sense, an "informational match" between the algorithm and the problem. More precisely, if you have an algorithm that searches for solutions to a problem and never revisits any solution, then that algorithm will perform just as well (or poorly) as a blind random search when applied across all possible problems. This is known as the "no free lunch theorem," which is detailed in David H. Wolpert and William G. Macready, "No Free Lunch Theorems for Optimization," IEEE Transactions on Evolutionary Computation 1, no. 1 (1997): 67-82. Tom English has also offered important related work, published in the Proceedings of the 2001 Congress on Evolutionary Computation. Thus even though the evolutionary algorithm I offered for the traveling salesman problem contained no explicit information about the problem, we can judge from its good performance that it did contain implicit information about the problem. Describing just what comprises that informational match, for any particular problem, is a current challenge in computer science. 5. For example, in the United States, pioneering efforts were made by Nils Barricelli (1953), George Friedman (1956), Richard Friedberg (1958), Hans Bremermann (1958), Lawrence Fogel (1960), John Holland (1962), Michael Conrad (1969), and others. In Germany, Ingo Rechenberg and Hans-Paul Schwefel collaborated in 1964. In the United Kingdom, George Box began in 1953, and in Australia, Alex Fraser first published in 1957. Details and reprints of original papers are found in Evolutionary Computation: The Fossil Record, David B. Fogel, ed. (Piscataway, NJ: IEEE Press, 1998). 6. He pursued this idea through the culmination of his doctorate degree in engineering at UCLA in 1964.
His dissertation was titled On the
Organization of Intellect. My father demonstrated a series of successful experiments in which an IBM 7090 computer evolved its own programs for predicting different time series. (A time series is an indexed sequence of symbols or numbers. Typical examples include daily stock prices or IQ tests that require you to predict the next number in a sequence, such as 1 0 1 00 1 000 1.) The computer was often competitive with or exceeded human subjects at predicting the same data. Two years later, collaborating with colleagues Al Owens and Jack Walsh, whom he met earlier at General Dynamics in San Diego, he authored Artificial Intelligence Through Simulated Evolution, which was published by John Wiley and Sons. It was the first book in the field of evolutionary computation. The three of them left General Dynamics and formed their own company, Decision Science, Inc., relying on the evolutionary programming technique as a novel means for solving problems. It was the first company dedicated to putting evolution to work at engineering. Their company was later acquired by Titan Systems, Inc. (which later became the current Titan Corporation) in 1982. 7. I can recall the first time my father told me about the concept of simulating evolution on a computer. We were waiting in the airport at Colorado Springs, Colorado, after I had visited the Air Force Academy in early 1981. I wanted to be a fighter pilot and had been accepted to the academy. My father wisely suggested that we go see what the place was like before I signed up. I recall the beauty of the mountains and the campus, but I also recall being disappointed in the rules and regulations that lay ahead of me. I wanted to fly, not be pushed around by upperclassmen asking me questions like how many days there were before they graduated! While we waited for my flight back to San Diego, connecting through Denver, my father asked me how I would design the best fly. Yes, that's right, the insect.
This was sort of a strange application, but still, some flies
are better at being flies than others. He took me through the process of creating a fly that had legs that were the right length, wings that were the right shape, eyes that were optimally designed, and so forth, all by evolution. I was of course familiar with evolution as the explanation of the diversity of life on the planet, but I hadn't considered it as a design principle or something that could be captured in a computer. My father took a different flight to Washington, D.C., to solicit more business for his company. I flew back home alone, thinking about computer programs that could evolve. 8. In contrast to the evolutionary approach my father suggested, the typical approach to AI in the early 1980s centered on expert or other knowledge-based systems. The hope was that by developing a program that mimicked an expert's knowledge, fact by fact, the program could be as smart as the expert. Unfortunately, there were numerous problems with this approach. One was that people don't always reason using rules. In fact, we pay experts in many fields a lot of money to know when to break the rules. We want pilots to know when not to deploy the flaps on final approach. We want medical doctors to know when not to suggest surgery. When an expert reverts to the rules he's learned, his expertise may in truth degrade to mere competence, because it's often difficult to describe our expert behaviors in rules. What's more, a database of rules to follow can only be as good as the number of situations that it covers. For most real-world problems it's simply impossible to imagine all the potential situations that you might face. A programmer can write literally thousands of rules to cover everything he or she anticipates and still leave out critical possibilities. (Imagine how many rules it would take to navigate and safely land a commercial airliner at a busy airport in New York City or Los Angeles without omitting any unforeseen eventuality.)
When faced with some new situation that lies outside the domain of knowledge, the expert system is left to make a "best guess" at what to do. That wouldn't sound very comforting if you were on a plane piloted by an expert system and the programmer had failed to consider the possibility that the plane would run out of fuel while in a holding pattern waiting for a thunderstorm to pass. These difficulties are common knowledge now, yet great efforts have been made to engineer just such computer programs rule by rule, with each condition having a corresponding action, each response programmed in bit by bit. No doubt some of these efforts have generated very successful engineering projects. They are often very limited in scope but nevertheless can be useful. The question of whether such knowledge-based programs constitute a contribution to AI depends on the definition adopted. Under the definition that I proposed in chapter 1, in large measure, these programs do not exemplify intelligent behavior, because they employ a static knowledge base. It is possible, however, to construct AI programs that vary the rules they use (or other data structures) to adapt their behavior depending on the goals and environment they face. Relying on rules is no sin, but if the rules are static, then any discussion of the "intelligence" embodied in those rules can be set aside. The rules may embody knowledge, but not intelligence. 9. David B. Fogel, "Evolving Artificial Intelligence" (Ph.D. diss., University of California, San Diego, 1992).
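The tour count given in note 1 of this chapter can be checked directly. The following Python sketch is an illustration added here, not code from the book, and its function names are hypothetical. It computes the closed form (n! orderings, divided by n possible starting cities and by 2 directions of travel) and, for a smaller number of cities, confirms the same count by brute-force enumeration.

```python
from itertools import permutations
from math import factorial


def count_distinct_tours(n):
    """Closed form: n! orderings / n starting cities / 2 directions."""
    return factorial(n) // (2 * n)


def count_distinct_tours_brute_force(n):
    """Enumerate circuits with city 0 fixed as the start, and treat a
    circuit and its reversal as the same tour."""
    seen = set()
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        reverse = (0,) + rest[::-1]
        seen.add(min(tour, reverse))  # one canonical form per circuit
    return len(seen)


print(count_distinct_tours(10))             # 181440
print(count_distinct_tours_brute_force(6))  # 60
```

The brute-force version is only practical for a handful of cities, which is the point of the note: the number of tours grows factorially.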
Chapter 5: Blue Hawaii
1. Why the army? Breast cancer had become a political hot potato in Washington, D.C. Senator Patrick Leahy of Vermont, among others, favored devoting more money to breast cancer research, but this money was to be redirected from the Department of Defense. After all, the cold war
was over, and Congress instructed the military to search for "dual-use" technologies that could be applied in both the defense and the civilian sectors. The U.S. Army Medical Research and Materiel Command, in Fort Detrick, Maryland, secured the responsibility for administering the breast cancer program. Close to one billion dollars have been disseminated under research grants since the program's inception. 2. This is a typical approach. If you train a neural network to learn patterns using all the available data, you won't know how well it will do on data that it hasn't seen before. A better way to evaluate a neural network's performance in an application like this is to divide the data into two sets. One is used for training the neural network, and the other is used to evaluate the neural network. In this way, ultimately, the neural network is evaluated based on its ability to recognize patterns in new data. The protocol is appropriate whether or not an evolutionary algorithm is used for training. Other, more sophisticated, procedures can be used as well (e.g., cross-validation), but a discussion of those techniques is beyond our scope here. 3. Kasparov would go on to play more exhibitions and appear in television commercials. I never thought I'd see a champion like Kasparov capitalizing on his fame by pushing Pepsi. Deep Blue would go on to make a guest appearance in the cartoon The Simpsons. Surely it's difficult to remain serious once you inject Homer Simpson into the discussion. 4. I wish I could tell you that I uttered some profound remark on this occasion, but I didn't. I am happy to say that I didn't lose control of the car and swerve off the road either. To me, Deep Blue's win wasn't the surprise that it was to the general public (per the polling data offered in chapter 2). It was the predictable culmination of a forty-year effort. With computer speeds increasing rapidly, the outcome was inevitable. 5. 
The journal covers all aspects of research in simulating evolution on
computers or other devices and has a circulation of several thousand IEEE members. The IEEE is the Institute of Electrical and Electronics Engineers, the largest professional organization of engineers, with more than 350,000 members worldwide. 6. In addition to the new IEEE Transactions on Evolutionary Computation, I was also co-editor-in-chief of a new publication from the Institute of Physics titled the Handbook of Evolutionary Computation. This was a compilation of more than five hundred thousand words covering every aspect of the field, with contributions from the leading scientists in our community. My two co-editors, Professor Zbigniew Michalewicz, from the University of North Carolina at Charlotte, and Professor Thomas Bäck, from the University of Dortmund in Germany, were excellent collaborators, but still there was a great deal of writing and editing on this project, which had taken more than three years to complete. 7. A mathematical filter is used to change the frequency spectrum of a signal. One of the most common uses of such filters is found in a graphic equalizer, which allows you to make a "low-pass filter" by turning up the gain on the low frequency bands and reducing the gains on the high frequency bands.
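To make the idea in note 7 concrete, here is a minimal Python sketch of one very simple low-pass filter, a moving average. This is an assumption chosen for illustration; it is not the filter discussed in the chapter, and the function names, signal length, and window size are hypothetical. A slow sine wave passes through almost unchanged, while a fast one is nearly cancelled.

```python
import math


def moving_average(signal, window):
    """Simple low-pass filter: each output sample is the mean of the
    most recent `window` input samples (fewer at the very start)."""
    out = []
    for i in range(len(signal)):
        start = max(0, i - window + 1)
        chunk = signal[start:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def amplitude(signal):
    """Half the peak-to-peak range of the samples."""
    return (max(signal) - min(signal)) / 2


n = 200        # number of samples (illustrative)
window = 5     # filter length (illustrative)
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]    # 2 cycles
high = [math.sin(2 * math.pi * 40 * t / n) for t in range(n)]  # 40 cycles

# Skip the first `window` outputs, which are averaged over a partial window.
low_out = moving_average(low, window)[window:]
high_out = moving_average(high, window)[window:]

print(amplitude(low_out))   # close to 1: the low frequency is kept
print(amplitude(high_out))  # close to 0: the high frequency is removed
```

The window here spans exactly one period of the fast sine wave, which is why that component averages out almost completely; other window lengths would attenuate it less sharply.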
Chapter 6: Checkers
1. The game can also end in other ways: (1) a player can resign, (2) a draw may be declared when no advancement in position is made in forty moves by a player who holds an advantage, subject to the discretion of an external third party, and if in match play, or (3) a player can be forced to resign if he or she runs out of time, which is usually limited to sixty minutes for the first thirty moves, with an additional sixty minutes being allotted for the next thirty moves, and so forth.
2. Jonathan Schaeffer, One Jump Ahead: Challenging Human Supremacy in Checkers (New York: Springer, 1996), p. 43. 3. One exception was a paper published at the 1997 IEEE International Conference on Evolutionary Computation that used an evolutionary algorithm to learn how to weight features offered by people. See Kenneth Chisholm and Peter Bradbeer, "Machine Learning Using a Genetic Algorithm to Optimise a Draughts Program Board Evaluation Function," in Proceedings of the 1997 IEEE International Conference on Evolutionary Computation (Piscataway, NJ: IEEE Press, 1997), pp. 715-20. 4. Note that I put the word merely in quotes. The effort that's required to make a world-champion checkers program, even when working from everything we know, is significant and can easily take several years and thousands of hours. 5. This is shorthand. The neural networks are used to evaluate alternative positions in a game of checkers; they themselves don't play the game. They are coupled with other algorithmic routines, like minimax, that search ahead and also choose which move to make. 6. In the early and mid-1990s, I often started lectures by playing a game of what looked to be tic-tac-toe with someone in the audience. I allowed my opponent to move first on a three-by-three grid but didn't offer what the objective of the game might be. The objective was quite simple: A win was achieved when a player placed three markers on the outer squares of the grid. That is, the player moving first can force a win simply by avoiding the middle square. After we completed two or three games, I'd ask if anyone in the audience had any idea what the object of the game might be. The usual response was laughter. I carried out the experiment for as many as ten games, but no one ever offered me the correct objective, despite its obvious simplicity.
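The claim in note 6 that the first player can force a win can be confirmed by exhaustive search. The Python sketch below is an illustration added here, under one reading of the rule: players alternate placing markers on a three-by-three grid, and a player wins on placing a third marker of their own on any of the eight non-center squares. The cell numbering and function names are hypothetical.

```python
from functools import lru_cache

ALL = frozenset(range(9))   # cells 0..8, row by row; cell 4 is the center
OUTER = ALL - {4}           # the eight outer squares


def wins(cells):
    """A player wins with three of their own markers on outer squares."""
    return len(cells & OUTER) >= 3


@lru_cache(maxsize=None)
def outcome(mover, other):
    """Game value for the player about to move:
    +1 forced win, -1 forced loss, 0 draw (board full, no winner)."""
    free = ALL - mover - other
    if not free:
        return 0
    best = -1
    for cell in free:
        after = mover | {cell}
        if wins(after):
            return 1
        # The opponent moves next; their result is the negative of ours.
        best = max(best, -outcome(other, after))
    return best


# Empty board: the first player can force a win.
print(outcome(frozenset(), frozenset()))     # 1

# If the first player opens in the center, the second player
# (now the one to move) can force the win instead.
print(outcome(frozenset(), frozenset({4})))  # 1
```

Under this reading the first player reaches three outer markers on their third turn, before the second player has placed a third marker at all, unless a turn is wasted on the center.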
7. Marvin Minsky, "Steps Toward Artificial Intelligence," Proceedings of the IRE 49 (1961): 8-30.
Chapter 7: Chinook
1. In One Jump Ahead: Challenging Human Supremacy in Checkers ([New York: Springer, 1996], pp. 99-100), Jonathan Schaeffer argued for including all the knowledge that we have about how to play, asking, "Why should a computer have to discover all this [knowledge] on its own?" From the perspective of designing the world's best checkers player, not including human knowledge places the program at a significant disadvantage. 2. Schaeffer, One Jump Ahead, p. 59. 3. Schaeffer, One Jump Ahead, pp. 63-65. 4. Schaeffer, One Jump Ahead, pp. 113-14. 5. Fortman's Basic Checkers is a seven-volume standard treatise that all aspiring players study. When I first began exploring the idea of evolving a checkers player, I looked for books like this in my local bookstores and couldn't find them. When I later became aware of Basic Checkers and a few other publications, I avoided them purposely so as not to unconsciously bias our design to capture some knowledge that I might have gleaned from reading them. I wanted to ensure that I remained a novice at checkers so that if our program evolved to become a superior player it would be impossible to claim that I was using my own knowledge to engineer such a result. Kumar, as well, had no prior knowledge of checkers tactics and made no attempt to learn them during our effort. 6. The details are more complicated than this, as related to me by Jonathan Schaeffer in personal communication. Tournament matches in checkers often use three-move openings, in which the opening combination of moves is chosen at random. The competitors take turns playing each side. Some of the possible openings are lopsided, favoring one or the
other side. Before mid-1993 and after early 1995, Chinook relied generally on an opening book for the moves on the weaker side, while being free to "innovate" on the stronger side. Schaeffer also used Chinook to analyze possible errors in opening books that might be used to trap players. 7. In fact, one opening (known as the "white doctor") requires the player moving first to sacrifice a checker. There's been extensive analysis on this opening, but no one has yet found a way to survive the opening without going down a piece. 8. However, if you're facing Chinook and you stick to known lines of play, then you relegate yourself to repeating a sequence of moves that you and Chinook both know by heart. It would sort of be like watching old reruns of Gilligan's Island: Unless you change the sequence of events, the castaways will never make it off the island. Knowing when and how to go against the grand masters is key to superior performance. 9. Schaeffer, One Jump Ahead, p. 156. Note that 0.17 is the key value because 1.34 is 1.0 + 2 x 0.17, 1.17 is 1.0 + 1 x 0.17, and 0.83 is 1.0 - 1 x 0.17. 10. Schaeffer, One Jump Ahead, pp. 227-29. Just before taking on the "Terrible Tinsley" in head-to-head competition, Schaeffer competed Chinook in the 1992 U.S. Championship. Chinook finished in sixth place, behind the winner Ron King, Elbert Lowder, the computer program Checkers 3.0 (written by Gil Dodgen), Richard Hallett, and Don Lafferty. By this time, Chinook's endgame database included all the positions with seven pieces and a significant portion of the positions with eight pieces. As documented on page 282 of One Jump Ahead, Chinook was officially tied for points with the four other top finishers behind King, but was adjudicated to sixth place based on a tie-breaking procedure. The procedure uses "honor points," whereby competitors earn points based on their opponents' performances. There are some anomalies to this sort of
tie-breaking routine, but officially Chinook did finish behind Checkers 3.0 in the tournament. (Early in the contest, Schaeffer uncovered a problem in Chinook where it was discarding a critical position as "irrelevant" during its search owing to a parameter that was set inappropriately. The final results might well have been different had this problem been remedied earlier.) By finishing first behind Tinsley in the U.S. Championship of 1990, Chinook had seemed to earn the right to compete for the world title, but was not allowed to compete for that honor. Thus Schaeffer and Tinsley agreed to compete for a new man-machine championship, which eventually earned ACF and EDA recognition. Before the Tinsley match, Schaeffer competed Chinook against Colossus, perhaps Chinook's closest computer competitor. Chinook defeated Colossus during a series of more than fifty games (Jonathan Schaeffer, personal communication, 2001). 11. Schaeffer, One Jump Ahead, p. 250. At this time, Chinook was rated sixty points ahead of Don Lafferty, the second-ranked human checkers player. 12. Personal communication with Jonathan Schaeffer, University of Alberta, April 2001. 13. The match was halted after thirty-nine games, because the final outcome could not have been altered by the fortieth game (Schaeffer, One
Jump Ahead, p. 333). 14. Schaeffer indicated that this improved machine ran Chinook at a rate that's comparable to a 1 GHz PC. The reason that the speedup does not correlate linearly with the increased number of processors, as well as their increased speed, is twofold: The evaluation function was changed and slowed the program down by a factor of two, and the parallel algorithm used for handling multiple processors did not scale well beyond eight processors.
15. Derek Oldbury earned the world championship by defeating Richard Hallett in November 1992 after Tinsley resigned in protest. Thus Chinook actually defeated the world champion in preparation for the match with Tinsley, but this defeat did not take place within a sanctioned competition. Oldbury died shortly after the match with Chinook, in July 1994. 16. Schaeffer, One Jump Ahead, p. 442. 17. Schaeffer published Tinsley's entire tournament career record in One Jump Ahead. From 1951 to 1994, Tinsley was undefeated in match play, losing only four games. 18. Schaeffer, One Jump Ahead, p. 447. 19. Recall that the 1990 version of Chinook was good enough to finish second in the U.S. Championships and thereby earn the right to challenge for the world championship (even though this challenge was later disallowed by the ACF and EDA because Chinook was not human). 20. The earliest posted victory over Chinook on its website was recorded on December 22, 1995, by John Gibson at the novice level. 21. Chinook is the first computer program to win a human world championship, a fact recognized by the Guinness Book of World Records. See www.cs.ualberta.ca/~chinook.
Chapter 8: Samuel's Learning Machine
1. The terms Alpha and Beta here have no relation to the terms Alpha and Beta as used in alpha-beta pruning. 2. Proper English would dictate that this be described as "two plies." Occasionally, the term ply is used in this manner, but the shorthand version of two ply is often preferred. 3. Jonathan Schaeffer, One Jump Ahead: Challenging Human Supremacy in Checkers (New York: Springer, 1996), p. 93. 4. Looking back to the actual closing prices of IBM for the period
before and after February 24, it's difficult to really pinpoint such a fifteen-point jump. Figure 73 shows the daily close for IBM from February 1 to March 19, 1956. Yes, the price of the stock did increase significantly, but so did prices in the rest of the stock market. This was a time when the bulls were running on Wall Street. The number of shares traded daily on the New York Stock Exchange was at an all-time high. Maybe some of the gain was attributable to Samuel's program, but fifteen points seems excessive. This exaggeration was a harbinger of others to come. 5. Arthur L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers. II--Recent Progress," IBM Journal of Research and Development 11 (1967): 601-17. 6. The reference to 7090 indicates the model of IBM computer Samuel used. 7. See Computers and Thought, edited by Edward A. Feigenbaum and Julian Feldman (New York: McGraw-Hill, 1963), p. 103. 8. Schaeffer, One Jump Ahead, p. 94. Schaeffer went further, offering that no recognized master-level player had ever won that state championship and that Nealey apparently did not play in the U.S. Championships, so his reputation as a "foremost player" seems to have been earned more by word of mouth than by record. Stanford University, where Samuel was a professor from 1966 until his death in 1990, hosts Web pages that identify Nealey as having been the fourth-ranked player in the nation. (See www-db.stanford.edu/pub/voy/museum/samuel.html.) I've been unable, however, to verify any data that would support this claim. 9. A subsequent rematch was held by mail, so this possibility is also a consideration, but there is nothing in the reprint of Samuel's 1959 paper in Computers and Thought suggesting that the original game took more than one day, and there are no remarks suggesting that the two players were in separate locations.
[Figure 73: line plot of IBM's daily closing stock price (vertical axis, ticks from 380 to 450) against trading days (horizontal axis), with February 1, 1956, February 24, 1956, and March 19, 1956, marked.]
FIGURE 73
IBM's closing stock price from February 1, 1956, to March 19, 1956. Samuel's program debuted on television on February 24, 1956. In contrast to the legend that IBM's stock rose fifteen points following this initial demonstration, the sequence of closing prices shows that the rise after the demonstration had the same slope as the rise that led up to the demonstration. It's easy to conclude that Samuel's television appearance had no effect on the stock's price at all.
10. All the moves in the contest between Samuel's program and Nealey are printed below. There's a curious anomaly associated with these moves. As printed in my 1963 copy of Computers and Thought, the listing stops after move fifteen. My father has another copy of Computers and Thought, and the moves of the Samuel-Nealey game end after the fifteenth move in his copy too. The sixteenth move is the critical move in which Nealey blunders and effectively loses the game. Neither Nealey's error nor any of the subsequent moves are reprinted in our copies of Computers and Thought. What's more, if you try to play this game by hand, you quickly find that Samuel's listed double jump on move ten is a typographical error. It's actually a triple jump, 1-10-19-26. I don't have any explanation for the apparent lack of attention to detail that cropped up here or for the missing moves, for that matter. I checked in the UCSD library and found a third copy of Computers and Thought from 1963. It too omits all the moves beyond the fifteenth. Yet Jonathan Schaeffer informed me by email that his 1963 copy contains the complete set of moves and does not seem to be a second printing. I wonder if the copies I have and the one at UCSD will someday be collector's items, much like stamps that have part of the picture printed upside down. As a matter of interest, the 1995 second edition of Computers and Thought does print the entire set of moves and Samuel's annotations. The game occurred on July 12, 1962, in Yorktown, New York.
Move   Red (Samuel's Program)   White (Nealey)
1      11-15                    23-19
2      8-11                     22-17
3      4-8                      17-13
4      15-18                    24-20
5      9-14                     26-23
6      10-15                    19-10
7      6-15                     28-24
8      15-19                    24-15
9      5-9                      13-6
10     1-19-26                  31-22-15
11     11-18                    30-26
12     8-11                     25-22
13     18-25                    29-22
14     11-15                    27-23
15     15-19                    23-16
16     12-19                    32-27
17     19-24                    27-23
18     24-27                    22-18
19     27-31                    18-9
20     31-22                    9-5
21     22-26                    23-19
22     26-22                    19-16
23     22-18                    21-17
24     18-23                    17-13
25     2-6                      16-11
26     7-16                     20-11
27     23-19                    Resign.
11. Samuel's victory was big news to academicians, but apparently not to IBM's shareholders. On July 11, 1962, IBM closed at 374 3/8, up a whopping 15 5/8. Was this a run-up on the stock in anticipation of the
forthcoming victory over Nealey? IBM followed up on July 12, the day of the match, with another sizable 4 5/8 gain, closing at 379. On July 13, the final trading day of the week, IBM closed down 2 3/4. The pattern doesn't seem to fit the news, does it? Well, it depends on which news! Samuel's program was probably irrelevant to IBM's stock price, and the likely reason that the stock moved so rapidly on July 11 and 12 was, as reported in the July 12 edition of the New York Times, that IBM's profits for the first six months of 1962 were $4.21 per share, up from $3.67 per share for the same time period of 1961. That was headline news. In all my rummaging through old copies of the Times, I didn't see a single remark about Samuel's program defeating Robert Nealey. 12. Champions do make blunders like this against strong opponents, other champions. They don't make blunders like this against weak players. As I'll describe shortly, Samuel's program was not a strong player. 13. Some people with whom I've spoken have offered the excuse that Nealey was blind and couldn't see where the checkers were. This isn't a valid excuse. Blind champions know where the pieces are. The fact that they're champions means they've demonstrated that ability. 14. The quote is taken from A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers," in Computers and Thought, E. Feigenbaum and J. Feldman, eds. (New York: McGraw-Hill, 1963), P. I o. Nealey's remarks constitute the dogmatically repeated and widely accepted assessment of the strength of Samuel's program. As we'll see in the text shortly, it doesn't measure up to the truth, which has remained mostly unknown till now. The truth, however, is not completely unknown. In an amusing moment, when I found the 1963 copy of Computers and Thought in the UCSD library and flipped to pages 103-5, as I mentioned before, I found the moves past number fifteen to be missing. But I also found copious pencil scribblings left behind by an anonymous previous borrower of the book. No doubt he or she would make a good movie critic. Where the appendix was titled "Game between R. W. Nealey and . . ." he or she had circled the word game and written skeptically, "Why just one?" Where it was reported that IBM Research News had described Nealey as one of the nation's foremost players, an asterisk appeared with the cynical comment, "Do they know how to rate checkers players?" Next to Nealey's comment that all the moves up to 32-27 had been published previously, our commentator wrote, "Bull!" The coup de grâce came after Nealey wrote that he had not had such competition in the endgame since he lost his last match in 1954, whereupon our critic wrote, "He probably retired in 1954." Certainly, Schaeffer treated Nealey more kindly than this in One Jump Ahead. I hope I have, too, while still remaining fair and accurate. 15. What's more, Nealey's assessment of the program's endgame performance is questionable. Schaeffer (One Jump Ahead, p. 95) noted that Samuel's program made a glaring error on move 25: 2-6. The program weakened the back row, allowing Nealey to get a king, but Nealey seemingly missed this opportunity. 16. For example, John Holland, a former co-worker of Samuel's at IBM, claimed in his 1998 book Emergence (Reading, MA: Addison-Wesley, p. 19) that Samuel's results have yet to be surpassed. On page 64, Holland also ascribes to Samuel's program an ability to play a winning game against champion players. As offered in the text, Samuel's program seems to have had only one victory against anyone designated as a "champion," and that game was less than impressive. To be clear, let me reemphasize that Samuel was repeatedly cautious about the achievements that he'd made and didn't strive to overemphasize his early success. Others in computer and popular science showed less discretion. For example, in 1979, Richard
Restak wrote in The Brain: The Last Frontier on page 336: 'Tkn improved model of Samuel's checkers-playing computer today is virtually unbeatable, even defeating checkers champions foolhardy enough to 'challenge' it to a game." 17. Schaeffer, One Jump Ahead, p. 97. I8. Schaeffer, One Jump Ahead, p. 97. 19. Samuel, "Some Studies in Machine Learning," pp. 60 I - I 7. 20. Burke Grandjean, 'Tk Checker Debate," Personal Computing (May
1980): 83. 2 I. Schaeffer, One Jump Ahead, p. 97. 22. The total number of games that Samuel played before 1962 isn't readily available. An educated guess can be made based on the reprint of Samuel's 1959 paper in the edited volume Computers and Thought, compiled by Edward Feigenbaum and Julian Feldman in 1963. In the reprint: Samuel indicated the first twelve terms of the evaluation polynomial after forty-two games and wrote that twenty more games had been played recently. It seems unlikely, then, that Samuel was able to complete more than at most a few hundred games. If he had completed thousands of games, then it wouldn't be remarkable to comment on the addition of an extra twenty. 23. Samuel, "Some Studies in Machine Learning," pp. 60 I - I 7. 24. Schaeffer, One Jump Ahead, p. 99. 25. Schaeffer, One Jump Ahead, p. I oo.
Chapter 9: The Samuel-Newell Challenge
1. This is true, in some cases, even for people. As reported in Robert B. Cialdini, Influence: The Psychology of Persuasion (New York: Quill, 1993), pp. 2-5, humans often respond to stimuli automatically. Their reactions are described as "click-whirr," in analogy to someone pressing a button (giving the stimulus) and rolling video tape (corresponding to the behavior elicited).
2. J. L. Gould and C. G. Gould, "An Animal's Education: How Comes the Mind to Be Furnished?" The Sciences 25, no. 4 (1985): 24-31.
3. The wasp's robotic behavior was demonstrated by the French naturalist Jean Henri Fabre. When Fabre studied the wasp, he moved the cricket slightly while the wasp was inspecting its burrow. Upon emerging, rather than drag the cricket inside, the wasp would reposition the cricket at the opening and go back inside, repeating the behavior. "No matter how many times Fabre moved the cricket, and no matter how slightly, the wasp would never break out of the pattern. No amount of experience could teach it that its behavior was wrongheaded: Its genetic inheritance had left it incapable of learning that lesson" (quoted from Gould and Gould, "An Animal's Education," pp. 24-31). Fabre wrote, "The insect, which astounds us, which terrifies us with its extraordinary intelligence, surprises us, in the next moment, with its stupidity, when confronted with some simple fact that happens to lie outside its ordinary practice." Remember, it's not the individual wasp that demonstrates intelligence, but rather the evolving species of wasps. The individual wasp's hard-coded behavior is essentially a static genetic "expert system" that, under the right circumstances, can appear quite smart. The genetics of evolving wasps, however, is dynamic, undergoing random variation and selection based on the appropriateness of the individual wasp's behaviors. The species shows the ability to adapt behavior to meet the goal of survival in a range of environments, even if a single individual of that species might act like a classic robot.
4. You might find the details interesting. The bees communicate the distance by varying the speed at which the waggle dance is performed. Faster cycles of the dance mean that the food is closer. The direction is determined by the angle of the center line of the dance relative to vertical. If the center line is displaced by, say, forty-five degrees to the right of the vertical, then the food source is forty-five degrees to the right of the sun. If the bee runs directly down, then the source is directly away from the sun.
Even if there's no observable sun for some period of time after a bee returns with a dance (say, because of a rainstorm), the bees can later still fly in the proper direction. They seem to have an internal clock that allows them to keep time and relate it to the motion of the sun. Also, inside a dark hive, the bees can use other cues to determine the information in the waggle dance, including physical contact with the dancing bee and the buzzing sounds it makes. (See N. A. Campbell, Biology, 2d ed. [Redwood City, CA: Benjamin/Cummings, 1990].)
5. Unlike a person, however, the worker bee devotes its life to the reproductive success of the hive. Those colonies that more accurately and reliably communicate the sources of food in their neighborhood have a selective advantage and are less likely to perish. In turn, they pass along the genetic characteristics that underlie their behavior to their progeny. Here, the queen bee and her drones create the progeny; the workers, which are female, don't reproduce. Nevertheless, the workers' genetic wiring comes complete with the instructions for transmitting and receiving a coded message that details the location of food sources, even more than a kilometer from the hive. Endless other instances remain to be discovered. We've only known about the waggle dance of the honeybee for slightly more than fifty years, and we've only discovered a small percentage of the estimated species on the planet. We humans have to admit the beauty of the bee's ability to invent a means for describing its surroundings. After all, we did give von Frisch the Nobel prize for discovering it. If we give a Nobel prize for discovering the honeybee's communication system, what honor should we give evolution for having invented it in the first place?
6. The cases in which individuals and groups learn from their experience only bolster the argument here: As mentioned in chapter 1, individual and social learning are tricks of evolutionary learning that provide for more rapid adaptation. Patterns can often be discovered and stored in neuronal connections or the equivalent of written words more quickly and effectively than in genes. But individual, social, and evolutionary learning all share the fundamental similarity of a reservoir of learned knowledge and a means for varying that knowledge. The two-step process of variation and selection is a common theme.
Chapter 10: Evolving in the Checkers Environment
1. The total number of weights is found by assigning one weight for every connection and one weight for every hidden and output neuron, which represents a threshold term for that neuron. Therefore, with thirty-two input neurons connecting to forty neurons in the first hidden layer, there are 1,320 variable weights. The forty neurons in the first hidden layer connect to ten more neurons in the second hidden layer, yielding 410 more weights. Finally, the ten neurons in the second hidden layer connect to the output neuron, giving eleven more weights. In total, there were 1,741 variable weights in the neural network.
2. The point values assigned were somewhat arbitrary, but our choice of assigning a greater penalty for losing as opposed to the gain for winning was simply to encourage the neural networks to avoid losing. We never experimented with alternative point values.
3. A game does not have to eschew randomness to be a game of skill. Backgammon and blackjack are two good examples of games that combine skill and chance. It's easy to think of others: Monopoly, poker, and even sports such as tennis, baseball, football, and so forth, combine skill and luck. In contrast, there are games involving pure luck, such as roulette, craps, or slot machines, in which skill does not even enter into the picture. In this latter case, evolution cannot bootstrap and create better players over multiple generations. Competition would weed out those players who did poorly at any given generation, but there would be nothing for an offspring to inherit from its surviving parent that would better prepare it for the next competition. Evolution is a historical process. Each success leads to the next offspring. Games like roulette, craps, and slots are without a memory and thus are not historical processes. For evolution to "work," it must be possible to extract useful knowledge about the environment and pass that knowledge along from parent to offspring. There is no useful strategy in a game like roulette, unless your strategy is to be the casino.
4. In mathematics, a bell curve is known as a Gaussian distribution, after Carl Friedrich Gauss.
5. You can find the details of the random variation procedure in Kumar Chellapilla and David B. Fogel, "Evolving Neural Networks to Play Checkers without Expert Knowledge," IEEE Transactions on Neural Networks 10, no. 6 (1999): 1382-91, and Kumar Chellapilla and David B. Fogel, "Evolution, Neural Networks, Games, and Intelligence," Proceedings of the IEEE 87, no. 9 (1999): 1471-96.
6. Remember that Samuel's 1959 version halted at twenty ply regardless of whether the board was quiescent.
7. There is a version of checkers called "suicide checkers" in which you must lose pieces as quickly as possible. Thus the piece differential might be correlated negatively with winning.
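The weight count in note 1 to this chapter can be checked with a few lines of arithmetic. The sketch below assumes only what that note states: fully connected layers of 32, 40, 10, and 1 neurons, with one extra threshold weight per hidden and output neuron. The function name `count_weights` is illustrative and does not come from the original program.

```python
def count_weights(layer_sizes):
    """Count weights in a fully connected feedforward network.

    Each neuron outside the input layer receives one weight per incoming
    connection plus one threshold term, as described in note 1.
    """
    per_layer = [
        (n_in + 1) * n_out  # +1 is the threshold weight for each neuron
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]
    return per_layer, sum(per_layer)


# 32 inputs -> 40 hidden -> 10 hidden -> 1 output
per_layer, total = count_weights([32, 40, 10, 1])
print(per_layer, total)  # [1320, 410, 11] 1741
```

The per-layer figures match the 1,320, 410, and eleven weights quoted in the note, and sum to 1,741.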
Chapter 11: In the Zone
1. There are other accepted methods for rating a player who has never played before. One method uses a different formula for the first twenty games, by which point the player has established a rating, before switching to the usual formula given in note 10 to chapter 2. In essence, zone.com's method assumes that new players will rate at 1,600 after their first twenty games and simply awards them that score to begin with.
2. Essentially, Kumar and I acted as surrogates for the best-evolved neural network. We weren't using a computer program to enhance our own play; instead, the neural network was using us to enter its moves and report back the moves of its opponents. Nevertheless, it's easy to speak of "our rating" or "our victory." Properly, the neural network deserves the credit. As an aside, I'll add that the neural network's program ran on a separate computer and never interacted directly with any computer server from zone.com. Kumar and I were the lifeline between the Internet and the neural network.
3. By settle, I mean the condition whereby the neural network's rating tends to fluctuate around some value. Suppose a player can defeat most opposition rated below 1,800 but has trouble with higher-rated opponents. Since the player starts with a rating of 1,600, his or her rating is likely to climb rapidly at first but then will tend to settle at around 1,800. In this period of settling, the rating may overshoot 1,800.
4. Here is the complete move listing from the game against the human opponent rated 1,926, who played as red. I've offered comments on some moves along the way.
Red (Human Opponent)   White (Neural Network)   Comments
1: 11-16   1: 23-19
2: 16-23   2: 26-19
    An early swap.
    Double swap.
3: 8-11   3: 19-15
4: 11-18   4: 22-15
5: 10-19   5: 24-15
6: 7-10   6: 27-24
7: 10-19   7: 24-15
8: 6-10   8: 15-6
9: 1-10   9: 25-22
10: 9-14   10: 30-26
11: 3-7   11: 22-17
12: 4-8   12: 26-23
13: 8-11   13: 28-24
14: 11-15   14: 32-28
15: 7-11   15: 29-25
16: 15-18   16: 23-19
17: 18-23   17: 24-20
    Move to swap.
    Red swaps.
    Is the neural network trying for a king?
    Neural net stops our opponent from getting a king.
18: 5-9   18: 17-13
19: 14-18   19: 13-6
20: 2-9   20: 21-17
21: 9-13   21: 19-15
22: 10-19   22: 17-14
    Swap? Here's the swap.
    Human opponent makes a mistake here.
23: 13-17   23: 25-21
24: 17-22   24: 14-9
25: 11-15   25: 9-6
26: 22-26   26: 31-22
27: 18-25   27: 6-1
    Neural network gets a king.
28: 15-18   28: 1-6
29: 23-27   29: 6-10
30: 18-22   30: 10-15
    Neural network sets up a double jump.
31: 19-23   31: 20-16
32: 12-19   32: 15-24-31
33: 25-29   33: 21-17
34: 22-25   34: 17-13
35: 25-30   35: 13-9
36: 30-25   36: 9-5
37: 23-26   37: 31-22
    Human gets a king.
    Exchanging a piece for a king.
38: 25-18   38: 5-1
    Neural network gets a king.
39: 29-25   39: 1-5
40: 18-14   40: 28-24
    Neural network advances its remaining checker.
41: 25-22   41: 24-19
42: 22-18   42: 19-16
43: 18-15   43: 16-12
    Human opponent in pursuit.
44: 15-11   44: 5-1
    Opponent traps white's checker.
45: 14-9   45: 1-5
46: 9-6   46: 5-1
47: 6-9   47: 1-5
Moves forty-eight through fifty-seven repeated the toggling back and forth from moves forty-six and forty-seven. After the fifty-seventh move, the human player offered a draw, which I accepted on behalf of the neural network.
5. This is the complete move listing from the game against the human player rated 1,771, who played as white. Again, I've offered comments on some moves.
Red (Neural Network)   White (Human Opponent)   Comments
1: 9-13   1: 22-18
2: 11-15   2: 18-11
    An early exchange.
3: 7-16   3: 25-22
    Neural network likes to head to the side of the board.
4: 5-9   4: 22-18
5: 3-7   5: 29-25
6: 1-5   6: 25-22
7: 16-19   7: 23-16
8: 12-19   8: 24-15
9: 10-19   9: 27-24
10: 7-11   10: 24-15
    A double exchange.
    Neural network sacrifices the piece on 19. Is it setting up something?
11: 9-14   11: 18-9
12: 11-18-25   12: 26-23
    Double jump, with a king to follow.
13: 5-14   13: 23-19
    Takes back the trade, overall up one piece.
14: 25-29   14: 31-26
15: 6-10   15: 19-16
16: 8-12   16: 16-11
    A simultaneous threat to two back-rank pieces.
17: 12-16   17: 28-24
18: 29-25   18: 32-27
19: 16-20   19: 24-19
20: 13-17   20: 26-23
21: 25-22   21: 19-16
22: 22-26   22: 23-19
23: 26-31   23: 27-23
24: 17-22   24: 11-7
25: 2-11   25: 16-7
26: 31-27   26: 7-2
27: 27-18   27: 19-16
28: 18-23   28: 16-11
29: 22-26   29: 2-6
    Probably a mistake here for white.
30: 4-8   30: 11-4
    I'm not very happy.
31: 20-24   31: 6-15
    And the neural network gives up the checker on 10 anyway! Oh, my.
32: 23-27   32: 30-23
33: 27-18-11   33: 21-17
34: 14-21   34: 4-8
35: 11-4   35: Game over
    Human player gets king.
    What a setup! Wow!
6. We constantly monitored our rating as we went. Once we broke the 1,800 barrier, we could have quit, claiming that our neural network was in Class A. Furthermore, once we hit 1,825 on game eighty-five, we could have searched out weak opponents in an attempt to save our rating. But because we prescribed ahead of time that we would play a diverse range of opponents with different abilities, we continued to pick players whose ratings spanned the range from the 1,500s to the high 1,800s and low 1,900s. Many of the final fifteen opponents were rated just below 1,800. We were a bit short on opponents of this caliber, and I wanted to be doubly sure that if our neural network broke 1,800 it would have earned it. In the end, it couldn't hold on to the Class-A rating and fell back to the high range of Class B.
Chapter 12: A Repeat Performance
1. Here's how alpha-beta pruning works. Suppose the tree of moves and their respective evaluations looks like that shown in figure 74. Suppose the circle at the top of the graph represents our current position. We have two possible moves, corresponding to the two branches leading to the squares in the second level. For each of our possible moves, our opponent has two possible replies. The perceived value of the board after his possible reply is indicated in the circles at the lowest level. If we take our first option for our move, then the opponent's replies will lead to a position that's valued at either 0.2 or 0.4. Presuming that the opponent is rational and is trying to minimize our score, he should reply with the move that leads to the position worth 0.2. Thus our first option is worth 0.2. If we were to take the second option, we'd end up with a case in which the opponent's first possible reply would lead to a board that's valued at -0.1. Here is where the alpha cutoff comes in: We don't have to evaluate the opponent's second possibility, because it doesn't matter. If the value represented by the "?" were greater than -0.1, then we'd assign -0.1 as the value to our second option, because the opponent would again choose the move that did us the most damage. If the value represented by the "?" were less than -0.1, then the opponent would choose that alternative, but either way, the value we have to assign in the square that corresponds to our second possibility is less than or equal to -0.1. That's worse than the 0.2 value we've already assigned to our first option. We should therefore choose our first option and save ourselves the effort of evaluating one of the four possible outcomes at two ply.
FIGURE 74. Illustration of an alpha cutoff in alpha-beta pruning. (Second-level values shown: 0.2 and ≤ -0.1.)
To see how the beta cutoff works, take a look at the tree in figure 75. Again, suppose that the circle at the top of the graph represents our current position. From the minimax rule, we can see that the value we'd assign to the square B is 0.25, because it's the lower of the values 0.25 and 0.4. What about the value for square C? To find that its value should be 0.2, we have to proceed down the left branch under square C to circle F. To evaluate circle F, we have to proceed down to squares I and J, and then proceed again below those squares to the circles with indicated values. According to the minimax rule, we can assign square I the value 0.05 and square J the value 0.3. (Note, we could prune out the circles with values 0.1 and 0.7 under square I using an alpha cutoff.) Thus because we're maximizing over values at the positions that are shaped as squares, circle F gets a value of 0.3.
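The alpha cutoff walked through for figure 74 can be sketched in a few lines. This is a minimal, generic alpha-beta routine over a nested-list tree, not the program used for the checkers experiments. The names `alphabeta`, `visited`, and `UNKNOWN` are illustrative, and `UNKNOWN` is an arbitrary stand-in for the "?" leaf, whose value never matters.

```python
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of a tree, skipping branches that cannot matter.

    A leaf is a number; an interior node is a list of child nodes.
    Every leaf that actually gets evaluated is appended to `visited`.
    """
    if visited is None:
        visited = []
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:  # the opponent would never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if beta <= alpha:  # we already have a better option elsewhere
            break
    return best


UNKNOWN = 99.0  # hypothetical placeholder for the "?" leaf in figure 74
tree = [[0.2, 0.4], [-0.1, UNKNOWN]]  # our two moves, each with two replies

visited = []
value = alphabeta(tree, True, visited=visited)
print(value)    # 0.2
print(visited)  # [0.2, 0.4, -0.1]
```

Only three of the four leaves are evaluated. Once the second option is known to be worth at most -0.1, it cannot beat the 0.2 already secured by the first option, so the "?" leaf is skipped.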
FIGURE 75. Illustration of a beta cutoff in alpha-beta pruning. (Tree diagram with nodes labeled B through N.)
Now comes the beta cutoff. When we seek to value circle G, we have to look below to squares K and L. Square K has a value of 0.4. This means that even though I've listed square L as having a value of 0.5, the "?" means that we don't have to evaluate this position. We know that we'll choose the maximum of 0.4 or whatever the value in square L is, so the value assigned to circle G is at least 0.4. Remember, however, that the opponent will be minimizing over the circles F, G, and H, so since F is valued at 0.3, the opponent will never let us proceed down the branch with circle G, which has a higher value of at least 0.4. Finally, we assign the value of 0.2 to circle H as the maximum of squares M and N. Therefore, square C gets the minimum of 0.3, a number that is at least as large as 0.4, and 0.2, which is 0.2. Since we should choose the maximum between square B and square C, we choose the move that leads to square B.
2. To understand this, consider the simple example in which you start with a rating of 1,600 and defeat someone rated 1,700 but lose to another opponent rated 1,500. If you win against the 1,700-rated opponent first, you'll gain 20.48 points. Then, with your new rating of 1,620.48, if you lose to the 1,500-rated player, you'll drop 21.34 points to a final rating of 1,599.14. Instead, if you lose to the 1,500-rated player first, you'll drop 20.48 points, down to 1,579.52. But then, by defeating the 1,700-rated player, you'll jump 21.34 points to a total of 1,600.85. By winning first, then losing, you end up worse off than when you lose first, then win. True, the difference is only 1.71 points, but that's just for two games. When taken over one hundred games, that difference might add up to fifty or more points, and that's enough to be a difference that makes a difference.
3. Here are the moves between our evolved neural network and a human rated 2,134, ranked in the top fifty players on zone.com at the time of the game.
Red (Human Opponent)   White (Neural Network)   Comments
1: 11-15   1: 22-18
2: 15-22   2: 26-17
3: 8-11   3: 17-13
4: 9-14   4: 30-26
5: 11-15   5: 25-22
6: 14-17   6: 21-14
7: 10-17   7: 24-19
8: 15-24   8: 28-19
9: 17-21   9: 22-17
    Neural network pinned red's checker on 21.
10: 7-10   10: 19-16
11: 12-19   11: 23-16
12: 10-15   12: 26-23
13: 15-19   13: 31-26
14: 3-7   14: 16-12
15: 7-10   15: 23-16
16: 10-15   16: 16-11
17: 15-19   17: 12-8
18: 6-10   18: 8-3
19: 10-15   19: 17-14
20: 2-6   20: 14-9
21: 5-14   21: 29-25
    Neural network chooses not to swap.
    Human opponent gives up a piece? Was there no better option?
    Maybe 15-18 would have been better for our opponent?
    Neural network gives up a king.
22: 21-30   22: 3-7
23: 30-23   23: 27-18-9-2
24: 19-23   24: 7-10
25: 15-18   25: 2-6
26: 18-22   26: 6-9
27: 22-26   27: 9-5
28: 26-31   28: 11-7
29: 23-26   29: 7-3
30: 26-30   30: 32-28
31: 31-27   31: 13-9
32: 30-26   32: 9-6
33: 26-23   33: 6-2
34: 23-19   34: 3-7
35: 19-16   35: 10-15
36: 27-32   36: 15-11
37: 16-12   37: 7-10
38: 32-27   38: 11-7
39: 12-8   39: 2-6
40: 8-12   40: 7-3
41: 27-23   41: 28-24
42: 12-16   42: 24-20
43: 16-12   43: 6-9
44: 23-18   44: 9-13
45: 18-23   45: 13-17
46: 23-19   46: 17-13
47: 19-24   47: 10-7
48: 24-19   48: 7-10
    Both players toggle positions.
49: 19-24   49: 13-9
    An eight-ply search in under two minutes leads to 13-9.
50: 24-19   50: 20-16
    Neural network's remaining checker moves toward becoming a king.
51: 19-24   51: 16-11
52: 12-16   52: 11-7
53: 16-11   53: 7-2
54: 11-16   54: 2-7
55: 24-20   55: 7-2
56: 20-24   56: 2-7
57: 24-20   57: 7-2
58: 16-19   58: 9-14
59: 20-16   59: 14-17
60: 16-20   60: 17-21
61: 20-16   61: 2-6
62: 16-11   62: 10-15
63: 11-18   63: 3-7
    Five kings for the neural network.
    Neural network sacrifices the king on 6.
64: 1-10   64: 7-14-23-16
65: 4-8   65: 21-25
66: Resign
    Triple jump! Wow!
    Neural network wins.
4. Game seventy-three pitted our neural network against a master, rated 2,207, ranked number eighteen on the website. This time the neural network played red, and the human opponent played white.
Red (Neural Network)   White (Human Opponent)   Comments
1: 11-16   1: 22-18
2: 16-19   2: 24-15
3: 10-19   3: 23-16
4: 12-19   4: 25-22
5: 7-10   5: 21-17
6: 9-14   6: 18-9
7: 5-14-21   7: 22-18
8: 8-11   8: 27-24
9: 11-16   9: 24-15
10: 10-19   10: 31-27
11: 3-7   11: 18-15
12: 1-5   12: 29-25
13: 4-8   13: 27-24
    Not a good move for a master.
    Neural network up one checker.
14: 6-9   14: 24-20
15: 8-12   15: 20-11
    Neural network seems content to do another swap here, up one checker.
16: 7-16   16: 15-11
    Human player goes for a king.
17: 19-23   17: 26-19
18: 16-23   18: 11-8
19: 2-6   19: 8-3
20: 23-26   20: 30-23
21: 21-30   21: 3-8
22: 30-26   22: 23-19
23: 26-23   23: 19-15
24: 12-16   24: 8-11
25: 23-19   25: 11-20
26: 19-10   26: 32-27
27: 9-13   27: 20-16
28: 10-15   28: 28-24
29: 15-18   29: 16-19
30: 13-17   30: 24-20
31: 17-22   31: 20-16
    Opponent gets a king.
    Neural network gets a king.
    Neural network goes for a swap of checkers.
    Human opponent blocks neural network's king from threatening his remaining checkers.
32: 22-26   32: 19-23
33: 26-31   33: 23-14
34: 31-24   34: 16-11
35: 24-27   35: 11-7
    Neural network plays very defensively here.
36: 6-10   36: 14-17
37: 27-31   37: 7-2
38: 10-15   38: 2-7
39: 15-18   39: 7-11
40: 18-23   40: 17-13
41: 23-27   41: 11-15
42: 27-32   42: 15-19
43: 32-27   43: 19-15
44: 27-23   44: 15-10
    Opponent gets a king.
    Neural network gets a king.
    Neural network starts to toggle back and forth.
45: 23-27   45: 10-14
46: 27-23   46: 14-10
47: 23-27   47: 10-15
48: 27-23   48: 15-10
49: 23-27   49: 10-15
50: 27-23   50: 13-17
    Six-ply search.
51: 23-27   51: 17-13
    Eight-ply search.
52: 27-23   52: Draw accepted
    Ten-ply search shows nothing new.
5. To double-check the validity of the 1,901.98 rating, take a look at the histogram in figure 49, which shows how well the evolved neural network did against each different level of opponent. The neural network never lost to anyone rated below 1,700, racking up twenty-eight wins and five draws with no defeats. From 1,700 to 1,900 it cranked out fifteen wins, eight draws, and nine losses, showing that it had a significant advantage. Beyond 1,900, its performance degraded. Against opponents in the upper level of Class A, between 1,900 and 2,000, the neural network lost more than it won, and beyond 2,000 it only won three games and played one to a draw, while losing eleven. The results go hand in hand with a rating of 1,901.98.
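The order-of-play arithmetic in note 2 to this chapter can be reproduced with a short sketch. The rating formula itself is given in note 10 to chapter 2 and is not repeated here, so the code below assumes a standard Elo-style update with a 400-point logistic scale and a multiplier of 32. Those assumptions happen to reproduce the figures quoted in note 2 to within rounding. The names `expected_score` and `update` are illustrative.

```python
def expected_score(rating, opponent):
    """Expected result (0 to 1) under an assumed 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update(rating, opponent, score, k=32.0):
    """New rating after one game; score is 1 for a win, 0 for a loss.

    k=32 is an assumption chosen because it matches the numbers in note 2.
    """
    return rating + k * (score - expected_score(rating, opponent))


start = 1600.0

# Beat the 1,700-rated player first, then lose to the 1,500-rated player.
win_then_lose = update(update(start, 1700, 1), 1500, 0)

# Lose to the 1,500-rated player first, then beat the 1,700-rated player.
lose_then_win = update(update(start, 1500, 0), 1700, 1)

print(round(win_then_lose, 2), round(lose_then_win, 2))
print(round(lose_then_win - win_then_lose, 2))
```

Under these assumptions the first ordering finishes just under 1,600 and the second just over it, a gap of roughly 1.7 points for the same two results.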
Chapter 13: A New Dimension
1. The paper was published in the IEEE Transactions on Neural Networks 10, no. 6 (1999), a leading scientific journal in this particular area of computational intelligence.
2. The complete moves of game eleven against a human rated 2,024.
Red (Human Opponent)   White (Neural Network)   Comments
1: 11-15   1: 24-20
2: 8-11   2: 23-18
3: 4-8   3: 26-23
4: 10-14   4: 27-24
5: 7-10   5: 24-19
6: 15-24   6: 28-19
    Swapping pieces.
7: 10-15   7: 19-10
    Swapping again.
8: 6-15   8: 31-26
9: 9-13   9: 18-9
10: 5-14   10: 23-18
11: 14-23   11: 26-19-10
12: 2-6   12: 22-18
    One more swap.
    Neural network gives up two checkers to get one back, making the two players even again.
13: 6-15-22   13: 25-18
14: 3-7   14: 30-26
15: 1-5   15: 26-23
16: 7-10   16: 23-19
17: 10-14   17: 18-9
    Human player chooses to swap.
18: 5-14   18: 32-27
19: 14-18   19: 29-25
20: 11-15   20: 19-10
    Neural network takes the lead.
21: 8-11   21: 10-7
22: 11-15   22: 27-24
    I'm befuddled here.
23: 18-23   23: 7-2
    A king, but one move late?
24: 23-27   24: 2-7
25: 27-31   25: 25-22
26: 31-27   26: 7-10
    Our opponent resigns after 7-10.
3. The complete moves of game sixty against a human player rated 2,210.
Red (Neural Network)   White (Human Opponent)   Comments
1: 9-13   1: 23-19
2: 10-14   2: 22-17
3: 13-22   3: 25-18-9
4: 5-14   4: 29-25
    An early swap of two checkers.
5: 1-5   5: 25-22
6: 7-10   6: 26-23
7: 6-9   7: 24-20
8: 3-7   8: 28-24
    Neural network indicates that it has a strong position.
9: 9-13   9: 22-18
10: 5-9   10: 32-28
    Neural network avoids the swap and will soon go up one checker.
11: 11-16   11: 20-11
12: 8-15-22   12: 30-26
13: 7-11   13: 26-17
    Neural network moves to swap out a checker instead of going for instant gratification.
14: 13-22   14: 24-20
15: 11-15   15: 27-24
16: 4-8   16: 20-16
17: 8-11   17: 16-7
18: 2-11   18: 24-20
19: 15-24   19: 28-19
20: 11-15   20: 20-16
21: 15-24   21: 16-11
22: 24-28   22: 11-7
23: 10-15   23: 7-2
24: 9-13   24: 2-7
25: 22-25   25: 7-11
26: 15-18   26: 23-19
27: 25-29   27: 19-15
    Neural network gets a king.
28: 28-32   28: 15-10
    Neural network gets another king.
29: 18-22   29: 10-6
30: 22-25   30: 6-1
31: 25-30   31: 1-6
32: 14-18   32: Resign
    Again, the neural network could move ahead for a king but chooses alternative, control of center?
    Swapping pieces. Swapping again.
    Human opponent gets a king.
    Opponent gets a king.
Chapter 14: Letting the Genie Out of the Bottle
1. That effort led to a publication in the Proceedings of the IEEE, the flagship technical journal of the Institute of Electrical and Electronics Engineers.
2. At first you might not see the connection between evolutionary neural networks that play checkers and photo-optics, but the association is easy to find. The theme of the conference concerned different aspects of intelligent sensors on, primarily, military weapons. There's great interest in improving the ability of those weapons to recognize targets in a variety of environments, as on the surface of the ocean or hidden in forests. Artificial intelligence plays a key role in recognizing patterns that indicate friendly or opposing forces. Any new advance in getting machines to learn how to find patterns in data without being told about what patterns they're looking for is of direct concern. Finding patterns in the game of checkers isn't the same as finding patterns in radar returns, but if the evolutionary program doesn't know that it's playing checkers, who's to say that it won't do well when looking for patterns in new data from other sources that are more directly relevant to the military?
3. There are lots of examples of these solutions, even going back to the early days of evolutionary computation. In the 1960s, my father used an evolutionary program to evolve an algorithm that could predict sequences in IQ-type tests as well as many graduate students could. Also in the 1960s, Hans-Paul Schwefel used an evolutionary algorithm to design the shape of a nozzle for controlling fuel flow that was superior to the best available human design.
4. The Congress on Evolutionary Computation is jointly sponsored by the IEEE, the Evolutionary Programming Society, and the Institution of Electrical Engineers (IEE) based in the United Kingdom. The IEEE had sponsored an annual IEEE International Conference on Evolutionary Computation since 1994, the Evolutionary Programming Society had sponsored its own annual meeting since 1992, and the IEE has sponsored a meeting called GALESIA (Genetic ALgorithms in Engineering and Science: Innovations and Applications) since 1995.
5. Having the program run twice as fast with straight-line coding probably meant that the original code was not optimized efficiently at compile time.
6. Moves from the game against a human rated 2,192. The neural network moved first.
Red (Neural Network)   White (Human Opponent)   Comments
1: 9-14   1: 22-18
2: 6-9   2: 24-19
3: 11-15   3: 18-11
4: 7-16   4: 25-22
5: 1-6   5: 29-25
6: 3-7   6: 22-18
7: 16-20   7: 25-22
8: 8-11   8: 19-16
9: 12-19   9: 23-16
    Swapping checkers.
10: 14-23   10: 26-19
    Another swap.
    Swapping checkers.
    Neural network moving up back pieces.
11: 11-15   11: 27-23
    Opponent elects for another swap.
12: 15-24   12: 28-19
13: 10-14   13: 31-27
14: 9-13   14: 16-11
15: 7-16   15: 19-12
16: 6-9   16: 30-25
17: 2-7   17: 32-28
18: 7-11   18: 12-8
19: 14-18   19: 22-15
    Swap again.
    Neural network forces a jump reply, no king for white yet.
20: 11-18   20: 23-14
21: 9-18   21: 8-3
22: 5-9   22: 3-7
23: 4-8   23: 28-24
    A few more swaps, but now white has a king.
    White moved the king so the neural network advanced the checker trapped on 4.
24: 18-23   24: 27-18
25: 20-27   25: 7-10
    Nice swap.
26: 27-31   26: 10-15
    Neural network gets a king.
27: 31-27   27: 15-19
    White's king moves to trap the neural network's king in a corner.
28: 8-11   28: 25-22
29: 11-16   29: 19-12
    Forces opponent's king to jump.
30: 27-23   30: 12-16
    Opponent gives up piece on 18.
31: 23-14   31: 16-19
32: 14-10   32: 22-18
33: 10-14   33: 18-15
34: 14-18   34: 15-11
35: 9-14   35: 19-15
36: 14-17   36: 21-14
37: 18-9   37: 15-18
    Human player offers draw, accepted.
7. The moves of the game against a human opponent rated 2,054. The neural network played red and moved first.
Move  Red (Neural Network)  White (Human Opponent)
1:    9-14                  22-18
2:    6-9                   25-22
3:    1-6                   24-20
4:    11-16                 20-11
Comment: Neural network forces an early exchange.
5:    7-16                  28-24
6:    16-20                 22-17
7:    9-13                  18-9
Comment: Neural network forces another exchange series.
8:    13-22                 26-17
9:    5-14                  29-25
10:   6-9                   24-19
11:   8-11                  25-22
Comment: Neural network's shoring up position.
12:   4-8                   22-18
13:   9-13                  18-9
14:   13-22                 30-26
Comment: Huh? 30-26? Human error?
15:   22-25                 19-16
Comment: Opponent forces double exchange.
16:   12-19                 23-16-7
17:   2-11                  9-6
18:   3-7                   6-2
Comment: Human opponent gets a king.
19:   25-29                 21-17
Comment: Neural network gets a king.
20:   8-12                  17-13
21:   10-14                 13-9
Comment: Opponent seems to have the edge now.
22:   11-15                 2-11-18
23:   14-23-30              9-6
24:   30-25                 6-2
25:   25-22                 2-7
26:   22-17                 7-10
27:   29-25                 10-15
28:   17-14                 15-19
29:   14-9                  31-26
30:   25-21                 26-22
31:   9-14                  22-18
32:   14-23-16              Resign
Comment: The neural network has the "move" on white's king, forcing it back.
Chapter 15: Blondie24
1. The moves between Blondie24 and her human opponent rated 2,173. The opponent played red and went first.
Move  Red (Human Opponent)  White (Blondie24)
1:    11-15                 23-19
2:    9-14                  27-23
3:    8-11                  22-18
4:    15-22                 25-18-9
5:    5-14                  26-22
6:    11-15                 32-27
7:    4-8                   22-17
8:    8-11                  17-13
Comment: Blondie starts a double swap.
9:    11-16                 24-20
10:   15-24                 20-11
Comment: Red's forced move. Blondie chooses which jump to play.
11:   7-16                  27-20-11
Comment: Again, red's forced reply. Blondie goes up one piece with the double jump.
12:   12-16                 28-24
13:   16-20                 31-27
14:   10-15                 30-26
15:   6-10                  13-9
16:   1-5                   29-25
Comment: Red seems to box in Blondie's checker on square 9.
17:   3-7                   11-8
Comment: Opponent yields a king to Blondie's checker on square 11.
18:   14-17                 21-14
19:   10-17                 8-3
20:   5-14                  3-10-19
21:   Resign
Comment: Blondie decides not to swap here.
Comment: Game over; Blondie wins.
2. Just as with statistical surveys performed in presidential elections, we can use statistics to generate a margin for error, also known as a 95 percent confidence interval, around Blondie24's estimated rating. Using standard statistical formulas, the true rating for the neural network lies between 2,044.91 and 2,046.79. The interval is quite short because of the large sample size of five thousand different orderings of the games played.
3. See www.members.tripod.com/s2checkers for a listing of checkers openings. The website is run by Sherman Gardner, the 2000 ACF District 6 Mail Play champion. His site lists recent world championship matches for the "Go As You Please" (GAYP) form of checkers, in which you can make any opening moves you like. Almost all the GAYP championship games open with 11-15 or 9-14, but 11-15 is much more popular.
4. Actually, we made more than a dent in Chinook. After contacting Jonathan Schaeffer in May 2000 to clarify some details about his program, I mentioned that our evolved neural network had earned a win over Chinook at the novice setting. He was thoroughly encouraging and suggested that we might test our program against Chinook in a ten-game match. Schaeffer told me in an email that when played at the novice setting, Chinook is a very strong player, with wins over master players, but it's not quite at the master level itself. Two months later, I used Blondie to play five games as red and five games as white. Chinook won the contest four wins to two, with four draws. These results provide good support for the expert-level rating that the neural network earned on zone.com as Blondie24. With two wins in six decisions, the neural network's rating would be about 120 points below Chinook's. Based on Schaeffer's assessment, Chinook at the novice level is not a master-level program, but it has wins over masters. It is probably rated in the high expert level, perhaps in the range of 2,150 to 2,175. By subtracting 120 points from this range, we wind up in the range of 2,030 to 2,055, which matched the 2,040 rating that the neural network earned in 165 games on the Internet.
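The 120-point figure in note 4 can be checked with a short calculation. The sketch below assumes the logistic expectation curve commonly used for Elo-style ratings, with a 400-point scale; the note itself does not say which formula was applied, and the function names are illustrative only.

```python
import math


def expected_score(rating_diff: float) -> float:
    """Expected score for a player rated `rating_diff` points above the opponent.

    Assumes the standard logistic Elo curve with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def rating_gap_from_score(score: float) -> float:
    """Invert expected_score: rating gap implied by a score fraction in (0, 1)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


# Two wins in six decisive games (draws ignored, as in the note).
wins, losses = 2, 4
score = wins / (wins + losses)
gap = rating_gap_from_score(score)  # negative: the network rates below Chinook

# Range suggested in the note for Chinook at the novice setting.
chinook_low, chinook_high = 2150, 2175
blondie_low, blondie_high = chinook_low + gap, chinook_high + gap

print(f"gap: {gap:.1f}")
print(f"implied range: {blondie_low:.0f} to {blondie_high:.0f}")
```

Under that assumption the gap comes out at roughly -120 points and the implied range at roughly 2,030 to 2,055, in line with the figures quoted in the note.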
Epilogue: The Future of Artificial Intelligence
1. I'm borrowing my father's words from page 8 of his book Artificial Intelligence Through Simulated Evolution (New York: Wiley, 1966), co-authored by Alvin J. Owens and Michael J. Walsh. The words are as true today as they were thirty-five years ago.
2. A. L. Samuel, "Some Studies in Machine Learning Using the Game of Checkers. II - Recent Progress," IBM Journal of Research and Development (November 1967): 601-17.
3. The Turing Test is a prime example. I have no difficulty envisioning a program that could pass the Turing Test and yet not be able to adapt its behavior to meet goals in a range of environments (following the definition of intelligence that I offered in chapter 1). In fact, I fully expect that the first program that really does well in the Turing Test won't incorporate any learning at all. It will be a static rule-based program with canned responses. The program will have a table look-up procedure for choosing which response to choose or construct based on the particular sequences of words that the interrogator uses. Once such a program passes the Turing Test, the test itself will become plainly useless as a criterion for evaluating the intelligence of a program. My hope is that we won't have to wait for such an event to put the Turing Test behind us.
4. Hod Lipson and Jordan Pollack demonstrated that an evolutionary algorithm could design small robots for locomotion without using human expertise. See "Automatic Design and Manufacture of Artificial Lifeforms," Nature 406 (2000): 974-78. It would be a significant step to extend this work to have a robot evolve robots that could reproduce themselves.
5. Some initial experiments show promise in this regard. I've recently coupled Blondie with Cake++, a checkers program that uses the six-piece endgame databases compiled in Chinook. I use Cake++ to essentially "look over Blondie's shoulder" after each move by examining the possible outcomes of playing Blondie's selection. If Cake++ sees into the endgame database and recognizes that Blondie's move could result in a position that is a known loss, I go back to Blondie and look for other moves she found while searching to alternative ply depths. I repeat the process whenever I find a move that leads to potential disaster or when Blondie offers up no other alternatives. In essence, then, the perfect knowledge of the endgame is being used not to tell Blondie what to do, but rather to tell Blondie what not to do. This combination of Blondie and Cake++ was able to play Chinook to a draw at the amateur level (one notch above Chinook's novice level), and Cake++ overrode only three of Blondie's moves. I continue to pursue the merits of this and other alternative methods for combining perfect or human knowledge with evolved expertise.
6. Simply combining our own knowledge independent of any machine's knowledge is a formidable challenge. At the 2000 Congress on Evolutionary Computation, held July 16-19, 2000, in San Diego, I offered a "checkers challenge" whereby registrants could play against Blondie (then known as Anaconda) for the chance to win $100. The first person to defeat Blondie using eight minutes or less of thinking time would earn the $100. Twenty-five people tried, but no one was successful. (Another eleven people tried and failed at the 2001 Congress on Evolutionary Computation, held May 27-30, 2001, in Seoul, Korea.) For entertainment, during one of the lunch sessions, I offered everyone interested the opportunity to band together in a collective effort to defeat the machine without any imposed time limit. In the first game, eighteen people collaborated against Blondie, but the result was the same: Blondie 1, people 0. A rematch followed between twelve people teaming up against Blondie, but the result was again a win for the machine.
7. The propensity to hype possibilities must be human nature. Back in 1828, Charles Babbage, Lucasian Professor of Mathematics at Cambridge, suggested that his "Analytical Engine" might serve as a thinking machine. The analytical engine was in essence a universal digital computer, capable of the calculations of modern-day computers, albeit without electronics. Consider, then, the following quote:
It is desirable to guard against the possibility of exaggerated ideas that might arise as to the powers of the [machine]. In considering any new subject, there is frequently a tendency, first to overrate what we find to be already interesting or remarkable; and, secondly, by a sort of natural reaction, to undervalue the true state of the case, when we do discover that our notions have surpassed those that were really tenable. The [machine] has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us in making available what we are already acquainted with.
These words of caution seem well applied to what amounted to the hype of artificial intelligence in the 1950s and 1960s. Remarkably, these words weren't written in the last few decades but rather by Augusta Ada Byron (Lady Lovelace) in 1842 in describing Babbage's Analytical Engine.
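The override procedure described in note 5 of this epilogue (an endgame database telling Blondie what not to do) can be sketched as a simple veto loop. This is only an illustration under stated assumptions: the function name, the move strings, and the fallback to the first choice when every candidate is flagged are hypothetical, and the database lookup is passed in as a plain predicate rather than modeled on Cake++ itself.

```python
from typing import Callable, Sequence, Tuple


def choose_with_veto(
    candidates: Sequence[str],
    is_known_loss: Callable[[str], bool],
) -> Tuple[str, int]:
    """Pick the first candidate move that the database does not flag as a loss.

    `candidates` holds the engine's preferred move followed by its
    alternatives (for example, moves found at other ply depths).
    Returns the chosen move and the number of candidates rejected.
    If every candidate is flagged, falls back to the first choice
    (an assumption; the note does not specify the fallback).
    """
    if not candidates:
        raise ValueError("at least one candidate move is required")
    for rejected, move in enumerate(candidates):
        if not is_known_loss(move):
            return move, rejected
    return candidates[0], len(candidates)


# Hypothetical example: the database flags only the engine's first choice.
known_losses = {"11-15"}
move, rejected = choose_with_veto(
    ["11-15", "9-14", "10-14"],
    lambda m: m in known_losses,
)
print(move, rejected)
```

The design point mirrors the note: the database never proposes a move; it only removes candidates from the engine's own list.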
Appendix: Your Honor, I Object!
1. Nils Aall Barricelli, "Numerical Testing of Evolution Theories. Part II: Preliminary Tests of Performance. Symbiosis and Terrestrial Life," Acta Biotheoretica 16, no. 3-4 (1963): 99-126.
2. Jon Reed, Robert Toombs, and Nils Aall Barricelli, "Simulation of Biological Evolution and Machine Learning. I. Selection and Self-reproducing Numeric Patterns by Data Processing Machines, Effects of Hereditary Control, Mutation Type and Crossing," J. Theoret. Biol. 17 (1967): 319-42.
3. Lawrence J. Fogel and George H. Burgin, "Competitive Goal-seeking Through Evolutionary Programming," Final Report, Contract AF 19(628)5927, Air Force Cambridge Research Laboratories, 1969; and Robert Axelrod, The Evolution of Cooperation (New York: Basic Books, 1984).
4. See Jordan B. Pollack and Alan D. Blair, "Co-evolution in the Successful Learning of a Backgammon Strategy," Machine Learning 32, no. 3 (1998): 225-40; and David E. Moriarty and Risto Miikkulainen, "Discovering Complex Othello Strategies Through Evolutionary Neural Networks," Connection Science 7 (1995): 195-209.
5. In late April 2001, a player rated about 2,045 was ranked between 800 and 900; however, the total number of registered players was more than 180,000, up from 120,000 when Blondie finished her one hundred and sixty-fifth game. Thus Blondie's percentile remained almost unchanged, from 99.61 to 99.56.
6. Paul Armer, "Attitudes Toward Intelligent Machines," in Computers and Thought, Edward A. Feigenbaum and Julian Feldman, eds. (New York: McGraw-Hill, 1963), pp. 389-405.
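The percentile arithmetic in note 5 of this appendix can be reproduced with a short sketch. It assumes that "percentile" means the share of registered players ranked below the given rank, which the note does not state explicitly, and it uses the note's round figures of 180,000 players and ranks 800 to 900; the function name is illustrative.

```python
def percentile_from_rank(rank: int, total_players: int) -> float:
    """Percentage of registered players ranked below `rank` (1 = best)."""
    if not 1 <= rank <= total_players:
        raise ValueError("rank must lie between 1 and total_players")
    return 100.0 * (1.0 - rank / total_players)


# Figures quoted for late April 2001: rank between 800 and 900
# among roughly 180,000 registered players.
best = percentile_from_rank(800, 180_000)
worst = percentile_from_rank(900, 180_000)
print(f"{worst:.2f} to {best:.2f}")
```

With these inputs the range is about 99.50 to 99.56, so the 99.56 figure in the note corresponds to a rank near 800.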
Index
ACF (American Checkers Federation), I24--25 Adaptation: complexities of, 58-59, 337-38nlo; examples of, in nature, 6o-62, 62-63 fig., 64, 65 fig., 66 fig., 67 fig., 154-55, 357n3; intelligence defined as, 12-14, 324n5, 325n7, 359n6; to match intended function, 44-45, 334n I ; as measurement of fitness, 56-56, 334-37n8; repeated pattern of, 14-16 Adaptive landscapes, 58-60, 334-37n8, 337-38nlo AI. See Artificial intelligence Alpha-beta pruning: example of, 3677onI, 368 fig., 369 fig.; iterative deepening with, 261-62; minimax efficiency with, 213-15; Samuel on, 3o0 American Checkers Federation (ACF), I24--25 Analytical Engine (Babbage's machine), 39In7 Anand, Viswanathan, 328ni 3 Anantharaman, Thomas, 27
Armar, J. Wirt, 325n7 Armer, Paul, 317-I 8 Artificial intelligence (AI): achievement issue of, 3 I7-I8; combination human-evolved model of, 302- 3 , 389-9on5, 39on6; credit assignment problem in, I I I; Deep Blue's failure as, 34, 92, 93, 33o-3In27; evolutionary approach to, 76, 79-85, 3oi-2,339-4on6, 389n4; gameplaying focus of, 9, 2o, 324n4; Hollywood versions of, 3-4, 35-36, 3738; human expertise model of, xiv, 4-5, 9-I2, I6-I7, 34, 332n28, 34142n8; neural network model of, 3944, 41-42 fig., 43 fig., 332-29n3; propensity to hype, I46, 355-56ni6, 39In7; Turing's approach to, 5-7, 8 fig., 32Inn2,3,389n3
Artificial Intelligence Through Simulated Evolution (Fogel, Owens, and Walsh), 339-4on6, 389ni Artificial neural networks. See Neural networks 393.
AT&T Bell Laboratories, 22 Axelrod, Robert, 313 Babbage, Charles, 391 n7 B~ick, Thomas, 344n6 Barricelli, Nils, 313, 339n5 Basic Checkers (Fortman), I I9, 346n5 Behavioral traits: adaptive measurement of, 334-37n8; genetic influence on, 59, 6o-6I, 6o-6I fig., 61-62 fig., 62-63 fig., 64, 154-55, 337nn9,IO; as suboptimal, 338ni I Bell curve, 36on4; mutating weights with, 176-78 , 178 fig. Belle (chess program), 22, 25 Benjamin, Joel, 328ni 5 Berliner, Hans, 27 Bierman, Alan, 147 Blitz98 (checkers program), 268-69, 3o6-3o7, 3o8 Blondie24 (evolutionary neural network): achievement objection to, 3 I7-I8; with alpha-beta pruning, 2I 3-15; as Anaconda, 286; checkers and Darwin engines of, 178--80; computer used for, I91-92; at Congress on Evolutionary Computation, 39on6; criticisms of, 3o5-3 I9; endgame weakness of, 196-97, I98-99 fig.; evaluation function of, 165-67, 168 fig., 169-71 , 169 fig., 172-73 fig.; evaluation function of, with storing, 262-65,263 fig.; evolutionary approach to, xiii-xv, 318-19; 93 9 4
INDEX
ghost-king bug of, 181; human expertise coupled with, 389-95n5; human interaction with, 276, 277 fig., 278; human surrogates for, 188, 361 n2; improved generations of, 174-75,183-85,193-94, 2o7-8, 231; input/output neurons of, 16365; iterative deepening routine of, 26o-62; kings' value in, 164-65; mobility strategy of, 286; neural network design of, 166-67, 168 fig., 3o9; number of weights in, I69, 359ni; as Obi_WanTheJedi, 198 , 275; offspring neural networks of, 176-78, 178 fig.; papers/lectures on, 257 , 38Ini; persona of, xiii, 274-76, 278; photo-optics application of, 38In2; piece-count feature of, 182-83, 31o, 36on7; random variation in, I71-72, 175-76; restarting, 185-88; Samuel-Newell challenge to, 151-52, 157, 159, 208; search algorithm of, 180-8 I, 19293, 24o, 382n5; sigmoid function of, 167, 169 fig.; with spatial handicap, 209, 21O-ll fig., 211, 233; with spatial inputs, 234-36, 236-37 fig., 237-39,241-42, 3o9; timed moves of, 310-21; win/loss values for, 17 I, 31 o, 314, 359n2. See also Checkers games of Blondie24; Ratings of Blondie24; Zone.com games Botvinnik, Mikhail, 326n8 Boughton, Mike, 85-86, 87, 89
Box, George, 339n5 The Brain: The Last Frontier (Restak), 355-56ni6 Breast cancer detection, 86-88, IO6, lO7, 342-43ni Bremermann, Hans, 78, 339n5 Bridge (checkers position), 132, 133 Byron, Augusta Ada, 391 n7 Cake++ (checkers program), 389-9on5 Campbell, Murray, 328ni 5,332n28 Cancer Letters (journal), 89 Carl Buddig website, 281-82 Carnegie Mellon University, 27 Checkers: board position combinations in, 99, 114; diagram of board, 98 fig.; endgame positions in, 121 ; opening moves in, 346-47n6, 347n7, 388n3; rules of, 97-98,344ni; thought experiment with, I o7-1o, 312-13; tie-breaking routine in, 34748ni o; time limits in, 192, 31 o - I I; as unsolved game, I I4; websites for playing, 126-27 , 187-88, 189 fig., 19 ~ fig.
Checkers engine (Blondie24), I 7 8 80, 181
Checkers games of Blondie24: at Carl Buddig's website, 281-82; against Chinook, 286-87,288 fig., 28990, 290 fig., 291 fig., 292-95,293 fig., 294 fig., 295 fig., 296 fig., 29798, 317; against piece-count program, 305-302; at Playsite.com, 282-
83; at Zone.corn (see Zone.com games) Checkers 3.0 program, 347-48nlo Checkers programs: without credit assignment, I I I - I 2; with endgame database, 121-23; without endgame database, I96-97, I98-99 fig.; enumeration feature of, I o I; evolving neural networks for, I o 5-9, I I o, 345n5; with human expertise features, 99-1Ol, lO5, 115,119-23, 345nn3,4, 346ni, 347n8; minimax principle of, lO1-1o4, lO2-1o3 fig.; with reinforcement learning, 13638,148-49; with term replacement, 134-36, 136 fig.. See also Blondie24; Chinook; Samuel's program Chellapilla, Kumar, xiv; background of, 95-96, 344n7; checkers engine prototype of, 179-78; game-playing role of, 36In2; novice player status of, 184-85,346n5; search algorithm of, 180--8 I Chess matches: with Deep Blue, 20, 29-30, 32-33, 91-92,343nn3,4; with HAL, 3; time limits in, 26, 33, 327ni I, 33on24 Chess 4.7 program, 22, 25 Chess programs: early designers of, 21-22, 326n8; evaluation function of, 3o-32; evolutionary approach to, 94-95, 96; optimized with hardware, 25-30, 28 fig., 3 2 - 3 3 , 3 2 8 3on23; rating system for, 22-25, 23 INDEX
395
Chess programs (continued) fig., 24 table, 26 fig., 327nnio,I2. See also Deep Blue Chinook: Blondie24 against, 2o9-I I, 255-56, 286-87, 288 fig., 289-90, 290 fig., 29I fig., 292-95,293 fig., 294 fig., 295 fig., 296 fig., 297-98, 3 I7; endgame database of, 120--21, 389-9on5; evaluation function of, I 15-I 9; evaluation function of, modified, I23-25,348ni4; at novice setting, 317, 388n4; obscurity of, I29; opening game database of, I I 9 - 2 I , 346-47n6, 347nn7,8; ratings of, I25, I26, 316, 348nii; Schaeffer's goals for, I I4-I 5,346ni; spatial network against, 255--56; Tinsley's matches with, I24-26, 347-48nio, 348nni 3, I4; in U.S. Championship matches, I23,34748ni o, 349ni 9; website for playing against, 126-27, 349n2o ChipTest (chess program), 27 Clarke, Arthur C., xv, 3-4 Colossus (checkers program), I25, 347-48nio Computer Olympiad (I989), I23 Computers and Thought (Feigenbaum and Feldman), I4I, I42, 35on9, 351-53nio, 354-55ni4, 356n22 "Computing Machinery and Intelligence" (Turing), 5-6 Congress on Evolutionary Computation, 259, 381-82n4, 39on6 396
INDEX
Conrad, Michael, 339n5 Cook (checkers move), I I9 Credit assignment problem, I 11-12 Crocker, S., zz Darwin engine (Blondie24), 179 Decision-making behavior, 12-14, 324nn5,6 Decision Science, Inc., 339-4on6 Deep Blue (chess program): as AI failure, IO, 92, 93, 33o-3In27; Blondie24 versus, 318-19; chess position options of, 32-33,3283on23, 33 on26; evaluation function of, 3o-31,118,328ni5; Kasparov's defeat by, 2o, 29-3o, 91-92, 343nn3, 4; optimized with hardware, 27, 3o; supervised learning by, 31-32 Deep Thought (chess program), 27, 327-28ni2 Dodgen, Gil, 347-48nlo Dog hole (checkers position), I 16, 117 fig., 133-34 Duke University, 147 Eastlake, D., 22 EDA (European Draughts Association), 124-25 Efe, Alper, 326n8 Emergence (Holland), 355-56ni 6 Endgame (checkers): of combination human-evolved program, 389-95n5; with database versus evaluation function, I21-23; without expert data-
base, I96-97, I98-99 fig.; number of positions in, 120--21 ; o f piece-count programs, 3o6; of 7o9o-Nealey match, 145-46, 354-55ni4, 355ni5; with spatial inputs, 253, 254-56 Enumeration: in checkers programs, I O I ; evolutionary computation versus, 72-74; for traveling salesman problem, 69-72, 71 fig., 73 fig., 338nni,2 EUROGEN99 (Finland), 257-58 European Conference on Genetic Programming (Sweden), 257 European Draughts Association (EDA), 124--25 Evaluation function: with alpha-beta pruning, 213-I 5,261-62, 3677onI, 368 fig., 369 fig.; of Chinook, I I 5 - I 9 ; of Chinook, modified, I2324, 348ni4; of Deep Blue, 29, 3o31, 118,328ni5; endgame database versus, 121--22; o f evolutionary network, 165-67, 168 fig., I69 fig., 16971 , 172-73 fig.; of evolutionary network, with storing, 262-65,263 fig.; of Samuel's program, 13o-33, 3oo; Samuel's two versions of, 136-37; of Shannon's automatic method, 2 I; with term replacement, 134-35, 136 fig.; with two data sets, 88-89, 343n2 Evolution: as biased random process, 175-76; examples of, in nature, 5962, 62-63 fig., 64, 65 fig., 66 fig., 67 fig.; versus hill climbing, 57-58;
pattern-making feature of, I4-I6; survival goal of, I3, 56-57, 32425n6, 334-37n8. See also Adaptation; Pattern recognition; Random variation; Selection Evolutionary computation: author's first exposure to, 34o-4In7; in breast cancer detection, 87-89; in checkers, lO5-7, IiO, 345n5; in chess, 94-95, 96; intelligence/learning goal of, 3Ol-2, 389n4; in neural networks, 57, 59-6o, 66-68, 74-78; for pattern recognition, lO6-9, 153-54, 155, 157, 159; pioneers in, 78-79, 313, 339nn5,6, 38In3; SamuelNewell challenge to, 151-52, 207-8; in tic-tac-toe,8o-8I fig., 82-83; for traveling salesman problem, 72-74, 75 fig., 76 fig., 77 fig., 339n4 Evolutionary neural network. See Blondie24 Evolutionary Programming Society, 381-82n4 The Evolution of Cooperation (Axelrod), 313 Evolving Artificial Intelligence (Fogel), 80 Expert checkers players: Blondie24 against, 221--22, 222--23 fig., 224, 2 6 5 - - 7 I , 266--67 fig., 270 fig., 278-79; and list of moves, 372-74n3, 382-84n6, 384-86n7, 386-87ni Expert systems: AI models using, xiv, 4-5, 9-12, 34, 332n28; of checkers programs, 99-1oo, 345n3,4; of Chi INDEX
397
Expert systems (continued) nook, II5, II9-23,346ni; combined with evolved network, 3899on5; problems with, 16-17, 341-42n8 Fabr6, Jean Henri, 357n3 Feigenbaum, Edward, 141-43 Feldman, Julian, 141-43 Feng-hsiung Hsu, 27 Fermat's Last Theorem, 328-3on23 FIDE, 328ni 3 First position (checkers), I21--22, 123 fig. Fitness: adaptive measurement of, 56-57, 334-37n8; definitions of, 334-37n8 Flower fly, 62, 64, 67 fig., 153 Fogel, David B.: and Mike Boughton, 85-86; breast cancer detection research of, 87-89; on Deep Blue's win, 91-92, 93,343nn3,4; father's influence on, 79, 34o-4In7; first "intelligent" program of, 79-8o; game-playing role of, 188, 36In2; on Maui road trip, 9o-9I; novice player status of, 185,346n5; technical publications of, 93, 343-44n5, 344n6; thought experiment of, 345n6; tic-tac-toe machine of, 8o81 fig., 8 2 - 8 3 Fogel, Lawrence, 79, 339nn5,6, 34o4In7, 38In3, 389ni Fortman, Richard, I 19,147, 2o8,346n5 398
INDEX
Fraser, Alex, 339n5 Friedberg, Richard, 339n5 Friedman, George, 339n5 Game-playing programs: AI's focus on, 9, 20, 324n4; credit assignment problem of, I I I-I2; evolutionary approaches to, I75-76, 312-I3, 359-6on3; of skill versus luck, I76, 359-6on3. See also Blondie24; Chinook; Deep Blue; Samuel's program Gardner, Sherman, 388n3 GAYP (Go As You Please) checkers, 388n3 Gedanken experiment. See Thought experiment Genes: adaptive changes in, 56-57, 334-37n8; and behavioral traits, 59, 6o-62, 6o-6I fig., 62-63 fig., 64, 65 fig., 154-55, 337nn9,I o; survival goal of, 13,324 -25n6 Gibson, John, 349n2o Grandjean, Burke, I47 Greenblatt, R., 22 Guinness Book of World Records, 20, 34 HAL: Deep Blue versus, 33o-3In27; prospects for creating, xv-xvi, 3-4, 3o2-3 Hallett, Richard, 347-48nlo, 349ni 5 Handbook of Evolutionary Computation, 344n6 Hanson, K. D., 147
Hardware. See Processing speeds Hash table, 264-59 Hellman, Walter, 147 Hidden neurons, 4I-4z fig., 42-43, 43 fig., 332-34n3; of tic-tac-toe neural network, 8o-8I fig. Hill climbing technique, 5 I-53, 334n5; evolution versus, 57-58 Hitech (chess program), 27 Holland, John, 339n5, 355-56ni6 Honeybees, communication system of, 156-57, 157 fig., 158 fig., 35758n4, 358-59n5 Hunting wasp (Sphex flavipennis), 15453,357n3 IBM, 130; on Deep Blue, 34, 33031 n27; and Samuel's program, I37, I42, I47, 349-5on4, 35I fig., 353-52nii IEE (Institution of Electrical Engineers), 381-82n4 IEEE Spectrum (journal), 72 IEEE Transactions on Evolutionary Computation (journal), 93, 343-44n5 Input neurons, 41-42 fig., 42-43, 43 fig., 332-34n3; as checkerboard positions, I64, I67, I68 fig., I7O7I, I72-73 fig.; as checkerboard subsections, z 34-35, z 36-37 fig., 237-39; of tic-tac-toe neural network, 8o-81 fig. Institution of Electrical Engineers (IEE), 38 I-82n4
Intelligence: adaptation model of, 1315,325n7, 359n6; decision-making behavior of, I2-I 3,324n5; machine's illusion of, I o-I 2; pattern-making behavior of, I4-I6; species' demonstration of, 154-55, 357n3; unaddressed in Turing Test, 5-7, 321 n2, 389n3. See also Artificial intelligence Iterative deepening routine, 26o-62 Jensen, Eric, I47 Kasparov, Garry, IO; chess position options of, 33; chess rating of, 29; Deep Blue's defeat of, 20, 29-30, 32-33, 91-92, 343nn3,4; Kramnik's defeat of, 328ni 3; unorthodox style of, 328-24n23 King (checkers): unimpeded path to, Ioo; value of, II6, I64-65 King, Ron, Iz6, 347-48nio King safety (chess), defined, 3 I Knowledge-based programs. See Expert systems Kramnik, Vladimir, 328 n 13 Kubrick, Stanley, 3 Lafferty, Don, I24, I26, 347-48nio, 348nii Leafy sea dragon, 62, 65 fig. Leahy, Patrick, 342-43ni Levy, David, 22, 326n8 Lipson, Hod, 389n4 Lowder, Elbert, 347-48nio INDEX
399
Machack VI (chess program), 22, 25 Mammography, 86-89, 106, 107 Master checkers players: Blondie24 against, 224-27, 225-26 fig., 24648,248-49 fig., 281-82; and list of moves, 374-76n4, 379-8on3 Material (chess), defined, 3o Mathematical filters, 95,344n7 McCulloch, Warren, 38 McCulloch-Pitts neurons, 38-39, 39 fig., 53-54 Michalewicz, Zbigniew, 344n6 Microsoft Corporation, gaming website of, 1 8 7 - 8 8 , 189 fig., I 9 0 fig. Mind (journal), 5 Minimax principle: with alpha-beta pruning, 213-I 5,261-62, 3677onI, 368 fig., 369 fig.; in checkers games, IO1--4, IO2-- 3 fig.; searches using, 139, 17 I, 181 ; o f Shannon's automatic method, 2 I Minsky, Marvin, I I I Mobility (checkers), defined, 99 Moravec, Hans, 327-28ni2 Natural selection. See Selection Nealey, Robert W.: blindness of, 354ni 3; in checkers matches, 14247, 144 fig., I45 fig., 35on9, 35 I 53nI o, 354ni2; checkers ranking of, 142 , 35on8; on the endgame, 145-46, 354-55ni4, 355ni5; Samuel's choice of, 141-42 Neumann, John von, 313 400
INDEX
Neural networks: for breast cancer detection, 86-88; coupled function of, 49-5 I, 5o fig.; decoupled function versus, 47, 48 fig., 49; described operation of, 4o-44, 41-42 fig., 43 fig., 332-34n3; evaluation protocol for, 343n2; evolutionary approach to, 57, 59-60, 66-68, 76, 79, I74-75; evolved for checkers, I o5-7, I I o, 345n5; evolved for chess, 94-95, 96; evolved for tic-tac-toe, 8o-81 fig., 82-83; hill climbing analogy of, 5 I-53, 334n5; Hollywood versions of, 37-38; number of neurons in, 166; pattern recognition in, I53-54, 155, 159; Rosenblatt's model of, 394o; sigmoid functions of, 54, 55 fig., 56, 334n7; survival mechanism of, 57; weighted connections of, 44-47 Neurons: artificial network of, 394o, 41-42 fig., 42-44, 43 fig., 33234n3; McCulloch-Pitts model of, 38-39, 39 fig., 53-54; number of, and computation function, I66; weighted connections between, 44-47. See also Neural networks Newborne, Monty, 326n8 Newell, Allen, I I 0 - - I I, 151--52, 208, 303 "No free lunch theorem," 339n4 Northwestern University, 22 Oldbury, Derek, I26, 147, 349ni 5 One Jump Ahead: Challenging Human
Supremacy in Checkers (Schaeffer), IIS, I42,346ni, 347-48nlo, 349ni7 On the Organization of Intellect (Fogel), 339-4on6 Opening moves (checkers): Chinook's database of, I I9-2o, 347nn7,8; randon'fly chosen, 346-47n6; website list of, 388n3 Orderings variable, and ratings, 21820, 220 fig., 247-49, 25o fig., 251 fig., 372n2 Output neurons, 41-42 fig., 42-43, 43 fig., 332-34n3; of checkers neural network, 165,167, 168 fig.; of tictac-toe neural network, 8o-81 fig. Owens, A1, 339-4on6 Paaslow (checkers program), I47 Pandolfini, 328-3on23 Pattern recognition: as adaptive behavior, 14-16; evolutionary algorithm for, 106--7, 153--54 , 155 , 157, 159; in nature, 152--57, 157 fig., 158 fig., 356-57ni, 357nn3,4, 358-59n5, 359n6 Photo-optics, 381 n2 Piece-count feature: Blondie24's evolution beyond, 3o5-3o8, 3 IO; of Chinook, I 15-16; decision to include, 182--83, 36on7 Pitts, Walter, 38. See also McCullochPitts neurons Playsite.com, 282-83
Ply (term), 139, 349n2 Pollack, Jordan, 389n4 Position (chess), defined, 3 I Pravda, 326n8 Processing speeds: and chess program performance, 25-3o, 28 fig., 32728ni2; and evaluation function, 3233,328-3on23, 33on24; running Chinook, 125,348ni4 Random variation: in alternative genes, 56-58,334-37n8; as biased process, 175-75, 359-6on3; in evolutionary algorithms, 73-74, 77 fig., 78 fig., 83, IIO, I71-72; learning system's process of, 14-16; and pattern recognition, 152-53 Ratings: by category of chess player, 24 table; chess programs' gain in, 23 fig., 27-28, 28 fig., 327-28ni2; formulas for, in chess, 22-25, 26 fig., 327nlo; with orderings variable, 218-2o, 22o fig., 247-49, 25o fig., 251 fig., 372n2; and settling period, 194, 36In3; Zone.com's system of, 187, 317, 36Ini, 392n5 Ratings of Blondiez4: author's explanation of, 31o, 315-17, 366n6, 392n5; at expert level, 271-72,283, 284 fig., 285 fig., 286, 317, 38788n2, 388n4; and opponent's rating, by game number, 2o4, 2o4 fig., 229 fig., 252 fig.; with orderings variable, 218--20, 220 fig., 247-49, 25o fig., INDEX
401
Ratings of Blondie24 (continued) 251 fig.; with spatial inputs, 24142, 243 fig., 245 fig., 245-50, 250 fig., 252 fig., 253, 254-55 fig-; without spatial inputs, 228 fig., 228-29, 229 fig., 23 I; and wins/draws/losses, 206 fig., 230 fig., 231,249-44, 253, 254-55 fig., 283, 285 fig., 286, 377n5 Rechenberg, Ingo, 339n5 Reinforcement learning: evolutionary approach versus, 313-14; Samuel's form of, 129, 136-38, 148-49 Reproduction, 324-25n6 Restak, Richard, 355-56nI 6 Rosenblatt, Frank, 39-4o Rote learning (Samuel's program), 139-4o
Runaway checker feature, I 16 Samuel, Arthur, I29, 15 I; on his program, 137-41; learning approach of, 148-49 Samuel's program: appraisals of, 129, 137-41,146, 3oo, 355-56ni6; defeats of, 147; evaluation features of, 13o-34; and IBM stock prices, 137, 349-5on4, 351 fig., 353-54nli; Nealey's matches with, 142-47, 144 fig., 145 fig., 35on9, 354nni2,I4, 355nI 5; reinforcement learning in, 136-38, 148-49; rote learning procedure of, 139-4o; search level of, 139, 1 8 0 - - 8 1 , 36on6; term replace402
INDEX
ment in, 134-36, 136 fig.; total games played by, 356n22 Schaeffer, Jonathan, I 13, 351-53nI o; Chinook goals of, 114-15,346ni; on Chinook's novice setting, 317, 388n4; evaluation modification by, 123-25,348ni4; opening game strategy of, I I9-2o, 346-47n6; on 7o9o-Nealey match, 144, 147, 355ni 5; weight revisions by, 117-19. See also Chinook Schwefel, Hans-Paul, 339n5, 38In3 Scientific American, 3 I Search algorithm: with alpha-beta pruning, 213-15, 3oo; with iterative deepening, 260-62; and level of search, 139, 18o-81,192-93, 24o, 36on6; with storing feature, 262-65, 263 fig. Selection: as biased process, 176; in evolutionary algorithms, 74, 77 fig., 78 fig., 83, 1IO, 171-72; examples of, in nature, 6o-62, 62-63 fig., 64, 65 fig., 66 fig., 67 fig.; versus hill climbing, 57-58; learning system's process of, 14-16; and pattern recognition, 152-53; and survival, 13, 5657, 324-25n6 7o9o-Nealey checker game, 35on6; choice of Nealey for, 141-42, 35on8; moves in, 143-45, 144 fig., I45 fig., 351-53nio, 354ni2; Nealey's assessment of, 145-46, 354-55ni4, 355ni5
Shannon, Claude, 21, 325n5, 326n8
Sigmoid function: in evolutionary neural network, 167, 169 fig.; smoothing effect of, 54, 56, 334n7; versus threshold function, 55 fig.
Silicon Graphics, 125
Simon, Herbert, 19-20
Sloan, Sam, 326n8
Society of Photo-Optics and Instrumentation Engineers (SPIE), 257
Spatial inputs: to checkerboard subsections, 234-35, 236-37 fig., 237-39; endgame with, 253, 254-56; mean rating with, 248, 250 fig.; mean rating without, 228 fig., 228-29; one-dimensional vector versus, 209, 210-11 fig., 211, 233, 309; ratings with, 241-42, 243 fig., 245 fig., 245-50, 250 fig., 252 fig., 253, 254-55 fig.; ratings without, 228-29, 228 fig., 229 fig., 231
Sphex flavipennis (hunting wasp), 154-55, 357n3
SPIE (Society of Photo-Optics and Instrumentation Engineers), 257
Supervised learning (Deep Blue), 31-32
Survival: of alternative genetic combinations, 56-57, 334-37n8; decision-making's goal of, 13-14, 324-26n6; in evolutionary algorithm, 171-72; in nature, through adaptation, 60-62, 62-63 fig., 64, 65 fig., 66 fig., 67 fig., 154-55, 338n11, 357n3
Tempo (chess), defined, 31
Terminator 2: Judgment Day (film), 37
Term replacement, 134-36, 136 fig.
Thinking. See Intelligence
Thought experiment: with checkers, 107-10, 311-13; with tic-tac-toe, 345n6; of Turing Test, 6-7, 321nn2,3
Threshold function: assigning values to, 40, 42-47; of McCulloch-Pitts neuron, 38-39, 39 fig., 53-54; sigmoid effects on, 54, 55 fig., 56; of tic-tac-toe neural network, 80-81 fig.
Tic-tac-toe: evolutionary algorithm for, 80-81 fig., 82-83; thought experiment with, 345n6
Time limits: in checkers, 192, 310-11; in chess, 26, 33, 327n11, 330n24
Tinsley, Marion, 124-26, 347-48n10, 348n13, 349nn15,17
Trapped king (checkers position), 116
Traveling salesman problem: enumeration approach to, 69-72, 71 fig., 73 fig., 338nn1,2; evolutionary algorithm for, 72-74, 75 fig., 76 fig., 77 fig., 339n4
Treloar, Norm, 117, 123-24
Triangle of oreo (checkers position), 132, 133 fig., 143, 354n12
Truscott, Tom, 147
Turing, Alan, 5, 19, 22
Turing Test, 19; contradictory methods of, 321-24n3; intelligence issue of, 5-7, 11, 115, 321n2, 389n3; objective of, 6, 8 fig.
2001: A Space Odyssey (film), xv, 3, 330-31n27
U.S. Army Medical Research and Materiel Command (Maryland), 342-43n1
U.S. National Checkers Championship: in 1990, 123, 347-48n10, 349n19; in 1992, 347-48n10
USA Today, 30
Von Frisch, Karl, 156
Walking stick, 62-63 fig.
Walsh, Jack, 339-40n6
War Games (film), 35-36, 332n29
Wasson, Gene, 86, 87-88, 89, 90, 94, 106
Watson, Thomas, 137
Websites: for checkers openings, 388n3; for checkers tournaments, 281, 282; Chinook, 126-27; Microsoft checkers page, 187-88, 189 fig., 190 fig. See also Zone.com games
Weights: assigning values to, 40, 42-47, 52-53; for checkerboard subsections, 235, 237; in Chinook, 117-18; coupled effects of, 49-51, 50 fig.; Deep Blue's automatic tuning of, 30-31, 118; evolutionary adjustment of, 164-67, 169-72, 359n1; in mammography neural network, 89; mathematical design of, 100-101, 345n3; mutation of, with bell curve, 176-78, 178 fig.; number of, in evolutionary program, 169, 359n1; in Samuel's program, 134-38, 136 fig.; sigmoid effects on, 54, 55 fig., 56; in tic-tac-toe neural network, 80-81 fig.
White doctor (checkers move), 347n7
Wiles, Andrew, 328-30n23
Wright, Sewall, 334-37n8
Yellow jacket, 62, 66 fig., 152-53
Zone.com games, 193-94, 216-17; against Class A player, 199-200, 201 fig., 361-64n4; against Class B player, 200, 202-3, 202-3 fig., 364-66n5; against experts, 221-22, 222-23 fig., 224, 265-71, 266-67 fig., 270 fig., 278-79, 280 fig.; against experts, and list of moves, 372-76n3, 382-84n6, 384-86n7, 386-87n1; game room site for, 188, 189 fig., 190 fig.; human interaction in, 194-95, 197-98, 273, 274 fig., 275-76, 277 fig., 278; against masters, 224-27, 225-26 fig., 246-47, 248-49 fig.; against masters, and list of moves, 374-76n4, 379-80n3; ratings system for, 187, 317, 392n5; with spatial inputs, 241-42, 243 fig., 244; with spatial inputs, and list of moves, 377-78n2; time limits for, 192
About the Author
David B. Fogel is the CEO of Natural Selection, Inc., a company that addresses complex, real-world problems in the areas of industry, medicine, and defense by applying the techniques of evolutionary computation, neural networks, fuzzy systems, knowledge-based systems, and stochastic processes, among other technologies. Dr. Fogel was the founding president of the Evolutionary Programming Society and was elected a Fellow of the IEEE in 1999. He is the founding editor-in-chief of the IEEE Transactions on Evolutionary Computation and serves on the editorial boards of several other journals, including BioSystems, Fuzzy Sets and Systems, and Journal of Scheduling.
Want to Compete with Blondie24?
David Fogel and the other scientists at Digenetics, Inc., have created a computer game in which you can pit your playing skills against the evolutionary program described in this book. For more information, contact Digenetics at www.digenetics.com.
Want to Know More?
Explore artificial intelligence and evolutionary computation further with these titles from Morgan Kaufmann Publishers. For more information see our website: www.mkp.com.
Artificial Intelligence: A New Synthesis, by Nils Nilsson
Genetic Programming, An Introduction, by Wolfgang Banzhaf, Peter Nordin, Robert E. Keller, and Frank D. Francone
Evolutionary Design by Computer, edited by Peter J. Bentley
Swarm Intelligence, by James Kennedy and Russell C. Eberhart
Creative Evolutionary Systems, edited by Peter J. Bentley and David W. Corne
Illustrating Evolutionary Computation with Mathematica, by Christian Jacob
Foundations of Genetic Algorithms, Volumes 1-6